ETC V4ECFUM

Freescale Semiconductor, Inc...
Freescale Semiconductor, Inc.
ColdFire® CF4e Core
User's Manual
V4ECFUM/D
Rev. 0, 06/2001
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
Freescale Semiconductor, Inc...
ColdFire is a registered trademark and DigitalDNA is a trademark of Motorola, Inc.
I2C is a registered trademark of Philips Semiconductors
Motorola reserves the right to make changes without further notice to any products herein. Motorola makes no warranty,
representation or guarantee regarding the suitability of its products for any particular purpose, nor does Motorola assume any liability
arising out of the application or use of any product or circuit, and specifically disclaims any and all liability, including without limitation
consequential or incidental damages. “Typical” parameters which may be provided in Motorola data sheets and/or specifications can
and do vary in different applications and actual performance may vary over time. All operating parameters, including “Typicals” must
be validated for each customer application by customer’s technical experts. Motorola does not convey any license under its patent
rights nor the rights of others. Motorola products are not designed, intended, or authorized for use as components in systems
intended for surgical implant into the body, or other applications intended to support or sustain life, or for any other application in
which the failure of the Motorola product could create a situation where personal injury or death may occur. Should Buyer purchase or
use Motorola products for any such unintended or unauthorized application, Buyer shall indemnify and hold Motorola and its officers,
employees, subsidiaries, affiliates, and distributors harmless against all claims, costs, damages, and expenses, and reasonable
attorney fees arising out of, directly or indirectly, any claim of personal injury or death associated with such unintended or
unauthorized use, even if such claim alleges that Motorola was negligent regarding the design or manufacture of the part. Motorola
and
are registered trademarks of Motorola, Inc. Motorola, Inc. is an Equal Opportunity/Affirmative Action Employer.
How to reach us:
USA/EUROPE/Locations Not Listed: Motorola Literature Distribution; P.O. Box 5405, Denver, Colorado 80217. 1–303–675–2140
or 1–800–441–2447
JAPAN: Motorola Japan Ltd.; SPS, Technical Information Center, 3–20–1, Minami–Azabu. Minato–ku, Tokyo 106–8573 Japan.
81–3–3440–3569
ASIA/PACIFIC: Motorola Semiconductors H.K. Ltd.; Silicon Harbour Centre, 2 Dai King Street, Tai Po Industrial Estate, Tai Po, N.T.,
Hong Kong. 852–26668334
Technical Information Center: 1–800–521–6274
HOME PAGE: http://www.motorola.com/semiconductors
Document Comments: FAX (512) 895-2638, Attn: RISC Applications Engineering
World Wide Web Addresses: http://www.motorola.com/PowerPC
http://www.motorola.com/NetComm
http://www.motorola.com/ColdFire
© Motorola Inc., 2001. All rights reserved.
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc...
Freescale Semiconductor, Inc.
Introduction
1
Registers
2
Instructions
3
FPU
4
EMAC
5
Execution Timing
6
Exceptions
7
Local Memory
8
MMU
10
Debug Module
11
Index
For More Information On This Product,
Go to: www.freescale.com
IND
Freescale Semiconductor, Inc...
Freescale Semiconductor, Inc.
1
Introduction
2
Registers
3
Instructions
4
FPU
5
EMAC
6
Execution Timing
7
Exceptions
8
Local Memory
10
MMU
11
Debug Module
IND
Index
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
CONTENTS
Freescale Semiconductor, Inc...
Paragraph
Number
Title
Page
Number
Chapter 1
Introduction
1.1
1.2
1.3
1.4
1.5
1.6
1.7
1.7.1
1.7.1.1
1.7.1.2
1.7.2
1.7.2.1
1.7.2.1.1
1.7.2.1.2
1.8
1.9
1.9.1
Core Overview .................................................................................................... 1-1
Features ............................................................................................................... 1-1
CF4e Implementation Block Diagram ................................................................ 1-2
Architectural Summary....................................................................................... 1-3
Programming Model ........................................................................................... 1-5
Address Map ....................................................................................................... 1-7
Data Format Summary........................................................................................ 1-9
Data Organization in Registers ....................................................................... 1-9
Integer Data Format Organization in Registers .......................................... 1-9
Integer Data Format Organization in Memory ......................................... 1-10
EMAC Data Representation ......................................................................... 1-11
Floating-Point Data Formats and Types ................................................... 1-11
Signed-Integer Data Formats................................................................ 1-12
Floating-Point Data Formats ................................................................ 1-12
Addressing Modes ............................................................................................ 1-12
Instruction Set Overview .................................................................................. 1-13
Instruction Set Summary .............................................................................. 1-16
Chapter 2
Registers
2.1
2.2
2.2.1
2.2.2
2.2.3
2.2.4
2.2.5
2.2.6
2.2.7
2.3
2.3.1
Overview.............................................................................................................
User Programming Model...................................................................................
Data Registers (D7–D0)..................................................................................
Address Registers (A6–A0) ............................................................................
User Stack Pointer (A7)..................................................................................
Program Counter (PC) ....................................................................................
Condition Code Register (CCR) .....................................................................
EMAC Programming Model ..........................................................................
Floating-Point Programming Model...............................................................
Supervisor Programming Model.........................................................................
Status Register (SR)........................................................................................
Contents
For More Information On This Product,
Go to: www.freescale.com
2-1
2-3
2-4
2-4
2-5
2-5
2-5
2-6
2-6
2-7
2-8
v
Freescale Semiconductor, Inc.
CONTENTS
Paragraph
Number
Freescale Semiconductor, Inc...
2.3.2
2.3.3
2.3.4
2.3.5
2.3.6
2.3.7
2.3.8
2.4
Title
Page
Number
Vector Base Register (VBR)........................................................................... 2-9
Supervisor/User Stack Pointers (A7 and OTHER_A7).................................. 2-9
Cache Control Register (CACR) .................................................................. 2-10
Access Control Registers (ACR0–ACR3).................................................... 2-10
RAM Base Address Registers (RAMBAR0/RAMBAR1) ........................... 2-10
ROM Base Address Registers (ROMBAR0/ROMBAR1) ........................... 2-10
Module Base Address Register (MBAR) ..................................................... 2-10
Programming Model Table ............................................................................... 2-12
Chapter 3
Instructions
Chapter 4
Floating-Point Unit (FPU)
4.1
4.1.1
4.2
4.2.1
4.2.2
4.2.3
4.2.3.1
4.2.3.2
4.2.3.3
4.2.3.4
4.2.3.5
4.3
4.3.1
4.3.1.1
4.3.2
4.3.3
4.3.4
4.3.4.1
4.3.4.2
4.3.5
4.3.5.1
4.3.5.2
4.3.6
4.3.7
4.3.8
4.3.9
4.3.10
vi
FPU Overview .................................................................................................... 4-1
Notational Conventions .................................................................................. 4-2
Operand Data Formats and Types....................................................................... 4-3
Signed-integer Data Formats .......................................................................... 4-3
Floating-Point Data Formats........................................................................... 4-3
Floating-Point Data Types .............................................................................. 4-4
Normalized Numbers.................................................................................. 4-4
Zeros ........................................................................................................... 4-4
Infinities...................................................................................................... 4-5
Not-A-Number............................................................................................ 4-5
Denormalized Numbers .............................................................................. 4-5
FPU Programmer’s Model.................................................................................. 4-7
Floating-Point Data Registers (FP0–FP7) ...................................................... 4-8
Floating-Point Control Register (FPCR) .................................................... 4-8
Floating-Point Status Register (FPSR) ........................................................... 4-9
Floating-Point Instruction Address Register (FPIAR).................................. 4-11
Floating-Point Computational Accuracy ...................................................... 4-11
Intermediate Result ................................................................................... 4-11
Rounding the Result ................................................................................. 4-12
Floating-Point Post Processing ..................................................................... 4-15
Underflow, Round, Overflow ................................................................... 4-16
Conditional Testing .................................................................................. 4-16
Floating-Point Exceptions............................................................................. 4-19
Floating-Point Arithmetic Exceptions .......................................................... 4-20
Branch/Set on Unordered (BSUN) ............................................................... 4-21
Input Not-A-Number (INAN)....................................................................... 4-22
Input Denormalized Number (IDE).............................................................. 4-22
ColdFire CF4e Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
CONTENTS
Freescale Semiconductor, Inc...
Paragraph
Number
4.3.11
4.3.12
4.3.13
4.3.14
4.3.15
4.3.16
4.4
4.4.1
4.4.2
4.4.3
Title
Page
Number
Operand Error (OPERR)............................................................................... 4-23
Overflow (OVFL) ......................................................................................... 4-23
Underflow (UNFL) ....................................................................................... 4-24
Divide-by-Zero (DZ) .................................................................................... 4-25
Inexact Result (INEX) .................................................................................. 4-25
Floating-Point State Frames.......................................................................... 4-26
Instructions........................................................................................................ 4-28
Floating-Point Instruction Overview ............................................................ 4-28
Floating-Point Instruction Execution Times................................................. 4-30
Key Differences between ColdFire and MC680x0 FPU Programming Models ..
4-31
Chapter 5
Enhanced Multiply-Accumulate Unit (EMAC)
5.1
5.2
5.3
5.4
5.4.1
5.4.1.1
5.4.1.1.1
5.4.1.1.2
5.4.1.1.3
5.4.1.1.4
5.4.2
5.5
5.5.1
5.5.2
Multiply-Accumulate Unit.................................................................................. 5-1
An Introduction to the MAC............................................................................... 5-2
General Operation............................................................................................... 5-3
Memory Map/Register Set.................................................................................. 5-6
MAC Status Register (MACSR)..................................................................... 5-6
Fractional Operation Mode......................................................................... 5-9
Rounding ................................................................................................ 5-9
Saving and Restoring the EMAC Programming Model ....................... 5-10
MULS/MULU ...................................................................................... 5-11
Scale Factor in MAC or MSAC instructions........................................ 5-11
Mask Register (MASK) ................................................................................ 5-11
EMAC Instruction Set Summary ...................................................................... 5-12
Data Representation...................................................................................... 5-13
MAC Opcodes .............................................................................................. 5-13
Chapter 6
Instruction Pipeline and Timing
6.1
6.2
6.3
6.3.1
6.3.2
6.3.3
6.3.4
6.3.5
6.3.6
Basic V4 Pipeline Strategy ................................................................................. 6-1
Instruction Fetch Pipeline (IFP).......................................................................... 6-4
Operand Execution Pipeline (OEP) .................................................................... 6-6
V4 OEP Conceptual Pipeline Model .............................................................. 6-6
Instruction Folding and the Limited Superscalar OEP ................................... 6-9
Sequence-Related OEP Stalls ....................................................................... 6-11
EMAC-Specific OEP Sequence Stalls.......................................................... 6-13
FPU-Specific OEP Sequence Stalls.............................................................. 6-14
Operand Memory Sequence-Related Stalls .................................................. 6-16
Contents
For More Information On This Product,
Go to: www.freescale.com
vii
Freescale Semiconductor, Inc.
CONTENTS
Freescale Semiconductor, Inc...
Paragraph
Number
6.3.7
6.4
6.5
6.5.1
6.5.2
6.5.3
6.5.4
6.5.5
6.5.6
6.5.7
Title
Page
Number
V4 OEP Summary ........................................................................................
Instruction Execution Locations .......................................................................
Instruction Execution Times .............................................................................
MOVE Instruction Execution Times ............................................................
Execution Timings—One-Operand Instructions ..........................................
Execution Timings—Two-Operand Instructions..........................................
Miscellaneous Instruction Execution Times.................................................
Branch Instruction Execution Times ............................................................
EMAC Instruction Execution Times ............................................................
FPU Instruction Execution Times.................................................................
6-16
6-18
6-21
6-23
6-24
6-25
6-27
6-28
6-28
6-30
Chapter 7
Exception Processing
7.1
7.2
7.3
7.4
7.5
Overview.............................................................................................................
Supervisor/User Stack Pointers
(A7 and OTHER_A7) 7-3
Exception Stack Frame Definition......................................................................
Processor Exceptions ..........................................................................................
Precise Faults ......................................................................................................
7-1
7-4
7-5
7-8
Chapter 8
Local Memory
8.1
8.2
8.3
8.4
8.4.1
8.4.1.1
8.4.1.2
8.4.1.3
8.4.1.4
8.5
8.5.1
8.5.2
8.5.2.1
8.5.3
8.5.4
8.5.5
8.6
8.6.1
viii
Local Memory Overview.................................................................................... 8-1
Two-Stage Pipelined Local Bus (K-Bus) ........................................................... 8-5
Interactions between Local Memory Modules ................................................... 8-7
Local Memory Connection Specification ........................................................... 8-8
K-Bus Memory Array Signal Connections..................................................... 8-8
KRAM Information ................................................................................... 8-8
KROM Controller Information................................................................. 8-10
Instruction Cache Information ................................................................. 8-12
Data Cache Information........................................................................... 8-17
SRAM Overview .............................................................................................. 8-22
SRAM Operation .......................................................................................... 8-23
SRAM Programming Model......................................................................... 8-23
SRAM Base Address Registers (RAMBAR0/RAMBAR1)..................... 8-23
SRAM Initialization...................................................................................... 8-25
SRAM Initialization Code ............................................................................ 8-26
Programming RAMBARs for Power Management...................................... 8-27
ROM Overview................................................................................................. 8-28
ROM Operation ............................................................................................ 8-28
ColdFire CF4e Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
CONTENTS
Freescale Semiconductor, Inc...
Paragraph
Number
8.6.2
8.6.2.1
8.6.3
8.6.4
8.7
8.7.1
8.7.2
8.7.2.1
8.7.2.2
8.7.3
8.7.4
8.7.4.1
8.7.4.2
8.7.4.3
8.7.5
8.7.6
8.7.6.1
8.7.6.2
8.7.6.3
8.7.6.4
8.7.7
8.7.8
8.7.8.1
8.7.8.2
8.7.8.2.1
8.7.8.2.2
8.7.9
8.7.10
8.7.10.1
8.7.10.2
8.7.11
8.7.12
8.7.13
8.7.13.1
Title
Page
Number
ROM Programming Model...........................................................................
ROM Base Address Registers (ROMBAR0/ROMBAR1) .......................
ROM Initialization........................................................................................
Programming ROMBARs for Power Management......................................
Cache Overview................................................................................................
Optimizing Cache Recommendation ............................................................
Cache Organization.......................................................................................
Cache Line States: Invalid, Valid-Unmodified, and Valid-Modified.......
Cache at Start-Up......................................................................................
Cache Operation ...........................................................................................
Caching Modes .............................................................................................
Cacheable Accesses ..................................................................................
Write-Through Mode (Data Cache Only).................................................
Copyback Mode (Data Cache Only).........................................................
Cache-Inhibited Accesses .............................................................................
Cache Protocol..............................................................................................
Read Miss .................................................................................................
Write Miss (Data Cache Only) .................................................................
Read Hit ....................................................................................................
Write Hit (Data Cache Only) ....................................................................
Cache Coherency (Data Cache Only)...........................................................
Memory Accesses for Cache Maintenance...................................................
Cache Filling.............................................................................................
Cache Pushes ............................................................................................
Push and Store Buffers .........................................................................
Push and Store Buffer Bus Operation...................................................
Cache Locking ..............................................................................................
Cache Registers.............................................................................................
Cache Control Register (CACR) ..............................................................
Access Control Registers (ACR0–ACR3)................................................
Cache Management.......................................................................................
Cache Operation Summary...........................................................................
Instruction Cache State Transitions ..............................................................
Data Cache State Transitions....................................................................
8-29
8-29
8-31
8-32
8-32
8-33
8-33
8-34
8-34
8-35
8-38
8-39
8-39
8-39
8-39
8-40
8-41
8-41
8-41
8-42
8-42
8-42
8-42
8-42
8-43
8-43
8-44
8-45
8-46
8-48
8-50
8-52
8-52
8-53
Chapter 9
Core Interface
9.1
9.2
9.3
9.3.1
Core Interface Signals.........................................................................................
CF4e Pin Characteristics.....................................................................................
ColdFire Master Bus ...........................................................................................
M-Bus Signals.................................................................................................
Contents
For More Information On This Product,
Go to: www.freescale.com
9-1
9-2
9-6
9-6
ix
Freescale Semiconductor, Inc.
CONTENTS
Paragraph
Number
Freescale Semiconductor, Inc...
9.3.2
9.3.2.1
9.3.2.2
9.3.2.3
9.3.2.4
9.3.2.5
9.3.2.6
9.3.2.7
9.3.2.8
Title
Page
Number
M-Bus Operation ............................................................................................ 9-8
Basic Bus Cycles ........................................................................................ 9-8
Pipelined Bus Cycles .................................................................................. 9-9
Address and Data Phase Interactions........................................................ 9-10
Data Size Operations ................................................................................ 9-12
Line Transfers ........................................................................................... 9-13
Bus Arbitration ......................................................................................... 9-16
Interrupt Support....................................................................................... 9-18
Reset Operation ........................................................................................ 9-18
Chapter 10
Memory Management Unit (MMU)
10.1
10.2
10.2.1
10.2.2
10.2.3
10.2.3.1
10.2.3.2
10.2.3.3
10.2.3.4
10.2.3.5
10.2.3.6
10.2.3.7
10.2.3.8
10.2.3.9
10.2.3.10
10.2.3.11
10.3
10.4
10.4.1
10.4.2
10.4.3
10.5
10.5.1
10.5.2
10.5.3
10.5.3.1
10.5.3.2
10.5.3.3
10.5.3.4
x
Features ............................................................................................................. 10-1
Virtual Memory Management Architecture...................................................... 10-1
MMU Architecture Features......................................................................... 10-2
MMU Architectural Location ....................................................................... 10-2
MMU Architecture Implementation ............................................................. 10-3
Precise Faults ............................................................................................ 10-4
MMU Access ............................................................................................ 10-4
Virtual Mode............................................................................................. 10-4
Virtual Memory References ..................................................................... 10-4
Instruction and Data Cache Addresses ..................................................... 10-4
Supervisor/User Stack Pointers ................................................................ 10-5
Access Error Stack Frame ........................................................................ 10-5
Expanded Control Register Space ............................................................ 10-6
Changes to ACRs and CACR ................................................................... 10-6
ACR Address Improvements .................................................................... 10-6
Supervisor Protection................................................................................ 10-7
Debugging in a Virtual Environment................................................................ 10-7
Virtual Memory Architecture Processor Support ............................................. 10-7
Precise Faults ................................................................................................ 10-8
Supervisor/User Stack Pointers .................................................................... 10-8
Access Error Stack Frame Additions ............................................................ 10-8
MMU Definition ............................................................................................... 10-9
Effective Address Attribute Determination .................................................. 10-9
MMU Functionality .................................................................................... 10-10
MMU Organization..................................................................................... 10-11
MMU Base Address Register (MMUBAR) ........................................... 10-11
MMU Memory Map ............................................................................... 10-11
MMU Control Register (MMUCR)....................................................... 10-12
MMU Operation Register (MMUOR).................................................... 10-13
ColdFire CF4e Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
CONTENTS
Paragraph
Number
Freescale Semiconductor, Inc...
10.5.3.5
10.5.3.6
10.5.3.7
10.5.4
10.5.5
10.6
10.6.1
10.6.2
10.6.3
10.7
Title
Page
Number
MMU Status Register (MMUSR)...........................................................
MMU Fault, Test, or TLB Address Register (MMUAR).......................
MMU Read/Write Tag and Data Entry Registers
(MMUTR and MMUDR) 10-15
MMU TLB..................................................................................................
MMU Operation .........................................................................................
MMU Implementation ....................................................................................
TLB Address Fields ....................................................................................
TLB Replacement Algorithm .....................................................................
TLB Locked Entries....................................................................................
MMU Instructions...........................................................................................
10-14
10-15
10-17
10-18
10-19
10-20
10-20
10-21
10-22
Chapter 11
Debug Support
11.1
11.2
11.2.1
11.3
11.3.1
11.3.2
11.3.3
11.4
11.4.1
11.4.2
11.4.3
11.4.4
11.4.5
11.4.6
11.4.7
11.4.8
11.4.9
11.4.9.1
11.4.10
11.4.11
11.5
11.5.1
11.5.2
11.5.2.1
11.5.2.2
11.5.3
Overview........................................................................................................... 11-1
Signal Descriptions ........................................................................................... 11-3
Processor Status/Debug Data (PSTDDATA[7:0]) ....................................... 11-4
Real-Time Trace Support.................................................................................. 11-5
Begin Execution of Taken Branch (PST = 0x5) ........................................... 11-8
Processor Stopped or Breakpoint State Change (PST = 0xE) ................... 11-9
Processor Halted (PST = 0xF) ...................................................................... 11-9
Programming Model ....................................................................................... 11-10
Revision A Shared Debug Resources ......................................................... 11-13
Address Attribute Trigger Registers (AATR, AATR1).............................. 11-13
Address Breakpoint Registers (ABLR/ABLR1, ABHR/ABHR1) .......... 11-15
BDM Address Attribute Register (BAAR)................................................. 11-16
Configuration/Status Register (CSR).......................................................... 11-17
Data Breakpoint/Mask Registers (DBR/DBR1, DBMR/DBMR1) ......... 11-19
Program Counter Breakpoint/Mask Registers
(PBR, PBR1, PBR2, PBR3, PBMR) 11-20
Trigger Definition Register (TDR) ............................................................. 11-21
Extended Trigger Definition Register (XTDR) .......................................... 11-23
Resulting Set of Possible Trigger Combinations.................................... 11-25
PC Breakpoint ASID Control Register (PBAC)......................................... 11-26
PC Breakpoint ASID Register (PBASID) .................................................. 11-26
Background Debug Mode (BDM) .................................................................. 11-27
CPU Halt..................................................................................................... 11-28
BDM Serial Interface.................................................................................. 11-29
Receive Packet Format ........................................................................... 11-30
Transmit Packet Format.......................................................................... 11-31
BDM Command Set.................................................................................... 11-32
Contents
For More Information On This Product,
Go to: www.freescale.com
xi
Freescale Semiconductor, Inc.
CONTENTS
Freescale Semiconductor, Inc...
Paragraph
Number
11.5.3.1
11.5.3.1.1
11.5.3.2
11.5.3.3
11.5.3.3.1
11.5.3.3.2
11.5.3.3.3
11.5.3.3.4
11.5.3.3.5
11.5.3.3.6
11.5.3.3.7
11.5.3.3.8
11.5.3.3.9
11.5.3.3.10
11.5.3.3.11
11.5.3.3.12
11.5.3.3.13
11.5.3.3.14
11.6
11.6.1
11.6.1.1
11.6.2
11.7
11.7.1
11.7.2
11.8
11.8.1
11.8.2
11.8.3
11.8.3.1
11.9
Title
Page
Number
ColdFire BDM Command Format..........................................................
Extension Words as Required.............................................................
Command Sequence Diagrams...............................................................
Command Set Descriptions ....................................................................
Read A/D Register (rareg/rdreg) ........................................................
Write A/D Register (wareg/wdreg) ....................................................
Read Memory Location (read)............................................................
Write Memory Location (write) .........................................................
Dump Memory Block (dump) ............................................................
Fill Memory Block (fill) .....................................................................
Resume Execution (go) ......................................................................
No Operation (nop).............................................................................
Synchronize PC to the PSTDDATA Lines (sync_pc)........................
Force Transfer Acknowledge (force_ta).............................................
Read Control Register (rcreg).............................................................
Write Control Register (wcreg) ..........................................................
Read Debug Module Register (rdmreg) .............................................
Write Debug Module Register (wdmreg) ...........................................
Real-Time Debug Support ..............................................................................
Theory of Operation....................................................................................
Emulator Mode .......................................................................................
Concurrent BDM and Processor Operation ................................................
Debug C Definition of PSTDDATA Outputs ................................................
User Instruction Set ....................................................................................
Supervisor Instruction Set...........................................................................
ColdFire Debug History..................................................................................
ColdFire Debug Classic: The Original Definition......................................
ColdFire Debug Revision B........................................................................
ColdFire Debug Revision C........................................................................
Debug Interrupts and Interrupt Requests (Emulator Mode) ...................
Motorola-Recommended BDM Pinout...........................................................
11-33
11-33
11-34
11-35
11-36
11-37
11-38
11-39
11-41
11-43
11-45
11-46
11-47
11-47
11-49
11-52
11-53
11-54
11-55
11-55
11-57
11-58
11-58
11-59
11-64
11-65
11-65
11-66
11-67
11-67
11-68
Chapter 12
Test
12.1
12.1.1
12.1.2
12.1.3
12.2
12.2.1
12.2.2
xii
Scan Chains.......................................................................................................
Core Scan Chains..........................................................................................
Wrapper Scan Chains....................................................................................
Scan Chains Block Diagram .........................................................................
Test Wrapper.....................................................................................................
Features.........................................................................................................
Wrapper Cells ...............................................................................................
ColdFire CF4e Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
12-2
12-2
12-2
12-3
12-3
12-4
12-5
Freescale Semiconductor, Inc.
CONTENTS
Freescale Semiconductor, Inc...
Paragraph
Number
12.2.3
12.2.4
12.2.4.1
12.2.4.2
12.2.4.3
12.2.4.4
12.3
12.3.1
12.3.2
12.3.3
12.3.4
12.3.5
12.3.5.1
12.3.6
12.3.6.1
12.3.6.2
12.3.7
12.3.8
12.3.9
12.3.9.1
12.3.10
12.4
12.5
12.5.1
Title
Page
Number
Block Diagram.............................................................................................. 12-6
Timing........................................................................................................... 12-8
CF4eTW Testing of CF4e Core Inputs..................................................... 12-8
CF4eTW Testing of CF4e Core Outputs ................................................ 12-11
CF4eTW Testing of Noncore Inputs ...................................................... 12-13
CF4eTW Testing of Noncore Outputs.................................................... 12-15
BIST................................................................................................................ 12-17
BIST Memory Controllers .......................................................................... 12-18
BIST Core Ports.......................................................................................... 12-19
Power Analysis ........................................................................................... 12-20
Staging of Memories................................................................................... 12-20
Testing Algorithms ..................................................................................... 12-21
March C+ Algorithm .............................................................................. 12-21
ROM BIST Algorithm ................................................................................ 12-22
Modify BIST ROM Signature Script—Part 1 ........................................ 12-23
Modify BIST ROM Signature Script—Part 2 ........................................ 12-24
BIST Test Modes ........................................................................................ 12-25
Memory Data Retention.............................................................................. 12-26
Timing......................................................................................................... 12-26
Memory Clock Determination ................................................................ 12-27
Timing Diagrams ........................................................................................ 12-28
Integration Connections .................................................................................. 12-32
Test Controller ................................................................................................ 12-32
MTMOD[2:0] Encodings ........................................................................... 12-32
Appendix A
Core Interface Timing Characteristics
Contents
For More Information On This Product,
Go to: www.freescale.com
xiii
Freescale Semiconductor, Inc.
CONTENTS
Title
Freescale Semiconductor, Inc...
Paragraph
Number
xiv
ColdFire CF4e Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Page
Number
Freescale Semiconductor, Inc.
ILLUSTRATIONS
Freescale Semiconductor, Inc...
Figure
Number
1-1
1-2
1-3
1-4
1-5
1-7
1-6
1-8
1-9
2-1
2-2
2-3
2-4
2-5
2-6
2-7
2-8
2-9
4-1
4-2
4-3
4-4
4-5
4-6
4-7
4-8
4-9
4-10
4-11
4-12
4-13
5-1
5-2
5-3
5-4
5-5
Title
Page
Number
CF4e Core Block Diagram............................................................................................ 1-3
V4 Core Block Diagram ............................................................................................... 1-4
ColdFire Programming Model...................................................................................... 1-6
Organization of Integer Data Format in Data Registers ............................................. 1-10
Organization of Integer Data Formats in Address Registers ...................................... 1-10
Two’s Complement, Signed Fractional Equation....................................................... 1-11
Memory Operand Addressing..................................................................................... 1-11
Floating-Point Data Formats....................................................................................... 1-12
Mantissa ...................................................................................................................... 1-12
Programming Model ..................................................................................................... 2-2
User Programming Model............................................................................................. 2-4
Condition Code Register (CCR) ................................................................................... 2-5
EMAC Register Set....................................................................................................... 2-6
Floating-Point Programmer’s Model ............................................................................ 2-7
Supervisor Programming Model................................................................................... 2-8
Status Register (SR)...................................................................................................... 2-8
Vector Base Register (VBR)......................................................................................... 2-9
Module Base Address Register (MBAR) ................................................................... 2-11
Floating-Point Data Formats......................................................................................... 4-3
Mantissa ........................................................................................................................ 4-4
Normalized Number Format ......................................................................................... 4-4
Zero Format .................................................................................................................. 4-4
Infinity Format .............................................................................................................. 4-5
Not-a-Number Format .................................................................................................. 4-5
Denormalized Number Format ..................................................................................... 4-5
Floating-Point Programmer’s Model ............................................................................ 4-7
Floating-Point Control Register (FPCR) ...................................................................... 4-8
Floating-Point Status Register (FPSR) ......................................................................... 4-9
Intermediate Result Format......................................................................................... 4-12
Rounding Algorithm Flowchart.................................................................................. 4-14
Floating-Point State Frame Contents .......................................................................... 4-26
Multiply-Accumulate Functionality Diagram............................................................... 5-2
Infinite Impulse Response (IIR) Filter.......................................................................... 5-3
Four-Tap FIR Filter....................................................................................................... 5-3
Fractional Alignment .................................................................................................... 5-4
Signed and Unsigned Integer Alignment...................................................................... 5-5
Illustrations
For More Information On This Product,
Go to: www.freescale.com
xv
Freescale Semiconductor, Inc.
Freescale Semiconductor, Inc...
ILLUSTRATIONS
Figure
Page
Title
Number
Number
5-6
EMAC Register Set....................................................................................................... 5-6
5-7
MAC Status Register (MACSR)................................................................................... 5-7
5-8
Two’s Complement, Signed Fractional Equation....................................................... 5-13
6-1
CF4e ColdFire Processor Complex Block Diagram..................................................... 6-3
6-2
OAGComputeEngine Register Renaming Resources................................................... 6-8
6-3
Sequence-Related OEP Sequence Stall ...................................................................... 6-11
6-4
EMAC-Specific OEP Sequence Stall ......................................................................... 6-14
6-5
for_loop Example........................................................................................................ 6-16
6-6
CF4e Ex Execute Engines within the OEP ................................................................. 6-17
7-1
Exception Stack Frame ................................................................................................. 7-4
8-1
Generic CF4e Block Diagram....................................................................................... 8-2
8-2
Local Memory Block Diagram Showing Cache, KRAM, and KROM Controllers ..... 8-3
8-3
ColdFire Core Synchronous Memory Interface............................................................ 8-4
8-4
Synchronous Memory Timing Diagram ....................................................................... 8-4
8-5
Synchronous Memory Interface Block Diagram .......................................................... 8-5
8-6
Version 4 Cache Block Diagram .................................................................................. 8-7
8-7
Cache Organization and Line Format (32-Kbyte Cache Size Shown) ....................... 8-14
8-8
Cache Organization and Line Format (32 Kbyte Cache Size shown) ........................ 8-19
8-9
SRAM Base Address Registers (RAMBARn) ........................................................... 8-24
8-10
ROM Base Address Registers (ROMBAR0/ROMBAR1) ......................................... 8-29
8-11
Data Cache Organization and Line Format ................................................................ 8-34
8-12
Data Cache—A: at Reset, B: after Invalidation, C and D: Loading Pattern............... 8-35
8-13
Data Caching Operation.............................................................................................. 8-36
8-14
Write-Miss in Copyback Mode................................................................................... 8-41
8-15
Data Cache Locking.................................................................................................... 8-45
8-16
Cache Control Register (CACR) ................................................................................ 8-46
8-17
Access Control Register Format (ACRn) ................................................................... 8-49
8-18
An Format (Data Cache)............................................................................................. 8-50
8-19
An Format (Instruction Cache) ................................................................................... 8-50
8-20
Instruction Cache Line State Diagram........................................................................ 8-52
8-21
Data Cache Line State Diagram—Copyback Mode ................................................... 8-54
8-22
Data Cache Line State Diagram—Write-Through Mode ........................................... 8-54
9-1
Generic CF4e Block Diagram....................................................................................... 9-1
9-2
Basic Read and Write Cycles........................................................................................ 9-9
9-3
Pipelined Read and Write ........................................................................................... 9-10
9-4
Address Hold Followed By 1- and 0-Wait State Cycles............................................. 9-11
9-5
mapb and mahb Generated Mid-Data Phase............................................................... 9-12
9-6
mahb Generation for 1X Clock Mode ........................................................................ 9-12
9-7
LIne Access Read with Zero Wait States ................................................................... 9-14
9-8
Line Access Read with One Wait State ...................................................................... 9-15
9-9
Line Access Write with Zero Wait States................................................................... 9-15
9-10
Line Access Write with One Wait State ..................................................................... 9-16
9-11
Multiplexed M-Bus Structure ..................................................................................... 9-16
xvi
ColdFire CF4e Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
ILLUSTRATIONS
Freescale Semiconductor, Inc...
Figure
Number
9-12
10-1
10-2
10-3
10-4
10-5
10-6
10-7
10-8
10-9
10-10
10-11
11-1
11-2
11-3
11-4
11-5
11-6
11-7
11-8
11-9
11-10
11-11
11-12
11-13
11-14
11-15
11-16
11-17
11-18
11-19
11-20
11-21
11-23
11-22
11-25
11-24
11-27
11-26
11-28
11-29
11-30
11-31
Title
Page
Number
Multiplexed M-Bus Operation.................................................................................... 9-17
CF4e Processor Core Block with MMU..................................................................... 10-3
Exception Stack Frame ............................................................................................... 10-8
MMU Base Address Register ................................................................................... 10-11
MMU Control Register (MMUCR) .......................................................................... 10-12
MMU Operation Register (MMUOR) ...................................................................... 10-13
MMU Status Register (MMUSR)............................................................................. 10-14
MMU Fault, Test, or TLB Register (MMUAR) ....................................................... 10-15
MMU Read/Write TLB Tag Register (MMUTR) .................................................... 10-16
MMU Read/Write TLB Data Register...................................................................... 10-16
K-Bus Address and Attributes Generation ............................................................... 10-19
Version 4 ColdFire MMU Harvard TLB .................................................................. 10-22
Processor/Debug Module Interface............................................................................. 11-1
PSTCLK Timing......................................................................................................... 11-4
PSTDDATA: Single-Cycle Instruction Timing.......................................................... 11-5
Example JMP Instruction Output on PSTDDATA..................................................... 11-8
Debug Programming Model ..................................................................................... 11-11
Address Attribute Trigger Registers (AATR, AATR1)............................................ 11-14
Address Breakpoint Registers (ABLR, ABHR, ABLR1, ABHR1).......................... 11-15
BDM Address Attribute Register (BAAR)............................................................... 11-16
Configuration/Status Register (CSR)........................................................................ 11-17
Data Breakpoint/Mask Registers (DBR/DBR1 and DBMR/DBMR1)..................... 11-19
Program Counter Breakpoint Registers (PBR, PBR1, PBR2, PBR3) ...................... 11-21
Program Counter Breakpoint Mask Register (PBMR) ............................................. 11-21
Trigger Definition Register (TDR) ........................................................................... 11-22
Extended Trigger Definition Register (XTDR) ........................................................ 11-24
PC Breakpoint ASID Control Register (PBAC)....................................................... 11-26
PC Breakpoint ASID Register (PBASID) ................................................................ 11-27
Maximum BDM Serial Interface Timing ................................................................. 11-30
Receive BDM Packet................................................................................................ 11-30
Transmit BDM Packet .............................................................................................. 11-31
BDM Command Format ........................................................................................... 11-33
Command Sequence Diagram................................................................................... 11-34
rareg/rdreg Command Sequence............................................................................... 11-36
rareg/rdreg Command Format................................................................................... 11-36
wareg/wdreg Command Sequence............................................................................ 11-37
wareg/wdreg Command Format ............................................................................... 11-37
read Command Sequence.......................................................................................... 11-38
read Command/Result Formats................................................................................. 11-38
write Command Format ............................................................................................ 11-39
write Command Sequence ........................................................................................ 11-40
dump Command/Result Formats .............................................................................. 11-41
dump Command Sequence ....................................................................................... 11-42
Illustrations
For More Information On This Product,
Go to: www.freescale.com
xvii
Freescale Semiconductor, Inc.
Freescale Semiconductor, Inc...
ILLUSTRATIONS
Figure
Page
Title
Number
Number
11-32 fill Command Format................................................................................................ 11-43
11-33 fill Command Sequence............................................................................................ 11-44
11-35 go Command Sequence............................................................................................. 11-45
11-34 go Command Format ................................................................................................ 11-45
11-37 nop Command Sequence........................................................................................... 11-46
11-36 nop Command Format .............................................................................................. 11-46
11-39 sync_pc Command Sequence ................................................................................... 11-47
11-38 sync_pc Command Format ....................................................................................... 11-47
11-41 force_TA Command Sequence ................................................................................. 11-48
11-40 force_ta Command.................................................................................................... 11-48
11-43 rcreg Command Sequence ........................................................................................ 11-49
11-42 rcreg Command/Result Formats ............................................................................... 11-49
11-45 wcreg Command Sequence....................................................................................... 11-52
11-44 wcreg Command/Result Formats.............................................................................. 11-52
11-47 rdmreg Command Sequence ..................................................................................... 11-53
11-46 rdmreg bdm Command/Result Formats.................................................................... 11-53
11-49 wdmreg Command Sequence ................................................................................... 11-54
11-48 wdmreg BDM Command Format ............................................................................. 11-54
11-1
Recommended BDM Connector............................................................................... 11-69
12-1
CF4e Scan Chains Block Diagram ............................................................................. 12-3
12-2
CF4e and Test Wrapper in SoC .................................................................................. 12-4
12-3
CF4e Core Shared Wrapper Cells............................................................................... 12-5
12-4
CF4e Core Dedicated Input Wrapper Cell (P Cell) .................................................... 12-6
12-5
Example of Registered CF4eTW Architecture ........................................................... 12-7
12-6
Scans and Flops........................................................................................................... 12-8
12-7
CF4eTW Input to CF4e Core Scan Stuck-At Vector Example .................................. 12-9
12-8
CF4eTW Input to CF4e Core Scan Delay Vector Example ..................................... 12-10
12-9
CF4e Core to CF4eTW Output Scan Stuck-At Vector Example.............................. 12-12
12-10 CF4e Core to CF4eTW Output Scan Delay Vector Example................................... 12-13
12-11 CF4eTW to Non-Core Input Scan Stuck-At Vector Example.................................. 12-14
12-12 CF4eTW to Non-Core Delay Scan Vector Example ................................................ 12-15
12-13 Non-Core to CF4eTW Input Scan Stuck-At Vector Example.................................. 12-16
12-14 Non-Core to CF4eTW Input Scan Delay Vector Example....................................... 12-17
12-15 CF4e BIST Hierarchy ............................................................................................... 12-18
12-16 Flow of Characterization Method ............................................................................. 12-25
12-17 March C+ Algorithm ................................................................................................ 12-26
12-18 512 x 32 RAM BIST Clock Cycles .......................................................................... 12-27
12-19 512 x 32 ROM BIST Clock Cycles .......................................................................... 12-27
12-20 PBIST Initialization .................................................................................................. 12-28
12-21 EBIST Timing Diagram for an 8-Kbyte Cache Tag Array....................................... 12-29
12-22 EBIST Timing Diagram For An 8-Kbyte Cache Data Array ................................... 12-30
12-23 EBIST Timing Diagram For A 2-Kbyte KRAM0 Array.......................................... 12-31
12-24 EBIST Timing Diagram For 2-Kbyte KROM0 Array.............................................. 12-31
xviii
ColdFire CF4e Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
TABLES
Freescale Semiconductor, Inc...
Table
Number
1-1
1-2
1-3
1-4
1-5
1-6
2-1
2-2
2-3
2-4
4-1
4-2
4-3
4-4
4-5
4-6
4-7
4-8
4-9
4-10
4-11
4-12
4-13
4-14
4-15
4-16
4-17
4-18
4-19
4-20
4-21
4-22
4-23
4-24
4-25
4-26
Title
Page
Number
ColdFire CPU Space Assignments ............................................................................... 1-7
Integer Data Formats..................................................................................................... 1-9
ColdFire Effective Addressing Modes........................................................................ 1-13
V4 New Instruction Summary .................................................................................... 1-14
User-Mode Instruction Set Summary ......................................................................... 1-16
Supervisor-Mode Instruction Set Summary................................................................ 1-20
CCR Field Descriptions ............................................................................................... 2-5
Status Field Descriptions .............................................................................................. 2-8
MBAR Field Descriptions .......................................................................................... 2-11
ColdFire CPU Registers.............................................................................................. 2-12
Notational Conventions ................................................................................................ 4-2
Floating-Point Addressing Modes ................................................................................ 4-3
Real Format Summary .................................................................................................. 4-6
FPCR Field Descriptions .............................................................................................. 4-8
FPSR Field Descriptions............................................................................................... 4-9
Tie-Case Example....................................................................................................... 4-15
Round Mode Error Bounds......................................................................................... 4-15
FPCC Encodings......................................................................................................... 4-16
Floating-Point Conditional Tests ................................................................................ 4-18
Floating-Point Exception Vectors............................................................................... 4-19
Exception Priorities..................................................................................................... 4-20
BSUN Exception Enabled/Disabled Results .............................................................. 4-22
INAN Exception Enabled/Disabled Results ............................................................... 4-22
IDE Exception Enabled/Disabled Results .................................................................. 4-23
Possible Operand Errors ............................................................................................. 4-23
OPERR Exception Enabled/Disabled Results ............................................................ 4-23
OVFL Exception Enabled/Disabled Results............................................................... 4-24
UNFL Exception Enabled/Disabled Results............................................................... 4-25
DZ Exception Enabled/Disabled Results.................................................................... 4-25
Inexact Rounding Mode Values.................................................................................. 4-25
INEX Exception Enabled/Disabled Results................................................................ 4-26
Format Word Field Descriptions ................................................................................ 4-27
Floating-Point Instruction Formats ............................................................................. 4-28
Instruction Format Terminology................................................................................. 4-29
Floating-Point Instruction Execution Times, , ........................................................... 4-30
Key Programming Model Differences........................................................................ 4-31
Tables
For More Information On This Product,
Go to: www.freescale.com
xix
Freescale Semiconductor, Inc.
TABLES
Freescale Semiconductor, Inc...
Table
Number
4-27
4-28
4-29
5-1
5-2
5-3
6-1
6-2
6-3
6-4
6-5
6-6
6-7
6-8
6-9
6-10
6-11
6-12
6-13
6-14
6-15
6-16
7-1
7-2
7-3
7-4
8-1
8-2
8-3
8-4
8-5
8-6
8-7
8-8
8-9
8-10
8-11
8-12
8-13
8-14
8-15
8-16
8-17
xx
Title
Page
Number
68K/ColdFire Operation Sequence 1 .......................................................................... 4-32
68K/ColdFire Operation Sequence 2 .......................................................................... 4-32
68K/ColdFire Operation Sequence 3 .......................................................................... 4-32
MACSR Field Descriptions .......................................................................................... 5-7
Summary of S/U, F/I, and R/T Control Bits ................................................................. 5-9
EMAC Instruction Summary ...................................................................................... 5-12
CFxCore Processor Execution Latency ........................................................................ 6-2
V4 RTS Execution Times ............................................................................................. 6-5
Instructions that Make Results Available to Subsequent Instructions........................ 6-12
FPU Execution Example............................................................................................. 6-15
V4 ColdFire Compute Engine Location ..................................................................... 6-18
Misaligned Operand References ................................................................................. 6-22
Move Byte and Word Execution Times...................................................................... 6-23
Move Long Execution Times...................................................................................... 6-23
MAC and Miscellaneous Move Execution Times ...................................................... 6-24
One-Operand Instruction Execution Times ................................................................ 6-25
Two-Operand Instruction Execution Times................................................................ 6-25
Miscellaneous Instruction Execution Times............................................................... 6-27
General Branch Instruction Execution Times............................................................. 6-28
Bcc Instruction Execution Times................................................................................ 6-28
EMAC Instruction Execution Times .......................................................................... 6-29
FPU Instruction Execution Times, ............................................................................. 6-30
Exception Vector Assignments..................................................................................... 7-2
Format/Vector Word..................................................................................................... 7-5
Exceptions..................................................................................................................... 7-6
OEP EX Cycle Operations............................................................................................ 7-8
Synchronous Memory Truth Table (Sampled at Positive Edge of CLK)..................... 8-5
KRAM Size................................................................................................................... 8-9
KRAM Memory Array Connections ............................................................................ 8-9
KRAM0/KRAM1 Array Address Connection............................................................ 8-10
KRAM0/KRAM1 Byte Write Enables ....................................................................... 8-10
KRAM0/KRAM1 Size ............................................................................................... 8-11
KROM{0,1} Memory Array Connections.................................................................. 8-11
KROM Array Address Connection............................................................................. 8-12
Instruction Cache Sizes and Configurations ............................................................... 8-13
Instruction Cache Size ................................................................................................ 8-14
Instruction Cache Memory Array Connections .......................................................... 8-14
Instruction Cache Data Array Address Connection.................................................... 8-16
Instruction Cache Tag Array Address Connection ..................................................... 8-16
Instruction Cache Tag Array Write Data Connection................................................. 8-16
Data Cache Sizes and Configurations......................................................................... 8-18
Data Cache Size .......................................................................................................... 8-19
Data Cache Memory Array Connections.................................................................... 8-19
ColdFire CF4e Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
TABLES
Freescale Semiconductor, Inc...
Table
Number
8-18
8-19
8-20
8-21
8-22
8-23
8-24
8-25
8-26
8-27
8-28
8-29
8-30
8-31
8-32
8-33
8-34
9-1
9-2
9-3
9-4
9-5
9-6
10-1
10-2
10-3
10-4
10-5
10-6
10-7
10-8
10-9
10-10
10-11
10-12
10-13
11-1
11-2
11-3
11-4
11-5
11-6
11-7
Title
Page
Number
Data Cache Data Array Address Connection.............................................................. 8-21
Data Cache Tag Array Address Connection............................................................... 8-21
Data Cache Tag Array Write Data Connection .......................................................... 8-21
RAMBARn Field Description .................................................................................... 8-24
KRAM Size Configuration ......................................................................................... 8-25
Examples of Typical RAMBAR Settings ................................................................... 8-27
ROMBAR Field Descriptions..................................................................................... 8-30
KROM Size Configuration ......................................................................................... 8-30
Examples of Typical ROMBAR Settings ................................................................... 8-32
Valid and Modified Bit Settings ................................................................................. 8-34
CACR Field Descriptions ........................................................................................... 8-46
ACRn Field Descriptions............................................................................................ 8-49
Instruction Cache Line State Transitions.................................................................... 8-53
Data Cache Line State Transitions.............................................................................. 8-54
Data Cache Line State Transitions (Previous State Invalid)....................................... 8-56
Data Cache Line State Transitions (Previous State Valid) ......................................... 8-56
Data Cache Line State Transitions (Previous State Modified) ................................... 8-57
CF4e Pin Characteristics............................................................................................... 9-2
M-Bus Signals............................................................................................................... 9-6
Processor Operand Representation ............................................................................. 9-12
mrdata Requirements for Read Transfers ................................................................... 9-13
mwdata Bus Requirements for Write Transfers.......................................................... 9-13
Allowable Line Access Patterns ................................................................................. 9-14
New ACR and CACR Bits.......................................................................................... 10-6
Fault Status Encodings................................................................................................ 10-8
MMU Base Address Register Field Descriptions..................................................... 10-11
MMU Memory Map ................................................................................................. 10-12
MMUCR Field Descriptions..................................................................................... 10-13
MMUOR Field Descriptions..................................................................................... 10-13
MMUSR Field Descriptions ..................................................................................... 10-15
MMUAR Field Descriptions..................................................................................... 10-15
MMUTR Field Descriptions ..................................................................................... 10-16
MMUDR Field Descriptions..................................................................................... 10-17
Version 4 K-Bus Memory Pipelines ......................................................................... 10-18
K-Bus Pipeline Cycles .............................................................................................. 10-18
PLRU State Bits........................................................................................................ 10-20
Debug Module Signals................................................................................................ 11-3
PSTDDATA: Sequential Execution of Single-Cycle Instructions ............................. 11-4
PSTDDATA: Data Operand Captured........................................................................ 11-5
Processor Status Encoding.......................................................................................... 11-7
0xE Status Posting ...................................................................................................... 11-9
BDM/Breakpoint Registers....................................................................................... 11-11
Rev. A Shared BDM/Breakpoint Hardware ............................................................. 11-13
Tables
For More Information On This Product,
Go to: www.freescale.com
xxi
Freescale Semiconductor, Inc.
TABLES
Freescale Semiconductor, Inc...
Table
Number
11-8
11-9
11-10
11-11
11-12
11-13
11-14
11-15
11-16
11-17
11-18
11-19
11-20
11-21
11-22
11-23
11-24
11-25
11-26
11-27
11-28
11-29
11-30
11-31
11-32
11-33
12-1
12-2
12-3
12-4
12-5
12-6
A-1
xxii
Title
Page
Number
AATR and AATR1 Field Descriptions..................................................................... 11-14
ABLR and ABLR1 Field Description....................................................................... 11-16
ABHR and ABHR1 Field Description...................................................................... 11-16
BAAR Field Descriptions ......................................................................................... 11-16
CSR Field Descriptions ............................................................................................ 11-17
DBRn Field Descriptions.......................................................................................... 11-20
DBMRn Field Descriptions ...................................................................................... 11-20
Access Size and Operand Data Location .................................................................. 11-20
PBR, PBR1, PBR2, PBR3 Field Descriptions.......................................................... 11-21
PBMR Field Descriptions ......................................................................................... 11-21
TDR Field Descriptions ............................................................................................ 11-22
XTDR Field Descriptions ......................................................................................... 11-24
PBAC Field Descriptions.......................................................................................... 11-26
PBASID Field Descriptions...................................................................................... 11-27
Receive BDM Packet Field Description ................................................................... 11-31
Transmit BDM Packet Field Description ................................................................. 11-31
BDM Command Summary ....................................................................................... 11-32
BDM Field Descriptions ........................................................................................... 11-33
Definition of DRc Encoding—Read......................................................................... 11-53
PSTDDATA Nibble/CSR[BSTAT] Breakpoint Response....................................... 11-55
Exception Vector Assignments................................................................................. 11-56
PSTDDATA Specification for User-Mode Instructions........................................... 11-59
PSTDDATA Values for User-Mode Multiply-Accumulate Instructions ................. 11-62
PSTDDATA Values for User-Mode Floating-Point Instructions............................. 11-63
Data Markers and FPU Operand Format Specifiers ................................................. 11-64
PSTDDATA Specification for Supervisor-Mode Instructions ................................. 11-64
CF4e Core Scan Chains .............................................................................................. 12-2
CF4e Wrapper Scan Chains ........................................................................................ 12-2
BIST Core Pins ......................................................................................................... 12-19
BIST Cycles .............................................................................................................. 12-27
EBIST Tag Output Data............................................................................................ 12-29
CF4e Motorola Test Mode Encodings...................................................................... 12-32
Timing Budget Variables for Various Process and Frequency Targets...................... 13-1
ColdFire CF4e Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
Freescale Semiconductor, Inc...
About This Book
The primary objective of this user’s manual is to define the functionality of CF4e ColdFire
microprocessor core. The CF4e implementation of the Version 4 (V4) core includes the
floating-point unit (FPU), enhanced multiply-accumulate unit (EMAC), and memory
management unit (MMU) that are defined as optional in the V4 architecture.
The information in this book is subject to change without notice, as described in the
disclaimers on the title page of this book. As with any technical documentation, it is the
readers’ responsibility to be sure they are using the most recent version of the
documentation.
To locate any published errata or updates for this document, refer to the world-wide web at
http://www.motorola.com/coldfire.
Audience
This manual is intended for developers who want to use the CF4e processor core in their
products. It is assumed that the reader understands operating systems, microprocessor
system design, general principles of software and hardware, and basic details of the
ColdFire architecture.
Organization
Following is a summary and a brief description of the major sections of this manual:
•
Chapter 1, “Introduction,” includes general descriptions of the CF4e
implementation of the V4 architecture, focussing in particular on new features.
•
Chapter 2, “Registers,” describes the organization of CFe general-purpose and
control registers in the user and supervisor programming models.
•
Chapter 3, “Instructions,” provides a pointer to the instruction descriptions in the
ColdFire Family Programmer’s Reference Manual (PRM).
•
Chapter 4, “Floating-Point Unit (FPU),” describes instructions implemented in the
floating-point unit (FPU) designed for use with the ColdFire family of
microprocessors. The FPU conforms to the American National Standards Institute
(ANSI)/Institute of Electrical and Electronics Engineers (IEEE) Standard for Binary
Floating-Point Arithmetic (ANSI/IEEE Standard 754).
About This Book
For More Information On This Product,
Go to: www.freescale.com
xxiii
Freescale Semiconductor, Inc.
Freescale Semiconductor, Inc...
Organization
•
Chapter 5, “Enhanced Multiply-Accumulate Unit (EMAC),” describes the
functionality, microarchitecture, and performance of the enhanced
multiply-accumulate (EMAC) unit in the ColdFire family of processors.
•
Chapter 6, “Instruction Pipeline and Timing,” describes performance features of the
CF4e ColdFire processor pipeline structure. It is intended as a guide for developing
compilers or optimizing assembly language application code. It describes the basic
CF4e pipeline strategy, contrasting it with Version 2 and 3 designs. It also provides
performance-related details of the instruction fetch and operand execution pipelines
(IFP and OEP).
•
Chapter 7, “Exception Processing,” describes CFe exception processing, focusing
on differences from previous ColdFire versions. In particular, additional encodings
have been added to the fault status (FS) field in the exception stack frame to indicate
exceptions related to translation lookaside buffers (TLBs). This provides CF4e core
designs with precise, recoverable faults for all K-Bus references to support
demand-paged memory accesses.
•
Chapter 8, “Local Memory,” describes the implementation of the V4 local memory
specification, which implements a Harvard memory architecture, including separate
caches, ROM, RAM, and the necessary buses and registers to support instruction
and data memory. This chapter consists of the following major sections:
— Section 8.5, “SRAM Overview,” describes the on-chip static RAM (SRAM)
implementation. It covers general operations, configuration, and initialization. It
also provides information and examples showing how to minimize power
consumption when using the SRAM.
— Section 8.6, “ROM Overview,” describes the on-chip ROM implementation. It
covers general operations, configuration, and initialization. It also provides
information and examples showing how to minimize power consumption when
using the ROM.
— Section 8.7, “Cache Overview,” describes the cache implementation, including
organization, configuration, and coherency. It describes cache operations and
how the caches interface with other memory structures.
•
Chapter 9, “Core Interface,” describes the CF4e core interface and provides an
overview of the functional operation of the master bus (M-Bus).
•
Chapter 10, “Memory Management Unit (MMU),” describes the ColdFire virtual
memory management unit (MMU), which provides virtual-to-physical address
translation and memory access control.
•
Chapter 11, “Debug Support,” describes the Revision D enhanced hardware debug
support in the ColdFire Version 4. This revision of the ColdFire debug architecture
encompasses earlier revisions. An expanded set of debug functionality is defined as
Revision B (or Rev. B). The further enhanced debug architecture implemented in the
Version 4 ColdFire is known as Revision C (or Rev. C).
xxiv
ColdFire CF4e Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
Suggested Reading
•
Chapter 12, “Test,” provides an overview of test features of CF4e. Some of the
features, such as MBist hardware, are included in the CF4e design. The scan and
wrapper methodology, described later in the chapter, are part of the CF4e design but
are described here as a reference for properly designing CF4e for test.
This manual also includes an index.
Suggested Reading
Freescale Semiconductor, Inc...
This section lists additional reading that provides background for the information in this
manual as well as general information about the ColdFire architecture.
General Information
The following documentation provides useful information about the ColdFire architecture
and computer architecture in general:
ColdFire Documentation
The ColdFire documentation is available from the sources listed on the back cover of this
manual. Document order numbers are included in parentheses for ease in ordering.
•
ColdFire Microprocessor Family Programmer’s Reference Manual, or PRM
(COLDFIREPRM/AD, Rev 2)
•
Using Microprocessors and Microcomputers: The Motorola Family, William C.
Wray, Ross Bannatyne, Joseph D. Greenfield
Additional literature on ColdFire implementations is being released as new processors
become available. For a current list of ColdFire documentation, refer to the World Wide
Web at http://www.motorola.com/ColdFire.
Conventions
This document uses the following notational conventions:
MNEMONICS
In text, instruction mnemonics are shown in uppercase.
mnemonics
In code and tables, instruction mnemonics are shown in lowercase.
COMMANDS
Command names are shown in small caps.
italics
0x0
Italics indicate variable command parameters.
Book titles in text are set in italics.
Prefix to denote hexadecimal number
0b0
Prefix to denote binary number
REG[FIELD]
Abbreviations for registers are shown in uppercase. Specific bits,
fields, or ranges appear in brackets. For example, RAMBAR[BA]
identifies the base address field in the RAM base address register.
About This Book
For More Information On This Product,
Go to: www.freescale.com
xxv
Freescale Semiconductor, Inc.
Freescale Semiconductor, Inc...
Acronyms and Abbreviations
nibble
A 4-bit data unit
byte
An 8-bit data unit
word
A 16-bit data unit
longword
A 32-bit data unit
x
In some contexts, such as signal encodings, x indicates a don’t care.
n
Used to express an undefined numerical value
¬
NOT logical operator
&
AND logical operator
|
OR logical operator
Acronyms and Abbreviations
Table i lists acronyms and abbreviations used in this document.
Table i. Acronyms and Abbreviated Terms
Term
ALU
Arithmetic logic unit
BDM
Background debug mode
BIST
Built-in self test
BSDL
Boundary-scan description language
CODEC
Code/decode
DMA
Direct memory access
DSP
Digital signal processing
EA
Effective address
EMAC
Enhanced multiply-accumulate unit
FIFO
First-in, first-out
IEEE
Institute for Electrical and Electronics Engineers
IFP
Instruction fetch pipeline
IPL
Interrupt priority level
JTAG
Joint Test Action Group
LIFO
Last-in, first-out
LRU
Least recently used
LSB
Least-significant byte
lsb
MAC
xxvi
Meaning
Least-significant bit
Multiple accumulate unit
ColdFire CF4e Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
Acronyms and Abbreviations
Table i. Acronyms and Abbreviated Terms (Continued)
Term
Freescale Semiconductor, Inc...
MBAR
Meaning
Memory base address register
MSB
Most-significant byte
msb
Most-significant bit
Mux
Multiplex
NOP
No operation
OEP
Operand execution pipeline
PC
Program counter
PCLK
Processor clock
PLL
Phase-locked loop
PLRU
Pseudo least recently used
POR
Power-on reset
RISC
Reduced instruction set computing
Rx
Receive
SIM
System integration module
SOF
Start of frame
TAP
Test access port
Tx
Transmit
About This Book
For More Information On This Product,
Go to: www.freescale.com
xxvii
Freescale Semiconductor, Inc.
Freescale Semiconductor, Inc...
Acronyms and Abbreviations
xxviii
ColdFire CF4e Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
Freescale Semiconductor, Inc...
Chapter 1
Introduction
This section is an overview of the CF4e ColdFire microprocessor core. The CF4e
implementation of the Version 4 (V4) core includes the floating-point unit (FPU), enhanced
multiply-accumulate unit (EMAC), and memory management unit (MMU) that are defined
as optional in the V4 architecture.
1.1 Core Overview
The V4 core includes a Harvard memory architecture, branch cache acceleration logic, and
limited superscalar dual-instruction issue capabilities. The V4 core provides 1.54
Dhrystone 2.1 MIPS per MHz.
1.2 Features
The CF4e includes the following features defined as optional in the V4 core architecture:
•
•
•
Floating-point unit (FPU)
Virtual memory management unit (MMU)
Enhanced multiply-accumulate unit (EMAC) for increased signal processing
functionality plus backward code compatibility with the MAC unit of previous
ColdFire processors
V4 architecture features are defined as follows:
•
•
•
•
•
•
•
Variable-length RISC, clock-multiplied core
Revision B of the ColdFire instruction set architecture (ISA_B) providing new
instructions to improve performance and code density
Two independent, decoupled pipelines—four-stage instruction fetch pipeline (IFP)
and five-stage operand execution pipeline (OEP) for increased performance.
Ten-instruction, FIFO buffer decouples the IFP and OEP
Limited superscalar design approaches dual-issue performance with the cost of a
scalar execution pipeline
Two-level branch acceleration mechanism with a branch cache plus a prediction
table for increased performance of conditional Bcc instructions
32-bit address bus supporting 4 Gbytes of linear address space
Chapter 1. Introduction
For More Information On This Product,
Go to: www.freescale.com
1-1
Freescale Semiconductor, Inc.
CF4e Implementation Block Diagram
•
•
•
•
•
•
32-bit data bus
16 user-accessible, 32-bit-wide, general-purpose registers
Supervisor/user modes for system protection
Two separate stack pointer (A7) registers—the supervisor stack pointer (SSP) and
the user stack pointer (USP)—provide the required isolation between operating
modes to support the MMU.
Vector base register to relocate the exception-vector table
Optimized for high-level language constructs
Freescale Semiconductor, Inc...
1.3 CF4e Implementation Block Diagram
Figure 1-1 shows the key elements of the CF4e core implementation. Individual pipeline
stages are defined and described in detail in Chapter 6, “Instruction Pipeline and Timing.”
1-2
ColdFire CF4e Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
Architectural Summary
IFP
J
IAG
Branch
Cache
KC1
IC1
KC2
IC2
Branch
Accel.
Instruction
Memory
Physical
KC1
IED
Freescale Semiconductor, Inc...
IB
Memory
Management
Unit
(MMU)
OEP
DS
Physical
KC1
DS
J
OAG
Data
Memory
KC1
OC1
KC2
OC2
EX
M-Bus
K2M
Mis
EMAC
FPU
DA
BDM
DSCLK DSI
DSDO
DDATA
PSTDDATA PSTCLK
Figure 1-1. CF4e Core Block Diagram
1.4 Architectural Summary
Figure 1-2 shows the standard CF4e microprocessor configuration. The hierarchical bus
structure provides varying layers of data bandwidth and supports an efficient partitioning
of optional on-chip modules. The bus hierarchy is as follows:
1. Processor-local bus (K-Bus)
2. Master bus (M-Bus)
Chapter 1. Introduction
For More Information On This Product,
Go to: www.freescale.com
1-3
Freescale Semiconductor, Inc.
Architectural Summary
3. Slave bus (S-bus)
4. External bus (E-bus)
The CF4e reference design is defined by the V4 core hierarchy. The CoreKmem boundary
includes the core design and processor-local memories required for a given design.
External Bus
Slave Bus
System Bus
Controller
Freescale Semiconductor, Inc...
Master Bus
Slave Module
Slave Module
Debug
V4
CPU
EMAC FPU
(optional in V4)
Master
Module
K2M
DIV
Processor Bus
KRAM
Control
KROM
Control
KRAM
Memory
Array
KROM
Memory
Array
Cache
Control
Cache
Tag
Array
Cache
Instruction/Data
Arrays
CoreKmem
Figure 1-2. V4 Core Block Diagram
The processor connects to a number of memory controllers and a bus controller through a
local, high-speed bus. Processor-local memories include caches, RAM, and ROM. V4
memory controllers support a range of sizes, allowing the ability to specify the optimum
memory organization for a given application. The K2M bus controller controls transfers on
the processor-local bus and initiates and controls all accesses onto the next-level system
bus, the master bus (M-Bus). The processor-local bus is designed to maximize bandwidth
from high-speed memories to support efficient instruction execution.
The M-Bus is the primary interface between the core and other system-level components.
Devices that can initiate bus cycles are typically connected to the M-Bus. Example modules
include direct-memory access devices (DMA) or another ColdFire processor complex. The
M-Bus is typically connected to a system interface module (SIM), which provides two
interfaces: one to a simple, on-chip slave bus (S-bus) and another to an application-specific
external bus (E-bus). The S-bus generally is connected to any number of standard
peripheral modules, such as timers, universal asynchronous receiver/transmitters (UARTs),
other serial communication devices, and parallel ports. Use of a standard Motorola-defined
bus protocol promotes reuse of these synthesizable modules. Specific implementation and
1-4
ColdFire CF4e Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
Programming Model
protocol details of the external bus can vary widely, depending on system requirements.
The V4 design allows a core to operate at any integer multiplier (n = 1, 2, 3,...) faster than
the rest of the design. For multiple clock domains, the boundary is the M-Bus; that is, the
processor complex operates at the higher frequency, while the M-Bus and the rest of the
microprocessor operate at the slower speed. This well-defined, easy-to-use clock boundary
simplifies interface design and timing and eases production test complications.
Freescale Semiconductor, Inc...
The overall ColdFire implementation strategy of 100% synthesizable designs and use of
compiled memory arrays coupled with the modular system architecture allows easy
migration to any process technology and provides cost-effective integration capabilities
while targeting a variety of operating voltages and frequencies.
1.5 Programming Model
Figure 1-3 shows the V4 programming model, which is organized as follows:
•
•
User mode. User-mode software is restricted to user-mode instructions and registers.
Supervisor mode. Supervisor-mode software can reference all user- and
supervisor-mode instructions and registers.
The status register supervisor bit (SR[S]) selects the mode. Note that this figure shows
optional V4 registers implemented in the CF4e.
Chapter 1. Introduction
For More Information On This Product,
Go to: www.freescale.com
1-5
Freescale Semiconductor, Inc.
Programming Model
31
0
Freescale Semiconductor, Inc...
63
A0
A1
A2
A3
A4
A5
A6
A7
PC
CCR
Address registers
User stack pointer
Program counter
Condition code register
0
31
FP0
FP1
FP2
FP3
FP4
FP5
FP6
FP7
FPCR
FPSR
FPIAR
Floating-point data registers
MACSR
ACC0
ACC1
ACC2
ACC3
ACCext01
ACCext23
MASK
MAC status register
MAC accumulator 0
MAC accumulator 1 (EMAC only)
MAC accumulator 2 (EMAC only)
MAC accumulator 3 (EMAC only)
ACC0 and ACC1 extensions
ACC2 and ACC3 extensions
MAC mask register
Floating-point control register
Floating-point status register
Floating-point instruction address register
0
15
31
Supervisor Registers
Data registers
0
User Registers
31
D0
D1
D2
D3
D4
D5
D6
D7
0
(CCR) SR
OTHER_A7
Must be zeros VBR
CACR
ASID
ACR0
ACR1
ACR2
ACR3
MMUBAR
ROMBAR0
ROMBAR1
RAMBAR0
RAMBAR1
MBAR
19
Status register
Supervisor A7 stack pointer
Vector base register
Cache control register
Address space ID register
Access control register 0 (data)
Access control register 1 (data)
Access control register 2 (instruction)
Access control register 3 (instruction)
MMU base address register
ROM base address register 0
ROM base address register 1
RAM base address register0
RAM base address register 1
Module base address register
Figure 1-3. ColdFire Programming Model
1-6
ColdFire CF4e Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
Address Map
1.6 Address Map
Table 1-1 shows the ColdFire CPU space assignment reserved for program-visible
registers. In general, these registers can be read and written through this space for debug
accesses, initiated by an external emulator through the serial BDM communication
channel. All control and configuration registers can be written to using the privileged move
control register (MOVEC) instruction.
Freescale Semiconductor, Inc...
NOTE:
A core may not implement all registers or register fields defined
by the architecture, and it may implement additional registers
or fields.
Table 1-1 lists register names, the CPU space assignment, whether the register is written
from the processor using the MOVEC instruction, and the complete register name.
Table 1-1. ColdFire CPU Space Assignments
Name
CPU Space
Assignment
Written with MOVEC
Register Name
Memory Management Control Registers
CACR
0x002
Yes
Cache control register
ASID
0x003
Yes
Address space identifier
ACR0–ACR3
0x004–0x007
Yes
Access control registers [0:3]
MMUBAR
0x008
Yes
MMU base address register
Processor General-Purpose Registers
D0–D7
0x(0,1)80–0x(0,1)87
No
Data registers 0–7 (0 = load, 1 = store)
A0–A7
0x(0,1)88–0x(0,1)8F
No
Address registers 0–7 (0 = load, 1 = store)
A7 is user stack pointer
Processor Miscellaneous Registers
OTHER_A7
0x800
No
Other stack pointer
VBR
0x801
No
Vector base register
MACSR
0x804
No
MAC status register
MASK
0x805
No
MAC address mask register
ACC
0x806
No
MAC accumulator
ACC0–ACC3
0x806–0x80B
No
MAC accumulators 0–3
ACCEXT01
0x807
No
MAC accumulator 0, 1 extension bytes
ACCEXT23
0x808
No
MAC accumulator 2, 3 extension bytes
SR
0x80E
No
Status register
PC
0x80F
No
Program counter
Processor Floating-Point Registers
FPU0
0x810
No
32 msbs of floating-point data register 0
Chapter 1. Introduction
For More Information On This Product,
Go to: www.freescale.com
1-7
Freescale Semiconductor, Inc.
Address Map
Freescale Semiconductor, Inc...
Table 1-1. ColdFire CPU Space Assignments
Name
CPU Space
Assignment
Written with MOVEC
FPL0
0x811
No
32 lsbs of floating-point data register 0
FPU1
0x812
No
32 msbs of floating-point data register 1
FPL1
0x813
No
32 lsbs of floating-point data register 1
FPU2
0x814
No
32 msb of floating-point data register 2
FPL2
0x815
No
32 lsbs of floating-point data register 2
FPU3
0x816
No
32 msbs of floating-point data register 3
FPL3
0x817
No
32 lsbs of floating-point data register 3
FPU4
0x818
No
32 msbs of floating-point data register 4
FPL4
0x819
No
32 lsbs of floating-point data register 4
FPU5
0x81A
No
32 msbs of floating-point data register 5
FPL5
0x81B
No
32 lsbs of floating-point data register 5
FPU6
0x81C
No
32 msbs of floating-point data register 6
FPL6
0x81D
No
32 lsbs of floating-point data register 6
FPU7
0x81E
No
32 msbs of floating-point data register 7
FPL7
0x81F
No
32 lsbs of floating-point data register 7
FPIAR
0x821
No
Floating-point instruction address register
FPSR
0x822
No
Floating-point status register
FPCR
0x824
No
Floating-point control register
Register Name
Local Memory and Module Control Registers
ROMBAR0
0xC00
Yes
ROM base address register 0
ROMBAR1
0xC01
Yes
ROM base address register 1
RAMBAR0
0xC04
Yes
RAM base address register 0
RAMBAR1
0xC05
Yes
RAM base address register 1
MPCR
0xC0C
Yes
Multiprocessor control register 1
EDRAMBAR
0xC0D
Yes
Embedded DRAM base address register 1
SECMBAR
0xC0E
Yes
Secondary module base address register 1
MBAR
0xC0F
Yes
Primary module base address register
Local Memory Address Permutation Control Registers 1
1-8
PCR1U0
0xD02
Yes
32 msbs of RAM 0 permutation control register 1
PCR1L0
0xD03
Yes
32 lsbs of RAM 0 permutation control register 1
PCR2U0
0xD04
Yes
32 msbs of RAM 0 permutation control register 2
PCR2L0
0xD05
Yes
32 lsbs of RAM 0 permutation control register 2
PCR3U0
0xD06
Yes
32 msbs of RAM 0 permutation control register 3
PCR3L0
0xD07
Yes
32 lsbs of RAM 0 permutation control register 3
PCR1U1
0xD0A
Yes
32 msbs of RAM 1 permutation control register 1
ColdFire CF4e Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
Data Format Summary
Table 1-1. ColdFire CPU Space Assignments
Freescale Semiconductor, Inc...
1
Name
CPU Space
Assignment
Written with MOVEC
PCR1L1
0xD0B
Yes
32 lsbs of RAM 1 permutation control register 1
PCR2U1
0xD0C
Yes
32 msbs of RAM 1 permutation control register 2
PCR2L1
0xD0D
Yes
32 lsbs of RAM 1 permutation control register 2
PCR3U1
0xD0E
Yes
32 msbs of RAM 1 permutation control register 3
PCR3L1
0xD0F
Yes
32 lsbs of RAM 1 permutation control register 3
Register Name
Field definitions for these optional registers are implementation-specific
1.7 Data Format Summary
Table 1-2 lists the operand data formats. Integer operands can reside in registers, memory,
or instructions. The operand size is either explicitly encoded in the instruction or implicitly
defined by the instruction operation.
Table 1-2. Integer Data Formats
Operand Data Format
Size
Bit
1 bit
Byte integer
8 bits
Word integer
16 bits
Longword integer
32 bits
1.7.1 Data Organization in Registers
The following sections describe data organization in data, address, and control registers.
Section 4.2.2, “Floating-Point Data Formats,” describes floating-point formatting.
1.7.1.1 Integer Data Format Organization in Registers
Figure 1-4 shows the integer format for data registers. Each integer data register is 32 bits
wide. Byte and word operands occupy the lower 8- and 16-bit portions of integer data
registers, respectively. Longword operands occupy the entire 32 bits of integer data
registers. A data register that is either a source or destination operand only uses or changes
the appropriate lower 8 or 16 bits in byte or word operations, respectively. The remaining
high-order portion does not change. Note that the least-significant bit is bit 0 for all data
types, whereas the msbs for longword integer is bit 31, the msb of a word integer is bit 15,
and the msb of a byte integer is bit 7.
Chapter 1. Introduction
For More Information On This Product,
Go to: www.freescale.com
1-9
Freescale Semiconductor, Inc.
Data Format Summary
31
30
1
0
msb
lsb
31
8
Not used
31
6
1
Not used
15
msb
14
1
Byte (8 bits)
0
Lower-order word
30
msb
0
msb Lower-order byte lsb
16
31
7
Bit (0 ≤ bit number ≤ 31)
lsb
1
Word (16 bits)
0
Longword
lsb
Longword (32 bits)
Freescale Semiconductor, Inc...
Figure 1-4. Organization of Integer Data Format in Data Registers
Instruction encodings disallow use of address registers for byte operands. When an address
register is a source operand, either the low-order word or the entire longword operand is
used, depending on the operation size. Word-length source operands are sign-extended to
32 bits and then used in the operation with an address register destination. When an address
register is a destination, the entire register is affected, regardless of the operation size.
Figure 1-5 shows integer formats for address registers.
31
16
Sign-Extended
15
0
16-Bit Address Operand
31
0
Full 32-Bit Address Operand
Figure 1-5. Organization of Integer Data Formats in Address Registers
The size of control registers varies according to function. Some have undefined bits
reserved for future definition by Motorola. Those bits read as zeros and must be written as
zeros for future compatibility. Operations to the SR and CCR are word-sized. The upper
CCR byte is read as all zeros and is ignored when written, regardless of privilege mode.
1.7.1.2 Integer Data Format Organization in Memory
ColdFire processors use big-endian addressing. Byte-addressable memory organization
allows lower addresses to correspond to higher-order bytes. The address N of a longword
data item corresponds to the address of the high-order word. The lower-order word is at
address N + 2. The address of a word data item corresponds to the address of the high-order
byte. The lower-order byte is at address N + 1. This organization is shown in Figure 1-6.
1-10
ColdFire CF4e Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
Data Format Summary
31
24 23
16 15
8
7
0
Longword 0x0000_0000
.
.
.
Word 0x0000_0000
Word 0x0000_0002
Byte 0x0000_0000
Byte 0x0000_0001
Byte 0x0000_0002
Byte 0x0000_0003
Longword 0x0000_0004
Word 0x0000_0004
Word 0x0000_0006
Byte 0x0000_0004
Byte 0x0000_0005
Byte 0x0000_0006
Byte 0x0000_0007
.
.
.
Freescale Semiconductor, Inc...
Word 0xFFFF_FFFC
Byte 0xFFFF_FFFC
Byte 0xFFFF_FFFD
.
.
.
Word 0xFFFF_FFFE
Byte 0xFFFF_FFFE
Byte 0xFFFF_FFFF
Figure 1-6. Memory Operand Addressing
1.7.2 EMAC Data Representation
The EMAC supports the following three modes, where each mode defines a unique operand
type.
•
•
•
Two’s complement signed integer: In this format, an N-bit operand value lies in the
range -2(N-1) < operand < 2(N-1) - 1. The binary point is right of the lsb.
Unsigned integer: In this format, an N-bit operand value lies in the range 0 < operand
< 2N - 1. The binary point is right of the lsb.
Two’s complement, signed fractional: In an N-bit number, the first bit is the sign bit.
The remaining bits signify the first N-1 bits after the binary point. Given an N-bit
number, aN-1aN-2aN-3... a2a1a0, its value is given by the equation in Figure 1-7.
N–2
value = – ( 1 ⋅ a N – 1 ) +
∑
2
(i + 1 – N)
⋅ ai
i=0
Figure 1-7. Two’s Complement, Signed Fractional Equation
This format can represent numbers in the range -1 < operand < 1 - 2(N-1).
For words and longwords, the largest negative number that can be represented is -1, whose
internal representation is 0x8000 and 0x8000_0000, respectively. The largest positive word
is 0x7FFF or (1 - 2-15); the most positive longword is 0x7FFF_FFFF or (1 - 2-31).
For more information, see Chapter 5, “Enhanced Multiply-Accumulate Unit (EMAC).”
1.7.2.1 Floating-Point Data Formats and Types
The FPU supports signed byte, word, and longword integer formats, which are identical to
those supported by the integer unit. The FPU also supports single- and double-precision
binary floating-point formats that fully comply with the IEEE-754 standard.
Chapter 1. Introduction
For More Information On This Product,
Go to: www.freescale.com
1-11
Freescale Semiconductor, Inc.
Addressing Modes
1.7.2.1.1 Signed-Integer Data Formats
The FPU supports 8-bit byte (B), 16-bit word (W), and 32-bit longword (L) integer data
formats.
1.7.2.1.2 Floating-Point Data Formats
Figure 1-8 shows the two binary floating-point data formats.
31
S
Freescale Semiconductor, Inc...
63
S
62
30
22
8-Bit Exponent
0
23-Bit Fraction
Sign of Mantissa
51
11-Bit Exponent
52-Bit Fraction
Single
0
Double
Sign of Mantissa
Figure 1-8. Floating-Point Data Formats
Note that, throughout this chapter, a mantissa is defined as the concatenation of an integer
bit, the binary point, and a fraction. A fraction is the term designating the bits to the right
of the binary point in the mantissa.
Mantissa
(integer bit).(fraction)
Figure 1-9. Mantissa
The integer bit is implied to be set for normalized numbers and infinities, clear for zeros
and denormalized numbers. For not-a-numbers (NANs), the integer bit is ignored. The
exponent in both floating-point formats is an unsigned binary integer with an implied bias
added to it. Subtracting the bias from exponent yields a signed, two’s complement power
of two. This represents the magnitude of a normalized floating-point number when
multiplied by the mantissa.
By definition, a normalized mantissa always takes values starting from 1.0 and going up to,
but not including, 2.0; that is, [1.0...2.0).
1.8 Addressing Modes
Addressing modes are categorized by how they are used. Data addressing modes refer to
data operands. Memory addressing modes refer to memory operands. Alterable addressing
modes refer to alterable (writable) data operands. Control addressing modes refer to
memory operands without an associated size.
These categories sometimes combine to form more restrictive categories. Two combined
classifications are alterable memory (both alterable and memory) and data alterable (both
1-12
ColdFire CF4e Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
Instruction Set Overview
alterable and data). ColdFire microprocessors support 12 of the most commonly used
M68000 Family effective addressing modes. Table 1-3 summarizes these modes.
Table 1-3. ColdFire Effective Addressing Modes
Addressing Modes
Absolute data addressing
Short
Long
Address register indirect with scaled index
8-bit displacement
Freescale Semiconductor, Inc...
Immediate
Program counter indirect
with displacement
Program counter indirect with scaled index
8-bit displacement
Register direct
Data
Address
Register indirect
Address
Address with Postincrement
Address with Predecrement
Address with Displacement
Category
Mode
Field
Register
Field
(xxx).W
(xxx).L
111
111
000
001
X
X
X
X
X
X
—
—
(d8, An, Xi*SF)
110
register no.
X
X
X
X
#<xxx>
111
100
X
X
—
—
(d16, PC)
111
010
X
X
X
—
(d8, PC, Xi*SF)
111
011
X
X
X
—
Dn
An
000
001
register no.
register no.
X
—
—
—
—
—
X
X
(An)
(An)+
–(An)
(d16, An)
010
011
100
101
register no.
register no.
register no.
register no.
X
X
X
X
X
X
X
X
X
—
—
X
X
X
X
X
Syntax
Data Memory Control Alterable
1.9 Instruction Set Overview
The original ColdFire ISA was derived from M68000 Family opcodes based on extensive
analysis of embedded application code. After the first ColdFire compilers were created,
developers identified ISA additions that would enhance both code density and overall
performance. Additionally, as users implemented ColdFire-based designs into a wide range
of embedded systems, they identified frequently used instruction sequences that could be
improved by creating new instructions. This observation was especially prevalent in
environments that used substantial amounts of assembly language code.
The original ISA minimized support for instructions referencing byte and word operands.
MOVE.B and MOVE.W were fully supported; otherwise, only CLR (clear) and TST (test)
supported these data types. Based on input from compiler writers and system users, a set of
instruction enhancements was proposed to address the following:
•
•
Enhanced support for byte and word-sized operands through new move operations
Enhanced support for position-independent code
For descriptions of the ColdFire instruction set, see the latest version of the ColdFire
Programmer’s Reference Manual.
The following list summarizes new and enhanced instructions of ISA_B:
Chapter 1. Introduction
For More Information On This Product,
Go to: www.freescale.com
1-13
Freescale Semiconductor, Inc.
Instruction Set Overview
Freescale Semiconductor, Inc...
•
•
•
•
New instructions:
— INTOUCH loads blocks of instructions to be locked in the instruction cache.
— MOV3Q.L moves 3-bit immediate data to the destination location.
— MOVE to/from USP loads and stores user stack pointer.
— MVS.{B,W} sign-extends the source operand and moves it to the destination
register.
— MVZ.{B,W} zero-fills the source operand and moves it to the destination
register.
— SATS.L performs a saturation operation for signed arithmetic and updates the
destination register depending on CCR[V] and bit 31 of the register.
— TAS.B performs an indivisible read-modify-write cycle to test and set the
addressed memory byte.
Enhancements to existing Revision_A instructions:
— Longword support for branch instructions (Bcc, BRA, BSR)
— Byte and word support for compare instructions (CMP, CMPI)
— Word support for the compare address register instruction (CMPA)
— Byte and longword support for MOVE.x,where the source is immediate data and
the destination is specified by d16(Ax); that is, MOVE.{B,W} #<data>, d16(Ax)
Floating-point instructions. See Chapter 4, “Floating-Point Unit (FPU).”
EMAC instructions. See Chapter 5, “Enhanced Multiply-Accumulate Unit
(EMAC).”
Table 1-4 shows the syntax for the new and enhanced instructions. As Table 1-4 shows,
some ISA_B opcodes were defined in the M68K family and others are new.
Table 1-4. V4 New Instruction Summary
Instruction
Mnemonic1
Source
Destination
68K
ISA_B Extensions
Branch Always
bra.l
<label>
Yes
Branch Conditionally
bcc.l
<label>
Yes
Branch to Subroutine
bsr.l
<label>
Yes
Compare
cmp.{b,w,l}
<ea>y
Dx
Yes
cmpa.w
<ea>y
Ax
Yes
cmpi.{b,w}
#<data>
Dx
Yes
Instruction Fetch Touch
intouch
<Ay>
Move 3-Bit Data Quick
mov3q.l
#<data>
<ea>x
move.{b,w}
#<data>
d16(Ax)
Yes
Move from USP
move.l
USP
Ax
Yes
Move to USP
move.l
Ay
USP
Yes
Compare Address
Compare Immediate
Move Data Source to Destination
1-14
ColdFire CF4e Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
Instruction Set Overview
Table 1-4. V4 New Instruction Summary (Continued)
Mnemonic1
Source
Destination
Move with Sign Extend
mvs.{b,w}
<ea>y
Dx
Move with Zero-Fill
mvz.{b,w}
<ea>y
Dx
Instruction
68K
Signed Saturate
sats.l
Dx
Test and Set an Operand
tas.b
<ea>x
Yes
Freescale Semiconductor, Inc...
EMAC Extensions
Move from an Accumulator and Clear
movclr.l
ACCx
Rx
No
Copy an Accumulator
move.l
ACCy
ACCx
No
Move from Accumulator 0 and 1 Extensions
move.l
ACCext01
Rx
No
Move from Accumulator 2 and 3 Extensions
move.l
ACCext23
Rx
No
Move to Accumulator 0 and 1 Extensions
move.l
Ry
ACCext01
No
Move to Accumulator 2 and 2 Extensions
move.l
Ry
ACCext23
No
FPU Instructions
Floating-Point Absolute Value
fabs.{b,w,l,s,d}
<ea>y
FPx
Yes
Floating-Point Add
fadd.{b,w,l,s,d}
<ea>y
FPx
Yes
<label>
Yes
Floating-Point Branch Conditionally
Floating-Point Compare
fbcc.{w,l}
fcmp.{b,w,l,s,d}
<ea>y
FPx
Yes
Floating-Point Divide
fdiv.{b,w,l,s,d}
<ea>y
FPx
Yes
Floating-Point Integer
fint.{b,w,l,s,d}
<ea>y
FPx
Yes
Floating-Point Integer Round-to-Zero
fintrz.{b,w,l,s,d}
<ea>y
FPx
Yes
Move Floating-Point Data Register
fmove.{b,w,l,s,d}
<ea>y
FPx
Yes
Move from FPCR
fmove.l
FPCR
<ea>x
Yes
Move from FPIAR
fmove.l
FPIAR
<ea>x
Yes
Move from FPSR
fmove.l
FPSR
<ea>x
Yes
Move from FPCR
fmove.l
<ea>y
FPCR
Yes
Move from FPIAR
fmove.l
<ea>y
FPIAR
Yes
Move from FPSR
fmove.l
<ea>y
FPSR
Yes
fmovem.d
#list
<ea>y
<ea>x
#list
Yes
Floating-Point Multiply
fmul.{b,w,l,s,d}
<ea>y
FPx
Yes
Floating-Point Negate
fneg.{b,w,l,s,d}
<ea>y
FPx
Yes
Move Multiple Floating Point Data Registers
Floating-Point No Operation
fnop
Restore Internal Floating Point State
Save Internal Floating Point State
frestore
Yes
<ea>y
fsave
Yes
<ea>x
Yes
Floating-Point Square Root
fsqrt.{b,w,l,s,d}
<ea>y
FPx
Yes
Floating-Point Subtract
fsub.{b,w,l,s,d}
<ea>y
FPx
Yes
Test Floating-Point Operand
ftst.{b,w,l,s,d}
<ea>y
Chapter 1. Introduction
For More Information On This Product,
Go to: www.freescale.com
Yes
1-15
Freescale Semiconductor, Inc.
Instruction Set Overview
1
Operand sizes in this column reflect only newly supported operand sizes for existing instructions (Bcc,
BRA, BSR, CMP, CMPA, CMPI, and MOVE)
1.9.1 Instruction Set Summary
Table 1-5 lists user-mode instructions by opcode.
Freescale Semiconductor, Inc...
Table 1-5. User-Mode Instruction Set Summary
Instruction
Operand Syntax
Operand Size
ADD
L
L
L
Source + Destination → Destination
ADDA
Dy,<ea>x
<ea>y,Dx
<ea>y,Ax
ADDI
ADDQ
#<data>,Dx
#<data>,<ea>x
L
L
Immediate Data + Destination → Destination
ADDX
Dy,Dx
L
Source + Destination + CCR[X] → Destination
AND
<ea>y,Dx
Dy,<ea>x
L
L
Source & Destination → Destination
ANDI
#<data>, Dx
L
Immediate Data & Destination → Destination
ASL
Dy,Dx
#<data>,Dx
L
L
CCR[X,C] ← (Dx << Dy) ← 0
CCR[X,C] ← (Dx << #<data>) ← 0
ASR
Dy,Dx
#<data>,Dx
L
L
msb → (Dx >> Dy) → CCR[X,C]
msb → (Dx >> #<data>) → CCR[X,C
Bcc
<label>
B, W, L
If Condition True, Then PC + dn → PC
BCHG
Dy,<ea>x
#<data>,<ea>x
B, L
B, L
~ (<bit number> of Destination) → CCR[Z] →
<bit number> of Destination
BCLR
Dy,<ea>x
#<data>,<ea>x
B, L
B, L
~ (<bit number> of Destination) → CCR[Z];
0 →<bit number> of Destination
BRA
<label>
B, W, L
BSET
Dy,<ea>x
#<data>,<ea>x
B, L
B, L
BSR
<label>
B, W, L
BTST
Dy,<ea>x
#<data>,<ea>x
B, L
B, L
CLR
<ea>x
B, W, L
0 → Destination
CMP
CMPA
<ea>y,Dx
<ea>y,Ax
B, W, L
W, L
Destination – Source → CCR
CMPI
#<data>,Dx
B, W, L
Destination – Immediate Data → CCR
DIVS/DIVU
<ea>y,Dx
W, L
Destination / Source → Destination
(Signed or Unsigned)
EOR
Dy,<ea>x
L
Source ^ Destination → Destination
EORI
#<data>,Dx
L
Immediate Data ^ Destination → Destination
EXT
Dx
Dx
Dx
B→W
W→L
B→L
EXTB
1-16
Operation
PC + dn → PC
~ (<bit number> of Destination) → CCR[Z];
1 → <bit number> of Destination
SP – 4 → SP; nextPC → (SP); PC + dn → PC
~ (<bit number> of Destination) → CCR[Z]
Sign-Extended Destination → Destination
ColdFire CF4e Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
Instruction Set Overview
Freescale Semiconductor, Inc...
Table 1-5. User-Mode Instruction Set Summary (Continued)
Instruction
Operand Syntax
Operand Size
FABS
<ea>y,FPx
FPy,FPx
FPx
B,W,L,S,D
D
D
Absolute Value of Source → FPx
FADD
<ea>y,FPx
FPy,FPx
B,W,L,S,D
D
Source + FPx → FPx
FBcc
<label>
W, L
FCMP
<ea>y,FPx
FPy,FPx
B,W,L,S,D
D
FPx - Source
FDABS
<ea>y,FPx
FPy,FPx
FPx
B,W,L,S,D
D
D
Absolute Value of Source → FPx; round destination
to double
Absolute Value of FPx → FPx; round destination to
double
FDADD
<ea>y,FPx
FPy,FPx
B,W,L,S,D
D
Source + FPx → FPx; round destination to double
FDDIV
<ea>y,FPx
FPy,FPx
B,W,L,S,D
D
FPx / Source → FPx; round destination to double
FDIV
<ea>y,FPx
FPy,FPx
B,W,L,S,D
D
FPx / Source → FPx
FDMOVE
FPy,FPx
D
Source → Destination; round destination to double
FDMUL
<ea>y,FPx
FPy,FPx
B,W,L,S,D
D
Source * FPx → FPx; round destination to double
FDNEG
<ea>y,FPx
FPy,FPx
FPx
B,W,L,S,D
D
D
- (Source) → FPx; round destination to double
FDSQRT
<ea>y,FPx
FPy,FPx
FPx
B,W,L,S,D
D
D
Square Root of Source → FPx; round destination
to double
Square Root of FPx → FPx; round destination to
double
FDSUB
<ea>y,FPx
FPy,FPx
B,W,L,S,D
D
FPx - Source → FPx; round destination to double
FINT
<ea>y,FPx
FPy,FPx
FPx
B,W,L,S,D
D
D
Integer Part of Source → FPx
<ea>y,FPx
FPy,FPx
FPx
B,W,L,S,D
D
D
Integer Part of Source → FPx; round to zero
<ea>y,FPx
FPy,<ea>x
FPy,FPx
FPcr,<ea>x
<ea>y,FPcr
B,W,L,S,D
B,W,L,S,D
D
L
L
Source → Destination
FMOVEM
#list,<ea>x
<ea>y,#list
D
FMUL
<ea>y,FPx
FPy,FPx
B,W,L,S,D
D
FINTRZ
FMOVE
Operation
Absolute Value of FPx → FPx
If Condition True, Then PC + dn → PC
- (FPx) → FPx; round destination to double
Integer Part of FPx → FPx
Integer Part of FPx → FPx; round to zero
FPcr can be any floating point control register:
FPCR, FPIAR, FPSR
Listed registers → Destination
Source → Listed registers
Source * FPx → FPx
Chapter 1. Introduction
For More Information On This Product,
Go to: www.freescale.com
1-17
Freescale Semiconductor, Inc.
Instruction Set Overview
Freescale Semiconductor, Inc...
Table 1-5. User-Mode Instruction Set Summary (Continued)
Instruction
Operand Syntax
Operand Size
FNEG
<ea>y,FPx
FPy,FPx
FPx
B,W,L,S,D
D
D
FNOP
none
none
FSABS
<ea>y,FPx
FPy,FPx
FPx
B,W,L,S,D
D
D
Absolute Value of Source → FPx; round destination
to single
Absolute Value of FPx → FPx; round destination to
single
FSADD
<ea>y,FPx
FPy,FPx
B,W,L,S,D
Source + FPx → FPx; round destination to single
FSDIV
<ea>y,FPx
FPy,FPx
B,W,L,S,D
D
FPx / Source → FPx; round destination to single
FSMOVE
<ea>y,FPx
B,W,L,S,D
Source → Destination; round destination to single
FSMUL
<ea>y,FPx
FPy,FPx
B,W,L,S,D
D
Source * FPx → FPx; round destination to single
FSNEG
<ea>y,FPx
FPy,FPx
FPx
B,W,L,S,D
D
D
- (Source) → FPx; round destination to single
<ea>y,FPx
FPy,FPx
FPx
B,W,L,S,D
D
D
Square Root of Source → FPx
FSSQRT
<ea>y,FPx
FPy,FPx
FPx
B,W,L,S,D
D
D
Square Root of Source → FPx; round destination
to single
Square Root of FPx → FPx; round destination to
single
FSSUB
<ea>y,FPx
FPy,FPx
B,W,L,S,D
D
FPx - Source → FPx; round destination to single
FSUB
<ea>y,FPx
FPy,FPx
B,W,L,S,D
D
FPx - Source → FPx
FTST
<ea>y
B, W, L, S, D
ILLEGAL
none
none
SP – 4 → SP; PC → (SP) → PC; SP – 2 → SP;
SR → (SP); SP – 2 → SP; Vector Offset → (SP);
(VBR + 0x10) → PC
JMP
<ea>y
none
Source Address → PC
JSR
<ea>y
none
SP – 4 → SP; nextPC → (SP); Source → PC
LEA
<ea>y,Ax
L
<ea>y → Ax
LINK
Ay,#<displacement>
W
SP – 4 → SP; Ay → (SP); SP → Ay, SP + dn → SP
LSL
Dy,Dx
#<data>,Dx
L
L
CCR[X,C] ← (Dx << Dy) ← 0
CCR[X,C] ← (Dx << #<data>) ← 0
LSR
Dy,Dx
#<data>,Dx
L
L
0 → (Dx >> Dy) → CCR[X,C]
0 → (Dx >> #<data>) → CCR[X,C]
MAC
Ry,RxSF,ACCx
Ry,RxSF,<ea>y,Rw,A
CCx
W, L
W, L
FSQRT
1-18
Operation
- (Source) → FPx
- (FPx) → FPx
PC + 2 → PC (FPU Pipeline Synchronized)
- (FPx) → FPx; round destination to single
Square Root of FPx → FPx
Source Operand Tested → FPCC
ACCx + (Ry * Rx){<<|>>}SF → ACCx
ACCx + (Ry * Rx){<<|>>}SF → ACCx;
(<ea>y(&MASK)) → Rw
ColdFire CF4e Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
Instruction Set Overview
Table 1-5. User-Mode Instruction Set Summary (Continued)
Instruction
Operand Syntax
Operand Size
MOV3Q
#<data>,<ea>x
L
Immediate Data → Destination
MOVCLR
ACCy,Rx
L
Accumulator → Destination, 0 → Accumulator
MOVE
<ea>y,<ea>x
MACcr,Dx
<ea>y,MACcr
CCR,Dx
<ea>y,CCR
B,W,L
L
L
W
W
Source → Destination
where MACcr can be any MAC control register:
ACCx, ACCext01, ACCext23, MACSR, MASK
MOVEA
<ea>y,Ax
W,L → L
MOVEM
#list,<ea>x
<ea>y,#list
L
Listed Registers → Destination
Source → Listed Registers
MOVEQ
#<data>,Dx
B→L
Immediate Data → Destination
MSAC
Ry,RxSF,ACCx
Ry,RxSF,<ea>y,Rw,A
CCx
W, L
W, L
MULS/MULU
<ea>y,Dx
W*W→L
L*L→L
MVS
<ea>y,Dx
B,W
Source with sign extension → Destination
MVZ
<ea>y,Dx
B,W
Source with zero fill → Destination
NEG
Dx
L
0 – Destination → Destination
NEGX
Dx
L
0 – Destination – CCR[X] → Destination
NOP
none
none
NOT
Dx
L
~ Destination → Destination
OR
<ea>y,Dx
Dy,<ea>x
L
L
Source | Destination → Destination
ORI
#<data>,Dx
L
Immediate Data | Destination → Destination
PEA
<ea>y
L
SP – 4 → SP; <ea>y → (SP)
PULSE
none
none
REMS/REMU
<ea>y,Dw:Dx
L
RTS
none
none
SATS
Dx
L
If CCR[V] == 1;
then if Dx[31] == 0;
then Dx[31:0] = 0x80000000;
else Dx[31:0] = 0x7FFFFFFF;
else Dx[31:0] is unchanged
Scc
Dx
B
If Condition True, Then 1s → Destination;
Else 0s → Destination
SUB
<ea>y,Dx
Dy,<ea>x
<ea>y,Ax
L
L
L
Destination - Source → Destination
Freescale Semiconductor, Inc...
MOVE from
CCR
MOVE to
CCR
SUBA
Operation
Source → Destination
ACCx - (Ry * Rx){<<|>>}SF → ACCx
ACCx - (Ry * Rx){<<|>>}SF → ACCx;
(<ea>y(&MASK)) → Rw
Source * Destination → Destination
(Signed or Unsigned)
PC + 2 → PC (Integer Pipeline Synchronized)
Set PST = 0x4
Destination / Source → Remainder
(Signed or Unsigned)
(SP) → PC; SP + 4 → SP
Chapter 1. Introduction
For More Information On This Product,
Go to: www.freescale.com
1-19
Freescale Semiconductor, Inc.
Instruction Set Overview
Freescale Semiconductor, Inc...
Table 1-5. User-Mode Instruction Set Summary (Continued)
Instruction
Operand Syntax
Operand Size
Operation
SUBI
SUBQ
#<data>,Dx
#<data>,<ea>x
L
L
Destination – Immediate Data → Destination
SUBX
Dy,Dx
L
Destination – Source – CCR[X] → Destination
SWAP
Dx
W
MSW of Dx ↔ LSW of Dx
TAS
<ea>x
B
Destination Tested → CCR;
1 → bit 7 of Destination
TPF
none
#<data>
#<data>
none
W
L
PC + 2→ PC
PC + 4 → PC
PC + 6→ PC
TRAP
#<vector>
none
1 → S Bit of SR; SP – 4 → SP; nextPC → (SP);
SP – 2 → SP; SR → (SP)
SP – 2 → SP; Format/Offset → (SP)
(VBR + 0x80 +4*n) → PC, where n is the TRAP
number
TST
<ea>y
B, W, L
UNLK
Ax
none
WDDATA
<ea>y
B, W, L
Source Operand Tested → CCR
Ax → SP; (SP) → Ax; SP + 4 → SP
Source → DDATA port
Table 1-6 describes supervisor-mode instructions.
Table 1-6. Supervisor-Mode Instruction Set Summary
Instruction
Operand Syntax
Operand Size
Operation
CPUSHL
ic,(Ax)
dc,(Ax)
bc,(Ax)
none
If data is valid and modified, push cache line; invalidate line
if programmed in CACR (synchronizes pipeline)
FRESTORE
<ea>y
none
FPU State Frame → Internal FPU State
FSAVE
<ea>x
none
Internal FPU State → FPU State Frame
HALT
none
none
Halt processor core
INTOUCH
Ay
none
Instruction fetch touch at (Ay)
MOVE from SR
SR,Dx
W
SR → Destination
MOVE from USP
USP,Dx
L
USP → Destination
MOVE to SR
<ea>y,SR
W
Source → SR; Dy or #<data> source only
MOVE to USP
Ay,USP
L
Source → USP
MOVEC
Ry,Rc
L
Ry → Rc
RTE
none
none
2 (SP) → SR; 4 (SP) → PC; SP + 8 →SP
Adjust stack according to format
STOP
#<data>
none
Immediate Data → SR; STOP
WDEBUG
<ea>y
L
1-20
Addressed Debug WDMREG Command Executed
ColdFire CF4e Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
Chapter 2
Registers
Freescale Semiconductor, Inc...
This chapter describes the organization of CF4e general-purpose and control registers in the
user and supervisor programming models.
2.1 Overview
Figure 2-2 shows both the user and supervisor register sets, which are described in this
chapter.
Chapter 2. Registers
For More Information On This Product,
Go to: www.freescale.com
2-1
Freescale Semiconductor, Inc.
Overview
31
0
Data registers
A0
A1
A2
A3
A4
A5
A6
A7
PC
CCR
Address registers
User Registers
0
63
User stack pointer
Program counter
Condition code register
0
31
FP0
FP1
FP2
FP3
FP4
FP5
FP6
FP7
FPCR
FPSR
FPIAR
Floating-point data registers
MACSR
ACC0
ACC1
ACC2
ACC3
ACCext01
ACCext23
MASK
MAC status register
MAC accumulator 0
MAC accumulator 1 (EMAC only)
MAC accumulator 2 (EMAC only)
MAC accumulator 3 (EMAC only)
ACC0 and ACC1 extensions
ACC2 and ACC3 extensions
MAC mask register
Floating-point control register
Floating-point status register
Floating-point instruction address register
0
15
31
Supervisor Registers
Freescale Semiconductor, Inc...
31
D0
D1
D2
D3
D4
D5
D6
D7
0
(CCR) SR
OTHER_A7
Must be zeros VBR
CACR
ASID
ACR0
ACR1
ACR2
ACR3
MMUBAR
ROMBAR0
ROMBAR1
RAMBAR0
RAMBAR1
MBAR
19
Status register
Supervisor A7 stack pointer
Vector base register
Cache control register
Address space ID register
Access control register 0 (data)
Access control register 1 (data)
Access control register 2 (instruction)
Access control register 3 (instruction)
MMU base address register
ROM base address register 0
ROM base address register 1
RAM base address register0
RAM base address register 1
Module base address register (not a core register)
Figure 2-1. Programming Model
2-2
ColdFire CF4e Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
User Programming Model
2.2 User Programming Model
The user programming model, shown in Figure 2-2, consists of the following registers:
•
•
•
•
•
16 general-purpose 32-bit registers (D7–D0 and A7–A0); A7 is user stack pointer
32-bit program counter
8-bit condition code register
Registers to support the EMAC
Register to support the floating-point unit (FPU)
Freescale Semiconductor, Inc...
Section 1.7, “Data Format Summary,” describes formats for integer and floating-point data.
Chapter 2. Registers
For More Information On This Product,
Go to: www.freescale.com
2-3
Freescale Semiconductor, Inc.
User Programming Model
31
Freescale Semiconductor, Inc...
31
63
0
D0
D1
D2
D3
D4
D5
D6
D7
Data registers
A0
A1
A2
A3
A4
A5
A6
A7
PC
CCR
Address registers
0
User stack pointer
Program counter
Condition code register
0
31
FP0
FP1
FP2
FP3
FP4
FP5
FP6
FP7
FPCR
FPSR
FPIAR
Floating-point data registers
MACSR
ACC0
ACC1
ACC2
ACC3
ACCext01
ACCext23
MASK
MAC status register
MAC accumulator 0
MAC accumulator 1 (EMAC only)
MAC accumulator 2 (EMAC only)
MAC accumulator 3 (EMAC only)
ACC0 and ACC1 extensions
ACC2 and ACC3 extensions
MAC mask register
Floating-point control register
Floating-point status register
Floating-point instruction address register
0
Figure 2-2. User Programming Model
2.2.1 Data Registers (D7–D0)
D7–D0 are used as data registers for bit, byte (8-bit), word (16-bit), and longword (32-bit)
operations. They can also be used as index registers.
2.2.2 Address Registers (A6–A0)
A6–A0 can be used as software stack pointers, index registers, or base address registers and
can be used for word and longword operations.
2-4
ColdFire CF4e Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
User Programming Model
2.2.3 User Stack Pointer (A7)
The CF4e architecture supports two unique stack pointer (A7) registers—the supervisor
stack pointer (SSP) and the user stack pointer (USP). This support provides the required
isolation between operating modes as dictated by the virtual memory management scheme
provided by the memory management unit (MMU). The SSP is described in Section 2.3.3,
“Supervisor/User Stack Pointers (A7 and OTHER_A7).”
Freescale Semiconductor, Inc...
2.2.4 Program Counter (PC)
The PC holds the address of the executing instruction. For sequential instructions, the
processor automatically increments PC. When program flow changes, the PC is updated
with the target instruction. For some instructions, the PC specifies the base address for
PC-relative operand addressing modes. If two 16-bit instructions are dispatched together,
the PC is advanced by 4 bytes, so that it points to the next instruction after this pair.
2.2.5 Condition Code Register (CCR)
The CCR occupies SR[7–0], as shown in Figure 2-3. CCR[4–0] are indicator flags based
on results generated by arithmetic operations.
7
5
Field
—
Reset
000
R/W
R
4
3
2
1
0
X
N
Z
V
C
R/W
R/W
Undefined
R/W
R/W
R/W
Figure 2-3. Condition Code Register (CCR)
CCR fields are described in Table 2-1.
Table 2-1. CCR Field Descriptions
Bits
Name
Description
7–5
—
Reserved. These bits are read as 0; writes have no effect.
4
X
Extend condition code bit. Assigned the value of the carry bit for arithmetic operations; otherwise not
affected or set to a specified result. Also used as an input operand for multiple-precision arithmetic.
3
N
Negative condition code bit. Set if the msb of the result is set; otherwise cleared.
2
Z
Zero condition code bit. Set if the result equals zero; otherwise cleared.
1
V
Overflow condition code bit. Set if an arithmetic overflow occurs, implying that the result cannot be
represented in the operand size; otherwise cleared.
0
C
Carry condition code bit. Set if a carry-out of the data operand msb occurs for an addition or if a
borrow occurs in a subtraction; otherwise cleared.
Chapter 2. Registers
For More Information On This Product,
Go to: www.freescale.com
2-5
Freescale Semiconductor, Inc.
User Programming Model
2.2.6 EMAC Programming Model
The registers in the EMAC portion of the user programming model, described in
Section Chapter 5, “Enhanced Multiply-Accumulate Unit (EMAC),” include the following
registers:
The EMAC provides the following program-visible registers:
Freescale Semiconductor, Inc...
•
•
•
•
The EMAC programming model includes four 48-bit accumulator registers
partitioned as follows:
— Four 32-bit accumulators (ACC0–ACC3)
— Eight 8-bit accumulator extension bytes (two per accumulator). These are
grouped into two 32-bit values for load and store operations (ACCEXT01 and
ACCEXT23).
Accumulators and extension bytes can be loaded, copied, and stored, and results
from EMAC arithmetic operations generally affect the entire 48-bit destination.
Eight 8-bit accumulator extensions (two per accumulator), packaged as two 32-bit
values for load and store operations (ACCext01 and ACCext23)
One 16-bit mask register (MASK)
One 32-bit status register (MACSR) including four indicator bits signaling product
or accumulation overflow (one for each accumulator: PAV0–PAV3)
These registers are shown in Figure 2-4.
31
0
MACSR
ACC0
ACC1
ACC2
ACC3
ACCext01
ACCext23
MASK
MAC status register
MAC accumulator 0
MAC accumulator 1
MAC accumulator 2
MAC accumulator 3
Extensions for ACC0 and ACC1
Extensions for ACC2 and ACC3
MAC mask register
Figure 2-4. EMAC Register Set
2.2.7 Floating-Point Programming Model
Figure 2-5 shows the FPU programming model.
2-6
ColdFire CF4e Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
Supervisor Programming Model
63
0
FP0
FP1
FP2
FP3
FP4
FP5
FP6
FP7
FPCR
FPSR
FPIAR
Floating-point data registers
Floating-point control register
Floating-point status register
Floating-point instruction address register
Freescale Semiconductor, Inc...
Figure 2-5. Floating-Point Programmer’s Model
The programmer’s model for the FPU consists of the following:
•
Eight 64-bit floating-point data registers (FP0–FP7)
•
One 32-bit floating-point control register (FPCR)
•
One 32-bit floating-point status register (FPSR)
•
One 32-bit floating-point instruction address register (FPIAR)
These registers are described in Section 4.3, “FPU Programmer’s Model.”
2.3 Supervisor Programming Model
Typically, system programmers use the supervisor programming model to implement
operating system functions and provide memory and I/O control. The CF4e supervisor
programming model provides access to user registers and additional supervisor registers,
which include the upper byte of the status register (SR), the supervisor stack pointer (SSP),
the vector base register (VBR), and registers for configuring attributes of the address space
connected to the processor core. Most supervisor-level registers are accessed by using the
MOVEC instruction with the control register definitions in Table 2-4.
Figure 2-6 shows the supervisor programming model.
Chapter 2. Registers
For More Information On This Product,
Go to: www.freescale.com
2-7
Freescale Semiconductor, Inc.
Supervisor Programming Model
15
Freescale Semiconductor, Inc...
0
(CCR) SR
OTHER_A7
Must be zeros VBR
CACR
ASID
ACR0
ACR1
ACR2
ACR3
MMUBAR
ROMBAR0
ROMBAR1
RAMBAR0
RAMBAR1
MBAR
31
19
Status register
Supervisor A7 stack pointer
Vector base register
Cache control register
Address space ID register
Access control register 0 (data)
Access control register 1 (data)
Access control register 2 (instruction)
Access control register 3 (instruction)
MMU base address register
ROM base address register 0
ROM base address register 1
RAM base address register0
RAM base address register 1
Module base address register (not a core register)
Figure 2-6. Supervisor Programming Model
2.3.1 Status Register (SR)
The SR stores the processor status, the interrupt priority mask, and other control bits.
Supervisor software can read or write the entire SR; user software can read or write only
SR[7–0], described in Section 2.2.5, “Condition Code Register (CCR).” Bits in the system
byte indicate processor states—trace mode (T), supervisor or user mode (S), and master or
interrupt state (M). SR is set to 0x27xx after reset.
15
14
13
12
12
10
8
7
5
System byte
4
3
2
1
0
Condition code register (CCR)
Field
T
—
S
M
—
I
—
X
N
Z
V
C
Reset
0
0
1
0
0
111
000
—
—
—
—
—
R
R/W
R/W
R
R/W
R
R/W
R/W
R/W
R/W
R/W
R/W R/W
Figure 2-7. Status Register (SR)
Table 2-2 describes SR fields.
Table 2-2. Status Field Descriptions
Bits
Name
15
T
Trace enable. When T is set, the processor performs a trace exception after every instruction.
13
S
Supervisor/user state. Indicates whether the processor is in supervisor or user mode
0 User mode
1 Supervisor mode
12
M
Master/interrupt state. Cleared by an interrupt exception. It can be set by software during execution
of the RTE or move to SR instructions so the OS can emulate an interrupt stack pointer.
2-8
Description
ColdFire CF4e Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
Supervisor Programming Model
Table 2-2. Status Field Descriptions (Continued)
Bits
Name
10–8
I
7–0
CCR
Description
Interrupt priority mask. Defines the current interrupt priority. Interrupt requests are inhibited for all
priority levels less than or equal to the current priority, except the edge-sensitive level-7 request,
which cannot be masked.
Condition code register. See Table 2-3.
Freescale Semiconductor, Inc...
2.3.2 Vector Base Register (VBR)
The VBR holds the base address of the exception vector table in memory. The displacement
of an exception vector is added to the value in this register to access the vector table.
VBR[19–0] are not implemented and are assumed to be zero, forcing the vector table to be
aligned on a 0-modulo-1-Mbyte boundary.
31
Field
20 19
0
Exception vector table base address
Reset
—
All zeros
R/W Written from a BDM serial command or from the CPU using the MOVEC instruction. VBR can be read from
the debug module only. The upper 12 bits are returned, the low-order 20 bits are undefined.
Rc
0x801
Figure 2-8. Vector Base Register (VBR)
2.3.3 Supervisor/User Stack Pointers (A7 and OTHER_A7)
The CF4e architecture supports two independent stack pointer (A7) registers—the
supervisor stack pointer (SSP) and the user stack pointer (USP). This support provides the
required isolation between operating modes as dictated by the virtual memory management
scheme provided by the memory management unit (MMU).
The hardware implementation of these two programmable-visible 32-bit registers does not
uniquely identify one as the SSP and the other as the USP. Rather, the hardware uses one
32-bit register as the currently-active A7 and the other as OTHER_A7. Thus, the register
contents are a function of the processor operating mode, as shown in the following:
if SR[S] = 1
then
else
A7 = Supervisor Stack Pointer
other_A7 = User Stack Pointer
A7 = User Stack Pointer
other_A7 = Supervisor Stack Pointer
The BDM programming model supports reads and writes to A7 and OTHER_A7 directly.
It is the responsibility of the external development system to determine the mapping of (A7
and OTHER_A7) to the two program-visible definitions (SSP and USP), based on the
setting of SR[S]. This functionality is enabled by setting by the dual stack pointer enable
bit CACR[DSPE]. If this bit is cleared, only the stack pointer, A7, defined for previous
ColdFire versions is available. DSPE is zero at reset.
Chapter 2. Registers
For More Information On This Product,
Go to: www.freescale.com
2-9
Freescale Semiconductor, Inc.
Supervisor Programming Model
If DSPE is set, the appropriate stack pointer register (SSP or USP) is accessed as a function
of the processor’s operating mode. To support dual stack pointers, the following two
privileged MC680x0 instructions to load/store the USP are added to the ColdFire
instruction set architecture:
move.l Ay,USP # move to USP
move.l USP,Ax # move from USP
These instructions are described in the PRM.
Freescale Semiconductor, Inc...
2.3.4 Cache Control Register (CACR)
The CACR controls operation of the instruction, data, and branch cache memories. It
includes bits for enabling, freezing, and invalidating cache contents. It also includes bits for
defining the default cache mode and write-protect fields. The CACR is described in
Section 8.7.10.1, “Cache Control Register (CACR).”
2.3.5 Access Control Registers (ACR0–ACR3)
The access control registers, ACR0–ACR3 define attributes for four user-defined memory
regions. ACR0 and ACR1 control data memory space and ACR2 and ACR3 control
instruction memory space. Attributes include definition of cache mode, write protect and
buffer write enables. The ACRs are described in Section 8.7.10.2, “Access Control
Registers (ACR0–ACR3).”
2.3.6 RAM Base Address Registers (RAMBAR0/RAMBAR1)
RAMBAR registers are used to specify the base address of the internal RAM modules and
indicate the types of references mapped to each. Each RAMBAR includes a base address,
write-protect bit, address space mask bits, and an enable bit. RAM base address alignment
is implementation specific. See Section 8.5.2.1, “SRAM Base Address Registers
(RAMBAR0/RAMBAR1).”
2.3.7 ROM Base Address Registers (ROMBAR0/ROMBAR1)
ROMBAR registers determine the base address of the internal ROM modules and indicate
the types of references mapped to each. Each ROMBAR includes a base address,
write-protect bit, address space mask bits, and an enable bit. ROM base address alignment
is implementation specific. See Section 8.6.2.1, “ROM Base Address Registers
(ROMBAR0/ROMBAR1).”
2.3.8 Module Base Address Register (MBAR)
The supervisor-level MBAR, Figure 2-9, specifies the base address and allowable access
types for all internal peripherals. Note that the MBAR is not implemented in the core, but
2-10
ColdFire CF4e Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
Supervisor Programming Model
it is included here because it must be implemented by the system designer. It is written with
a MOVEC instruction using the CPU address 0xC0F. (See the ColdFire Family
Programmer’s Reference Manual.) MBAR can be read or written through the debug
module as a read/write register, as described in Chapter 11, “Debug Support.” Only the
debug module can read MBAR.
Freescale Semiconductor, Inc...
The valid bit, MBAR[V], is cleared at system reset to prevent incorrect references before
MBAR is written; other MBAR bits are uninitialized at reset. To access internal peripherals,
write MBAR with the appropriate base address (BA) and set MBAR[V] after system reset.
All internal peripheral registers occupy a single relocatable memory block along 4-Kbyte
boundaries. If MBAR[V] is set, MBAR[BA] is compared to the upper 20 bits of the full
32-bit internal address to determine if an internal peripheral is being accessed. MBAR
masks specific address spaces using the address space fields. Attempts to access a masked
address space generate an external bus access.
Addresses hitting overlapping memory spaces take the following priority:
1. MBAR
2. RAM, ROM, and caches
3. Chip select
NOTE:
The MBAR region must be mapped to non-cacheable space.
Attribute Mask Bits
31
Field
12 11
BA
Reset
9
—
8
7
6
5
4
3
2
0
WP — AM C/I SC SD UC UD V
Undefined
R/W
1
0
W (supervisor only); R/W through debug module (only the debug module can read MBAR)
Rc
0x0C0F
Figure 2-9. Module Base Address Register (MBAR)
Table 2-3 describes MBAR fields.
Table 2-3. MBAR Field Descriptions
Bits
Field
Description
31–12
BA
Base address. Defines the base address for a 4-Kbyte address range.
11–9
—
Reserved, should be cleared.
8
WP
7
—
Write protect. Mask bit for write cycles in the MBAR-mapped register address range.
0 Module address range is read/write.
1 Module address range is read only.
Reserved, should be cleared.
Chapter 2. Registers
For More Information On This Product,
Go to: www.freescale.com
2-11
Freescale Semiconductor, Inc.
Programming Model Table
Freescale Semiconductor, Inc...
Table 2-3. MBAR Field Descriptions (Continued)
Bits
Field
Description
6
AM
Alternate master mask. When AM = 0 and an alternate master (external master or DMA) accesses
MBAR-mapped registers, MBAR[SC,SD,UC,UD] are ignored in address decoding. These fields
mask address space, placing the MBAR-mapped register in a specific address space or spaces.
5
C/I
Mask CPU space and interrupt acknowledge cycles. Note that C/I must be set if BA = 0.
0 Activates the corresponding MBAR-mapped register
1 Regular external bus access
4
SC
Setting masks supervisor code space in MBAR address range
3
SD
Setting masks supervisor data space in MBAR address range
2
UC
Setting masks user code space in MBAR address range
1
UD
Setting masks user data space in MBAR address range
0
V
Valid. Determines whether MBAR settings are valid.
0 MBAR contents are invalid.
1 MBAR contents are valid.
The following example shows how to set the MBAR to location 0x1000_0000 using the D0
register. Setting MBAR[V] validates the MBAR location. This example assumes all
accesses are valid:
move.1 #0x10000001,D0
movec DO,MBAR
2.4 Programming Model Table
Table 2-4 lists register names, the CPU space location, whether the register is written from
the processor using the MOVEC instruction, and the complete register name.
Table 2-4. ColdFire CPU Registers
Name
CPU Space (Rc)
Written with MOVEC
Register Name
Memory Management Control Registers
CACR
0x002
Yes
Cache control register
ASID
0x003
Yes
Address space identifier
ACR0–ACR3
0x004–0x007
Yes
Access control registers 0–3
MMUBAR
0x008
Yes
MMU base address register
Processor General-Purpose Registers
2-12
D0–D7
0x(0,1)80–0x(0,1)87
No
Data registers 0–7 (0 = load, 1 = store)
A0–A7
0x(0,1)88–0x(0,1)8F
No
Address registers 0–7 (0 = load, 1 = store)
A7 is user stack pointer
ColdFire CF4e Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
Programming Model Table
Table 2-4. ColdFire CPU Registers (Continued)
Name
CPU Space (Rc)
Written with MOVEC
Register Name
Freescale Semiconductor, Inc...
Processor Miscellaneous Registers
OTHER_A7
0x800
No
Other stack pointer
VBR
0x801
Yes
Vector base register
MACSR
0x804
No
MAC status register
MASK
0x805
No
MAC address mask register
ACC0–ACC3
0x806–0x80B
No
MAC accumulators 0–3
ACCext01
0x807
No
MAC accumulator 0, 1 extension bytes
ACCext23
0x808
No
MAC accumulator 2, 3 extension bytes
SR
0x80E
No
Status register
PC
0x80F
Yes
Program counter
Processor Floating-Point Registers
FPU0
0x810
No
32 msbs of floating-point data register 0
FPL0
0x811
No
32 lsbs of floating-point data register 0
FPU1
0x812
No
32 msbs of floating-point data register 1
FPL1
0x813
No
32 lsbs of floating-point data register 1
FPU2
0x814
No
32 msbs of floating-point data register 2
FPL2
0x815
No
32 lsbs of floating-point data register 2
FPU3
0x816
No
32 msbs of floating-point data register 3
FPL3
0x817
No
32 lsbs of floating-point data register 3
FPU4
0x818
No
32 msbs of floating-point data register 4
FPL4
0x819
No
32 lsbs of floating-point data register 4
FPU5
0x81A
No
32 msbs of floating-point data register 5
FPL5
0x81B
No
32 lsbs of floating-point data register 5
FPU6
0x81C
No
32 msbs of floating-point data register 6
FPL6
0x81D
No
32 lsbs of floating-point data register 6
FPU7
0x81E
No
32 msbs of floating-point data register 7
FPL7
0x81F
No
32 lsbs of floating-point data register 7
FPIAR
0x821
No
Floating-point instruction address register
FPSR
0x822
No
Floating-point status register
FPCR
0x824
No
Floating-point control register
Chapter 2. Registers
For More Information On This Product,
Go to: www.freescale.com
2-13
Freescale Semiconductor, Inc.
Programming Model Table
Table 2-4. ColdFire CPU Registers (Continued)
Name
CPU Space (Rc)
Written with MOVEC
Register Name
Local Memory and Module Control Registers
0xC00
Yes
ROM base address register 0
ROMBAR1
0xC01
Yes
ROM base address register 1
RAMBAR0
0xC04
Yes
RAM base address register 0
RAMBAR1
0xC05
Yes
RAM base address register 1
MBAR
0xC0F
Yes
Primary module base address register (not a
core register)
Freescale Semiconductor, Inc...
ROMBAR0
2-14
ColdFire CF4e Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
Freescale Semiconductor, Inc...
Chapter 3
Instructions
The ColdFire Family Programmer’s Reference Manual, or PRM, describes ColdFire
instructions, addressing modes, and data formats for all ColdFire processors as well as
optional modules such as the FPU and EMAC.
Chapter 3. Instructions
For More Information On This Product,
Go to: www.freescale.com
3-1
Freescale Semiconductor, Inc...
Freescale Semiconductor, Inc.
3-2
ColdFire CF4e Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
Freescale Semiconductor, Inc...
Chapter 4
Floating-Point Unit (FPU)
This chapter describes instructions implemented in the floating-point unit (FPU) designed
for use with the ColdFire family of microprocessors. The FPU conforms to the American
National Standards Institute (ANSI)/Institute of Electrical and Electronics Engineers
(IEEE) Standard for Binary Floating-Point Arithmetic (ANSI/IEEE Standard 754).
The FPU does not support all IEEE-754 number types and operations in hardware; the
hardware unit is optimized for real-time execution with exceptions disabled and default
results provided for specific operations, operands, and number types. Exceptions can be
enabled to support these cases in software.
4.1 FPU Overview
The FPU operates on 64-bit, double-precision floating-point data and supports
single-precision and signed integer input operands. It can be used with ColdFire
microarchitecture, Version 4 and higher. The FPU programming model is like that in the
MC68060 microprocessor. The FPU is intended to accelerate the performance of certain
classes of embedded applications, especially those requiring high-speed floating-point
arithmetic computations. See Section 4.4.3, “Key Differences between ColdFire and
MC680x0 FPU Programming Models.”
The FPU appears as another execute engine at the bottom stages of the operand execution
pipeline (OEP), using operands from a dual-ported register file.
Setting bit 4 in the cache control register (CACR[DF]) disables the FPU. If CACR[DF] is
cleared, all FPU instructions are issued and executed, otherwise the processor responds
with a line F instruction exception (vector 11).
Operating systems often assume user applications are integer-only (to minimize the time
required by save context) by setting CACR[DF] at process initiation. If the application
includes floating-point instructions, the attempted execution of the first FP instruction
generates the line F exception, which signals the kernel that the FPU registers must be
included in the context for the application. The application then continues execution with
CACR[DF] cleared to enable FPU execution.
Chapter 4. Floating-Point Unit (FPU)
For More Information On This Product,
Go to: www.freescale.com
4-1
Freescale Semiconductor, Inc.
FPU Overview
4.1.1 Notational Conventions
Table 4-1 defines notational conventions used in this chapter. Table 4-2 describes
addressing modes and syntax for floating-point instructions.
Table 4-1. Notational Conventions
Symbol
Description
Freescale Semiconductor, Inc...
Single- and Double-Precision Operand Operations
+
Arithmetic addition or postincrement indicator
−
Arithmetic subtraction or predecrement indicator
×
Arithmetic multiplication
÷
Arithmetic division or conjunction symbol
∼
Invert, operand is logically complemented. An overbar, , is also used for this operation.
&
Logical AND
|
Logical OR
→
<op>
<operand>tested
sign-extended
Source operand is moved to destination operand
Any double-operand operation
Operand is compared to zero and the condition codes are set appropriately
All bits of the upper portion are made equal to the high-order bit of the lower portion
Other Operations
If <condition>
then <operations>
else <operations>
Test the condition. If true, the operations after then are performed. If the condition is false and
the optional else clause is present, the operations after else are performed. If the condition is
false and else is omitted, the instruction performs no operation. Refer to the Bcc instruction
description as an example.
Register Specifications
An
Address register n (example: A3 is address register 3)
Ay, Ax
Source and destination address registers, respectively
Dn
Dy,Dx
Source and destination data registers, respectively
FPCR
Floating-point control register
FPIAR
Floating-point instruction address register
FPn
FPSR
FPy,FPx
4-2
Data register n (example: D3 is data register 3)
Floating-point data register n (example: FP3 is FPU data register 3)
Floating-point status register
Source and destination floating-point data registers, respectively
PC
Program counter
Rn
Address or data register
Rx
Destination register
Ry
Source register
Xi
Index register
ColdFire CF4e Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
Operand Data Formats and Types
Table 4-2 lists floating-point addressing modes.
Table 4-2. Floating-Point Addressing Modes
Addressing Modes
Syntax
Freescale Semiconductor, Inc...
Register direct
Address register direct
Address register direct
Dy
Ay
Register indirect
Address register indirect
Address register indirect with postincrement
Address register indirect with predecrement
Address register indirect with displacement
(Ay)
–(Ay)
(d16,Ay)
Program counter indirect with displacement
(d16,PC)
4.2 Operand Data Formats and Types
The FPU supports signed byte, word, and longword integer formats, which are identical to
those supported by the integer unit. The FPU also supports single- and double-precision
binary floating-point formats that fully comply with the IEEE-754 standard.
4.2.1 Signed-integer Data Formats
The FPU supports 8-bit byte (B), 16-bit word (W), and 32-bit longword (L) integer data
formats.
4.2.2 Floating-Point Data Formats
Figure 4-1 shows the two binary floating-point data formats.
31
S
63
S
62
51
11-Bit Exponent
30
22
8-Bit Exponent
0
23-Bit Fraction
Sign of Mantissa
52-Bit Fraction
Single
0
Double
Sign of Mantissa
Figure 4-1. Floating-Point Data Formats
Note that, throughout this chapter, a mantissa is defined as the concatenation of an integer
bit, the binary point, and a fraction. A fraction is the term designating the bits to the right
of the binary point in the mantissa.
Chapter 4. Floating-Point Unit (FPU)
For More Information On This Product,
Go to: www.freescale.com
4-3
Freescale Semiconductor, Inc.
Operand Data Formats and Types
Mantissa
(integer bit).(fraction)
Figure 4-2. Mantissa
Freescale Semiconductor, Inc...
The integer bit is implied to be set for normalized numbers and infinities, clear for zeros
and denormalized numbers. For not-a-numbers (NANs), the integer bit is ignored. The
exponent in both floating-point formats is an unsigned binary integer with an implied bias
added to it. Subtracting the bias from exponent yields a signed, two’s complement power
of two. This represents the magnitude of a normalized floating-point number when
multiplied by the mantissa.
By definition, a normalized mantissa always takes values starting from 1.0 and going up
to, but not including, 2.0; that is, [1.0...2.0).
4.2.3 Floating-Point Data Types
Each floating-point data format supports five unique data types: normalized numbers,
zeros, infinities, NANs, and denormalized numbers. The normalized data type, Figure 4-3,
never uses the maximum or minimum exponent value for a given format.
4.2.3.1 Normalized Numbers
Normalized numbers include all positive or negative numbers with exponents between the
maximum and minimum values. For single- and double-precision normalized numbers,
the implied integer bit is one and the exponent can be zero.
Min < Exponent < Max
Fraction = Any bit pattern
Sign of Mantissa, 0 or 1
Figure 4-3. Normalized Number Format
4.2.3.2 Zeros
Zeros can be positive or negative and represent real values, + 0.0 and – 0.0. See
Figure 4-4.
Exponent = 0
Fraction = 0
Sign of Mantissa, 0 or 1
Figure 4-4. Zero Format
4-4
ColdFire CF4e Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
Operand Data Formats and Types
4.2.3.3 Infinities
Infinities can be positive or negative and represent real values that exceed the overflow
threshold. A result’s exponent greater than or equal to the maximum exponent value
indicates an overflow for a given data format and operation. This overflow description
ignores the effects of rounding and the user-selectable rounding models. For single- and
double-precision infinities, the fraction is a zero. See Figure 4-5.
Exponent = Maximum
Fraction = 0
Sign of Mantissa, 0 or 1
Freescale Semiconductor, Inc...
Figure 4-5. Infinity Format
4.2.3.4 Not-A-Number
When created by the FPU, NANs represent the results of operations having no
mathematical interpretation, such as infinity divided by infinity. Operations using a NAN
operand as an input return a NAN result. User-created NANs can protect against
uninitialized variables and arrays or can represent user-defined data types. See Figure 4-6.
Exponent = Maximum
Fraction = Any nonzero bit pattern
Sign of Mantissa, 0 or 1
Figure 4-6. Not-a-Number Format
If an input operand to an operation is a NAN, the result is an FPU-created default NAN.
When the FPU creates a NAN, the NAN always contains the same bit pattern in the
mantissa: all mantissa bits are ones and the sign bit is zero. When the user creates a NAN,
any nonzero bit pattern can be stored in the mantissa and the sign bit.
4.2.3.5 Denormalized Numbers
Denormalized numbers represent real values near the underflow threshold. Denormalized
numbers can be positive or negative. For denormalized numbers in single- and
double-precision, the implied integer bit is a zero. See Figure 4-7.
Exponent = 0
Fraction = Any nonzero bit pattern
Sign of Mantissa, 0 or 1
Figure 4-7. Denormalized Number Format
Traditionally, the detection of underflow causes floating-point number systems to perform
a flush-to-zero. The IEEE-754 standard implements gradual underflow: the result mantissa
is shifted right (denormalized) while the result exponent is incremented until reaching the
Chapter 4. Floating-Point Unit (FPU)
For More Information On This Product,
Go to: www.freescale.com
4-5
Freescale Semiconductor, Inc.
Operand Data Formats and Types
minimum value. If all the mantissa bits of the result are shifted off to the right during this
denormalization, the result becomes zero.
Denormalized numbers are not supported directly in the hardware of this implementation
but can be handled in software if needed (software for the input denorm exception could
be written to handle denormalized input operands, and software for the underflow
exception could create denormalized numbers). If the input denorm exception is disabled,
all denormalized numbers are treated as zeros.
Table 4-3 summarizes the data type specifications for byte, word, longword, single- and
double-precision data formats.
Freescale Semiconductor, Inc...
Table 4-3. Real Format Summary
Parameter
Single-Precision
3130
Data Format
s
23 22
Double-Precision
0
e
f
6362
s
52 51
e
0
f
Field Size in Bits
Sign (s)
1
1
Biased exponent (e)
8
11
Fraction (f)
23
52
Total
32
64
Interpretation of Sign
Positive fraction
s=0
s=0
Negative fraction
s=1
s=1
Normalized Numbers
Bias of biased exponent
Range of biased exponent
Range of fraction
+127 (0x7F)
+1023 (0x3FF)
0 < e < 255 (0xFF)
0 < e < 2047 (0x7FF)
Zero or Nonzero
Zero or Nonzero
1.f
1.f
(–1)s × 2e–127 × 1.f
(–1)s × 2e–1023 × 1.f
Mantissa
Relation to representation of real numbers
Denormalized Numbers
Biased exponent format minimum
Bias of biased exponent
Range of fraction
0 (0x00)
0 (0x000)
+126 (0x7E)
+1022 (0x3FE)
Nonzero
Nonzero
Mantissa
0.f
Relation to representation of real numbers
(–1)s
×
2–126
0.f
× 0.f
(–1)s
× 2–1022 × 0.f
Signed Zeros
Biased exponent format minimum
0 (0x00)
0 (0x00)
Mantissa
0.f = 0.0
0.f = 0.0
4-6
ColdFire CF4e Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
FPU Programmer’s Model
Table 4-3. Real Format Summary (Continued)
Parameter
Single-Precision
Double-Precision
Signed Infinities
Biased exponent format maximum
Mantissa
255 (0xFF)
2047 (0x7FF)
0.f = 0.0
0.f = 0.0
NANs
Sign
Don’t Care
0 or 1
Biased exponent format maximum
255 (0xFF)
2047 (0x7FF)
Nonzero
Nonzero
xxxxx…xxxx
11111…1111
xxxxx…xxxx
11111…1111
Freescale Semiconductor, Inc...
Fraction
Representation of Fraction
Nonzero Bit Pattern Created by User
Fraction When Created by FPU
Approximate Ranges
Maximum Positive Normalized
3.4 × 1038
1.8 x 10308
Minimum Positive Normalized
1.2 × 10–38
2.2 x 10–308
Minimum Positive Denormalized
1.4 × 10–45
4.9 x 10–324
4.3 FPU Programmer’s Model
The programmer’s model for the FPU consists of the following:
•
•
•
•
Eight 64-bit floating-point data registers (FP0–FP7)
One 32-bit floating-point control register (FPCR)
One 32-bit floating-point status register (FPSR)
One 32-bit floating-point instruction address register (FPIAR)
Figure 4-8 shows the FPU programming model.
63
0
FP0
FP1
FP2
FP3
FP4
FP5
FP6
FP7
FPCR
FPSR
FPIAR
Floating-point data registers
Floating-point control register
Floating-point status register
Floating-point instruction address register
Figure 4-8. Floating-Point Programmer’s Model
Chapter 4. Floating-Point Unit (FPU)
For More Information On This Product,
Go to: www.freescale.com
4-7
Freescale Semiconductor, Inc.
FPU Programmer’s Model
4.3.1 Floating-Point Data Registers (FP0–FP7)
Floating-point data registers are analogous to the integer data registers for the
68K/ColdFire family. They always contain numbers in double-precision format, even
though the operand may be a single-precision value used in a single-precision calculation.
All external operands, regardless of the source data format, are converted to
double-precision format before being used in any calculation or being stored in a
floating-point data register. A reset or a null-restore operation sets FP0–FP7 to positive,
nonsignaling NANs.
Freescale Semiconductor, Inc...
4.3.1.1 Floating-Point Control Register (FPCR)
The FPCR, Figure 4-9, contains an exception enable byte (EE) and a mode control byte
(MC). The user can read or write to FPCR using FMOVE or FRESTORE. A processor
reset or a restore operation of the null state clears the FPCR. When this register is cleared,
the FPU never generates exceptions.
Exception Enable Byte (EE)
31
Field
16
—
15
14
13
12
11
10
Mode Control Byte (MC)
9
8
7
6
5
4 3
BSUN INAN OPERR OVFL UNFL DZ INEX IDE — PREC RND
Reset
0
—
All zeros
R/W
R/W
Figure 4-9. Floating-Point Control Register (FPCR)
Table 4-4 describes FPCR fields.
Table 4-4. FPCR Field Descriptions
Bits
Field
Description
31–16
—
Reserved, should be cleared.
15–8
EE
Exception enable byte. Each EE bit corresponds to a floating-point exception class. The user can
separately enable traps for each class of floating-point exceptions.
15
BSUN
Branch set on unordered
14
INAN
Input not-a-number
13
OPERR Operand error
12
OVFL
Overflow
11
UNFL
Underflow
10
DZ
9
INEX
8
IDE
4-8
Divide by zero
Inexact operation
Input denormalized
ColdFire CF4e Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
FPU Programmer’s Model
Freescale Semiconductor, Inc...
Table 4-4. FPCR Field Descriptions (Continued)
Bits
Field
7–0
MC
Description
Mode control byte. Control FPU operating modes.
7
—
6
PREC
5–4
RND
3–0
—
Reserved, should be cleared.
Rounding precision
0 Double (D)
1 Single (S)
Rounding mode
00 To nearest (RN)
01 To zero (RZ)
10 To minus infinity (RM)
11 To plus infinity (RP)
Reserved, should be cleared.
4.3.2 Floating-Point Status Register (FPSR)
The FPSR, Figure 4-10, contains a floating-point condition code byte (FPCC), a
floating-point exception status byte (EXC), and a floating-point accrued exception byte
(AEXC). The user can read or write all FPSR bits. Execution of most floating-point
instructions modifies FPSR. FPSR is loaded using FMOVE or FRESTORE. A processor
reset or a restore operation of the null state clears the FPSR.
FPCC
Exception Status Byte (EXC)
31 28 27 26 25
Field
—
N Z
I
24
NAN
23
16
15
—
14
13
12
11
10
AEXC Byte
9
8
7
6
5
4
3
BSUN INAN OPERR OVFL UNFL DZ INEX IDE IOP OVFL UNFL DZ INEX —
Reset
All zeros
R/W
R/W
Figure 4-10. Floating-Point Status Register (FPSR)
Table 4-5 describes FPSR fields.
Table 4-5. FPSR Field Descriptions
Bits
Field
Description
31–24
FPCC
Floating-point condition code byte. Contains four condition code bits that are set after completion
of all arithmetic instructions involving the floating-point data registers. The floating-point store
operation, FMOVEM, and move system control register instructions do not affect the FPCC.
31–28
Reserved, should be cleared.
27
N
Negative
26
Z
Zero
25
I
Infinity
24
NAN
23–16
—
2 0
Not-a-number
Reserved, should be cleared.
Chapter 4. Floating-Point Unit (FPU)
For More Information On This Product,
Go to: www.freescale.com
4-9
Freescale Semiconductor, Inc.
FPU Programmer’s Model
Table 4-5. FPSR Field Descriptions (Continued)
Bits
Field
Description
15–8
EXC
Exception status byte. Contains a bit for each floating-point exception that might have occurred
during the most recent arithmetic instruction or move operation. This byte is cleared at the start of
all operations that generate floating-point exceptions (except FBcc only affects BSUN and that
only for nonaware tests). Operations that do not generate floating-point exceptions do not clear
this byte. An exception handler can use this byte to determine which floating-point exception or
exceptions caused a trap. The equations below the table show the comparative relationship
between the EXC byte and AEXC byte.
15
BSUN
Branch/set on unordered
14
INAN
Input not-a-number
Freescale Semiconductor, Inc...
13
OPERR Operand error
12
OVFL
Overflow
11
UNFL
Underflow
10
DZ
Divide by zero
9
INEX
Inexact result
8
IDE
7–0
AEXC
Input is denormalized
Accrued exception byte. Contains 5 required bits for IEEE-754 exception-disabled operations.
These exceptions are logical combinations of EXC bits. AEXC records all floating-point
exceptions since AEXC was last cleared, either by writing to FPSR or as a result of reset or a
restore operation of the null state.
Many users disable traps for some or all floating-point exception classes. AEXC eliminates the
need to poll EXC after each floating-point instruction. At the end of arithmetic operations, EXC
bits are logically combined to form an AEXC value that is logically ORed into the existing AEXC
byte (FBcc only updates IOP). This operation creates sticky floating-point exception bits in AEXC
that the user can poll only at the end of a series of floating-point operations. A sticky bit is one that
remains set until the user clears it.
Setting or clearing AEXC bits neither causes nor prevents an exception. The equations below the
table show relationships between EXC and AEXC. Comparing the current value of an AEXC bit
with a combination of EXC bits derives a new value in the corresponding AEXC bit. These
boolean equations apply to setting AEXC bits at the end of each operation affecting AEXC.
7
IOP
6
OVFL
Overflow
5
UNFL
Underflow
4
DZ
Divide by zero
3
INEX
Inexact result
2–0
—
Invalid operation
Reserved, should be cleared.
For AEXC[OVFL], AEXC[DZ], and AEXC[INEX], the next value is determined by
ORing the current AEXC value with the EXC equivalent, as shown in the following:
•
•
•
Next AEXC[OVFL] = Current AEXC[OVFL] | EXC[OVFL]
Next AEXC[DZ] = Current AEXC[DZ] | EXC[DZ]
Next AEXC[INEX] = Current AEXC[INEX] | EXC[INEX]
For AEXC[IOP] and AEXC[UNFL], the next value is calculated by ORing the current
AEXC value with EXC bit combinations, as follows:
4-10
ColdFire CF4e Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
FPU Programmer’s Model
•
•
Next AEXC[IOP] = Current AEXC[IOP] | EXC[BSUN | INAN | OPERR]
Next AEXC[UNFL] = Current AEXC[UNFL] | EXC[UNFL & INEX]
4.3.3 Floating-Point Instruction Address Register (FPIAR)
Freescale Semiconductor, Inc...
The ColdFire OEP can execute integer and floating-point instructions simultaneously. As a
result, the PC value stacked by the processor in response to a floating-point exception trap
may not point to the instruction that caused the exception.
For FPU instructions that can generate exception traps, the 32-bit FPIAR is loaded with
the instruction PC address before the FPU begins execution. In case of an FPU exception,
the trap handler can use the FPIAR contents to determine the instruction that generated the
exception. FMOVE to/from FPCR, FPSR, or FPIAR and FMOVEM instructions cannot
generate floating-point exceptions and so do not modify FPIAR. A reset or a null-restore
operation clears FPIAR.
4.3.4 Floating-Point Computational Accuracy
The FPU performs all floating-point internal operations in double-precision. It supports
mixed-mode arithmetic by converting single-precision operands to double-precision
values before performing the specified operation. The FPU converts all memory data
formats to the double-precision data format and stores the value in a floating-point register
or uses it as the source operand for an arithmetic operation. When moving a
double-precision floating-point value from a floating-point data register, the FPU can
convert the data depending on the destination, as follows:
•
•
Valid data formats for memory destination: B, W, L, S, or D
Valid data formats for integer data register destinations: B, W, L, or S
Normally if the input operand is a denormalized number, the number must be normalized
before an FPU instruction can be executed. A denormalized input operand is converted to
zero if the input denorm exception (IDE) is disabled. If IDE is enabled, the floating-point
engine traps to allow software action to be taken by the handler.
4.3.4.1 Intermediate Result
All FPU calculations use an intermediate result. When the FPU performs any operation,
the calculation is carried out using double-precision inputs, and the intermediate result is
calculated as if to produce infinite precision. After the calculation is complete, any
necessary rounding of the intermediate result for the selected precision is performed and
the result and stored in the destination.
Figure 4-11 shows the intermediate result format. The intermediate result’s exponent for
some dyadic operations (for example, multiply and divide) can easily overflow or
underflow the 11-bit exponent of the designation floating-point register. To simplify
overflow and underflow detection, intermediate results in the FPU maintain a 12-bit two’s
Chapter 4. Floating-Point Unit (FPU)
For More Information On This Product,
Go to: www.freescale.com
4-11
Freescale Semiconductor, Inc.
FPU Programmer’s Model
complement, integer exponent. Detection of an intermediate result overflow or underflow
always converts the 12-bit exponent into a 11-bit biased exponent before being stored in a
floating-point data register. The FPU internally maintains a 56-bit mantissa for rounding
purposes. The mantissa is always rounded to 53 bits (or fewer, depending on the selected
rounding precision) before it is stored in a floating-point data register.
56-Bit Intermediate Mantissa
12-Bit Exponent
52-Bit Fraction
Integer
lsb
Guard
Freescale Semiconductor, Inc...
Round
Sticky
Figure 4-11. Intermediate Result Format
If the destination is a floating-point data register, the result is in double-precision format
but may be rounded to single-precision, if required by the rounding precision, before
being stored. If the single-precision mode is selected, the exponent value is in the correct
range even if it is stored in double-precision format. If the destination is a memory
location or an integer data register, rounding precision is ignored. In this case, a number in
the double-precision format is taken from the source floating-point data register, rounded
to the destination format precision, and then written to memory or the integer data register.
Depending on the selected rounding mode or destination data format, the location of the
lsb of the mantissa and the locations of the guard, round, and sticky bits in the 56-bit
intermediate result mantissa vary. Guard and round bits are calculated exactly. The sticky
bit creates the illusion of an infinitely wide intermediate result. As the arrow in
Figure 4-11 shows, the sticky bit is the logical OR of all bits to the right of the round bit in
the infinitely precise result. During calculation, nonzero bits generated to the right of the
round bit set the sticky bit. Because of the sticky bit, the rounded intermediate result for all
required IEEE arithmetic operations in RN mode can err by no more than one half unit in
the last place.
4.3.4.2 Rounding the Result
The FPU supports the four rounding modes specified by the IEEE-754 standard:
round-to-nearest (RN), round-toward-zero (RZ), round-toward-plus-infinity (RP), and
round-toward-minus-infinity (RM). The RM and RP modes are often referred to as
directed-rounding-modes and are useful in interval arithmetic. Rounding is accomplished
through the intermediate result. Single-precision results are rounded to a 24-bit mantissa
boundary; double-precision results are rounded to a 53-bit mantissa boundary.
The current floating-point instruction can specify rounding precision, overriding the
rounding precision specified in FPCR for the duration of the current instruction. For
4-12
ColdFire CF4e Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
FPU Programmer’s Model
Freescale Semiconductor, Inc...
example, the rounding precision for FADD is determined by FPCR, while the rounding
precision for FSADD is single-precision, independent of FPCR.
Range control helps emulate devices that support only single-precision arithmetic by
rounding the intermediate result’s mantissa to the specified precision and checking that the
intermediate exponent is in the representable range of the selected rounding precision. If
the intermediate result’s exponent exceeds the range, the appropriate underflow or
overflow value is stored as the result in the double-precision format exponent. For
example, if the data format and rounding mode is single-precision RM and the result of an
arithmetic operation overflows the single-precision format, the maximum normalized
single-precision value is stored as a double-precision number in the destination
floating-point data register; that is, the unbiased 11-bit exponent is 0x0FF and the 52-bit
fraction is 0xF_FFFF_E000_0000. If an infinity is the appropriate result for an underflow
or overflow, the infinity value for the destination data format is stored as the result; that is,
the exponent has the maximum value and the mantissa is zero.
Figure 4-12 shows the algorithm for rounding an intermediate result to the selected
rounding precision and destination data format. If the destination is a floating-point
register, the rounding boundary is determined by either the selected rounding precision
specified by FPCR[PREC] or by the instruction itself. For example, FSADD and FDADD
specify single- and double-precision rounding regardless of FPCR[PREC]. If the
destination is memory or an integer data register, the destination data format determines
the rounding boundary. If the rounded result of an operation is inexact, INEX is set in
FPSR[EXC].
Chapter 4. Floating-Point Unit (FPU)
For More Information On This Product,
Go to: www.freescale.com
4-13
Freescale Semiconductor, Inc.
FPU Programmer’s Model
Entry
Guard, Round
and Sticky Bits = 0
INEX
1
Select Rounding Mode
Freescale Semiconductor, Inc...
Check Intermediate Result
RN
Pos
G and lsb = 1,
R and S = 0
or
G = 1,
R or S = 1
RM
Neg
RP
Pos
G, R,
or S = 1
N
Y
RZ
Neg
G, R,
or S = 1
Y
N
Exact Result
G,R, and S
are chopped
Add 1 to lsb
Add 1 to
lsb
Overflow = 1
Shift mantissa
right 1 bit,
Add 1 to exponent
Guard
Round
Sticky
0
0
0
Exit
Exit
Figure 4-12. Rounding Algorithm Flowchart
The three additional bits beyond the double-precision format, the difference between the
intermediate result’s 56-bit mantissa and the storing result’s 53-bit mantissa, allow the
FPU to perform all calculations as though it were performing calculations using a compute
engine with infinite bit precision. The result is always correct for the specified
destination’s data format before rounding (unless an overflow or underflow error occurs).
The specified rounding produces a number as close as possible to the infinitely precise
4-14
ColdFire CF4e Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
FPU Programmer’s Model
intermediate value and still representable in the selected precision. The tie case in
Table 4-6 shows how the 56-bit mantissa allows the FPU to meet the error bound of the
IEEE specification.
Freescale Semiconductor, Inc...
Table 4-6. Tie-Case Example
Result
Integer
52-Bit Fraction
Guard
Round
Sticky
Intermediate
x
xxx…x00
1
0
0
Rounded-to-Nearest
x
xxx…x00
0
0
0
The lsb of the rounded result does not increment even though the guard bit is set in the
intermediate result. The IEEE-754 standard specifies this way of handling ties. If the
destination data format is double-precision and there is a difference between the infinitely
precise intermediate result and the round-to-nearest result, the relative difference is 2–53
(the value of the guard bit). This error is equal to half of the lsb’s value and is the worst
case error that can be introduced with RN mode. Thus, the term one-half unit in the last
place correctly identifies the error bound for this operation. This error specification is the
relative error present in the result; the absolute error bound is equal to 2exponent x 2–53.
Table 4-7 shows the error bound for other rounding modes.
Table 4-7. Round Mode Error Bounds
Result
Integer
52-Bit Fraction
Guard
Round
Sticky
Intermediate
x
xxx…x00
1
1
1
Rounded-to-Zero
x
xxx…x00
0
0
0
The difference between the infinitely precise result and the rounded result is 2–53 + 2–54 +
2–55, which is slightly less than 2–52 (the value of the lsb). Thus, the error bound for this
operation is not more than one unit in the last place. The FPU meets these error bounds for
all arithmetic operations, providing accurate, repeatable results.
4.3.5 Floating-Point Post Processing
Most operations end with post-processing, for which the FPU provides two steps. First,
FPSR[FPCC] bits are set or cleared at the end of each arithmetic or move operation to a
single floating-point data register. FPCC bits are consistently set based on the result of the
operation. Second, the FPU supports 32 conditional tests that allow floating-point
conditional instructions to test floating-point conditions in the same way that integer
conditional instructions test the integer condition code. The combination of consistently
set FPCC bits and the simple programming of conditional instructions gives the processor
a very flexible, efficient way to change program flow based on floating-point results. When
the summary for each instruction is read, it should be assumed that an instruction performs
post processing unless the summary specifically states otherwise. The following
paragraphs describe post processing in detail.
Chapter 4. Floating-Point Unit (FPU)
For More Information On This Product,
Go to: www.freescale.com
4-15
Freescale Semiconductor, Inc.
FPU Programmer’s Model
4.3.5.1 Underflow, Round, Overflow
Freescale Semiconductor, Inc...
During calculation of an arithmetic result, the FPU has more precision and range than the
64-bit double-precision format. However, the final result is a double-precision value. In
some cases, an intermediate result becomes either smaller or larger than can be
represented in double-precision. Also, the operation can generate a larger exponent or
more bits of precision than can be represented in the chosen rounding precision. For these
reasons, every arithmetic instruction ends by checking for underflow, rounding the result
and checking for overflow.
At the completion of an arithmetic operation, the intermediate result is checked to see if it
is too small to be represented as a normalized number in the selected precision. If so, the
underflow (UNFL) bit is set in FPSR[EXC]. If no underflow occurs, the intermediate
result is rounded according to the user-selected rounding precision and mode. After
rounding, the inexact bit (INEX) is set as described in Figure 4-12. Lastly, the magnitude
of the result is checked to see if it exceeds the current rounding precision. If so, the
overflow (OVFL) bit is set and a correctly signed infinity or correctly signed largest
normalized number is returned, depending on the rounding mode.
NOTE:
I N E X c a n a l s o b e s e t b y OV F L , U N F L , a n d w h e n
denormalized numbers are encountered.
4.3.5.2 Conditional Testing
Unlike operation-dependent integer condition codes, an instruction either always sets
FPCC bits in the same way or does not change them at all. Therefore, instruction
descriptions do not include FPCC settings. This section describes how FPCC bits are set.
FPCC bits differ slightly from integer condition codes. An FPU operation’s final result sets
or clears FPCC bits accordingly, independent of the operation itself. Integer condition
codes bits N and Z have this characteristic, but V and C are set differently for different
instructions. Table 4-8 lists FPCC settings for each data type. Loading FPCC with another
combination and executing a conditional instruction can produce an unexpected branch
condition.
Table 4-8. FPCC Encodings
Data Type
4-16
N
Z
I
NAN
+ Normalized or Denormalized
0
0
0
0
– Normalized or Denormalized
1
0
0
0
+0
0
1
0
0
–0
1
1
0
0
+ Infinity
0
0
1
0
– Infinity
1
0
1
0
ColdFire CF4e Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
FPU Programmer’s Model
Table 4-8. FPCC Encodings (Continued)
Data Type
N
Z
I
NAN
+ NAN
0
0
0
1
– NAN
1
0
0
1
Freescale Semiconductor, Inc...
The inclusion of the NAN data type in the IEEE floating-point number system requires
each conditional test to include FPCC[NAN] in its boolean equation. Because it cannot be
determined whether a NAN is bigger or smaller than an in-range number (that is, it is
unordered), the compare instruction sets FPCC[NAN] when an unordered compare is
attempted. All arithmetic instructions that result in a NAN also set the NAN bit.
Conditional instructions interpret NAN being set as the unordered condition.
The IEEE-754 standard defines the following four conditions:
•
•
•
•
Equal to (EQ)
Greater than (GT)
Less than (LT)
Unordered (UN)
The standard requires only the generation of the condition codes as a result of a
floating-point compare operation. The FPU can test for these conditions and 28 others at
the end of any operation affecting condition codes. For floating-point conditional branch
instructions, the processor logically combines the 4 bits of the FPCC condition codes to
form 32 conditional tests, 16 of which cause an exception if an unordered condition is
present when the conditional test is attempted (IEEE nonaware tests). The other 16 do not
cause an exception (IEEE-aware tests). The set of IEEE nonaware tests is best used in one
of the following cases:
•
•
When porting a program from a system that does not support the IEEE standard to
a conforming system
When generating high-level language code that does not support IEEE
floating-point concepts (that is, the unordered condition).
An unordered condition occurs when one or both of the operands in a floating-point
compare operation is a NAN. The inclusion of the unordered condition in floating-point
branches destroys the familiar trichotomy relationship (greater than, equal, less than) that
exists for integers. For example, the opposite of floating-point branch greater than (FBGT)
is not floating-point branch less than or equal (FBLE). Rather, the opposite condition is
floating-point branch not greater than (FBNGT). If the result of the previous instruction
was unordered, FBNGT is true; whereas, both FBGT and FBLE would be false because
unordered fails both of these tests (and sets BSUN). Compiler code generators should be
particularly careful of the lack of trichotomy in the floating-point branches because it is
common for compilers to invert the sense of conditions.
Chapter 4. Floating-Point Unit (FPU)
For More Information On This Product,
Go to: www.freescale.com
4-17
Freescale Semiconductor, Inc.
FPU Programmer’s Model
When using the IEEE nonaware tests, the user receives a BSUN exception if a branch is
attempted and FPCC[NAN] is set, unless the branch is an FBEQ or an FBNE. If the
BSUN exception is enabled in FPCR, the exception takes a BSUN trap. Therefore, the
IEEE nonaware program is interrupted if an unexpected condition occurs. Users
knowledgeable of the IEEE-754 standard should use IEEE-aware tests in programs that
contain ordered and unordered conditions. Because the ordered or unordered attribute is
explicitly included in the conditional test, EXC[BSUN] is not set when the unordered
condition occurs. Table 4-9 summarizes conditional mnemonics, definitions, equations,
predicates, and whether EXC[BSUN] is set for the 32 floating-point conditional tests. The
equation column lists FPCC bit combinations for each test in the form of an equation.
Condition codes with an overbar indicate cleared bits; all other bits are set.
Freescale Semiconductor, Inc...
Table 4-9. Floating-Point Conditional Tests
Mnemonic
Definition
Equation
Predicate 1
EXC[BSUN] Set
IEEE Nonaware Tests
EQ
Equal
Z
000001
No
NE
Not equal
Z
001110
No
GT
Greater than
NAN | Z | N
010010
Yes
Not greater than
NAN | Z | N
011101
Yes
Greater than or equal
Z | (NAN | N)
010011
Yes
Not greater than or equal
NAN | (N & Z)
011100
Yes
Less than
N & (NAN | Z)
010100
Yes
NLT
Not less than
NAN | (Z | N)
011011
Yes
LE
Less than or equal
Z | (N & NAN)
010101
Yes
Not less than or equal
NAN | (N | Z)
011010
Yes
Greater or less than
NAN | Z
010110
Yes
NGL
Not greater or less than
NAN | Z
011001
Yes
GLE
Greater, less or equal
NAN
010111
Yes
Not greater, less or equal
NAN
011000
Yes
NGT
GE
NGE
LT
NLE
GL
NGLE
IEEE-Aware Tests
4-18
EQ
Equal
Z
000001
No
NE
Not equal
Z
001110
No
OGT
Ordered greater than
NAN | Z | N
000010
No
ULE
Unordered or less or equal
NAN | Z | N
001101
No
OGE
Ordered greater than or equal
Z | (NAN | N)
000011
No
ULT
Unordered or less than
NAN | (N & Z)
001100
No
OLT
Ordered less than
N & (NAN | Z)
000100
No
UGE
Unordered or greater or equal
NAN | (Z | N)
001011
No
OLE
Ordered less than or equal
Z | (N & NAN)
000101
No
ColdFire CF4e Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
FPU Programmer’s Model
Table 4-9. Floating-Point Conditional Tests (Continued)
Mnemonic
Definition
Equation
Predicate 1
EXC[BSUN] Set
NAN | (N | Z)
001010
No
UGT
Unordered or greater than
OGL
Ordered greater or less than
NAN | Z
000110
No
UEQ
Unordered or equal
NAN | Z
001001
No
OR
Ordered
NAN
000111
No
UN
Unordered
NAN
001000
No
Freescale Semiconductor, Inc...
Miscellaneous Tests
1
F
False
False
000000
No
T
True
True
001111
No
SF
Signaling false
False
010000
Yes
ST
Signaling true
True
011111
Yes
SEQ
Signaling equal
Z
010001
Yes
SNE
Signaling not equal
Z
011110
Yes
This column refers to the value in the instruction’s conditional predicate field that specifies this test.
4.3.6 Floating-Point Exceptions
This section describes floating-point exceptions and how they are handled. Table 4-10 lists
the vector numbers related to floating-point exceptions. If the exception is taken
pre-instruction, the PC contains the address of the next floating-point instruction (nextFP).
If the exception is taken post-instruction, the PC contains the address of the faulting
instruction (fault).
Table 4-10. Floating-Point Exception Vectors
Vector Number
Vector Offset
Program Counter
Assignment
48
0x0C0
Fault
Floating-point branch/set on unordered condition
49
0x0C4
NextFP or Fault
Floating-point inexact result
50
0x0C8
NextFP
Floating-point divide-by-zero
51
0x0CC
NextFP or Fault
Floating-point underflow
52
0x0D0
NextFP or Fault
Floating-point operand error
53
0x0D4
NextFP or Fault
Floating-point overflow
54
0x0D8
NextFP or Fault
Floating-point input NAN
55
0x0DC
NextFP or Fault
Floating-point input denormalized number
In addition to these vectors, attempting to execute a FRESTORE instruction with a
unsupported frame value generates a format error exception (vector 14). See the
FRESTORE instruction in the PRM.
Chapter 4. Floating-Point Unit (FPU)
For More Information On This Product,
Go to: www.freescale.com
4-19
Freescale Semiconductor, Inc.
FPU Programmer’s Model
Attempting to execute an FPU instruction with an undefined or unsupported value in the
6-bit effective address, the 3-bit source/destination specifier, or the 7-bit opmode generates
a line-F emulator exception, vector 11. See Table 4-23.
4.3.7 Floating-Point Arithmetic Exceptions
This section describes floating-point arithmetic exceptions; Table 4-11 lists these
exceptions in order of priority:
Freescale Semiconductor, Inc...
Table 4-11. Exception Priorities
Priority
Exception
1
Branch/set on unordered (BSUN)
2
Input Not-a-Number (INAN)
3
Input denormalized number (IDE)
4
Operand error (OPERR)
5
Overflow (OVFL)
6
Underflow (UNFL)
7
Divide-by-zero (DZ)
8
Inexact (INEX)
Most floating-point exceptions are taken when the next floating-point arithmetic instruction
is encountered (this is called a pre-instruction exception). Exceptions set during a
floating-point store to memory or to an integer register are taken immediately
(post-instruction exception).
Note that FMOVE is considered an arithmetic instruction because the result is rounded.
Only FMOVE with any destination other than a floating-point register (sometimes called
FMOVE OUT) can generate post-instruction exceptions. Post-instruction exceptions never
write the destination. After a post-instruction exception, processing continues with the next
instruction.
A floating-point arithmetic exception becomes pending when the result of a floating-point
instruction sets an FPSR[EXC] bit and the corresponding FPCR[ENABLE] bit is set. A
user write to the FPSR or FPCR that causes the setting of an exception bit in FPSR[EXC]
along with its corresponding exception enabled in FPCR, leaves the FPU in an
exception-pending state. The corresponding exception is taken at the start of the next
arithmetic instruction as a pre-instruction exception.
Executing a single instruction can generate multiple exceptions. When multiple exceptions
occur with exceptions enabled for more than one exception class, the highest priority
exception is reported and taken. It is up to the exception handler to check for multiple
exceptions. The following multiple exceptions are possible:
4-20
ColdFire CF4e Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
FPU Programmer’s Model
•
Operand error (OPERR) and inexact result (INEX)
•
Overflow (OVFL) and inexact result (INEX)
•
Underflow (UNFL) and inexact result (INEX)
•
Divide-by-zero (DZ) and inexact result (INEX)
•
Input denormalized number (IDE) and inexact result (INEX)
•
Input not-a-number (INAN) and input denormalized number (IDE)
Freescale Semiconductor, Inc...
In general, all exceptions behave similarly. If the exception is disabled when the exception
condition exists, no exception is taken, a default result is written to the destination (except
for BSUN exception, which has no destination), and execution proceeds normally.
If an enabled exception occurs, the same default result above is written for pre-instruction
exceptions but no result is written for post-instruction exceptions.
An exception handler is expected to execute FSAVE as its first floating-point instruction.
This also clears FPCR, which keeps exceptions from occurring during the handler. Because
the destination is overwritten for floating-point register destinations, the original
floating-point destination register value is available for the handler on the FSAVE state
frame. The address of the instruction that caused the exception is available in the FPIAR.
When the handler is done, it should clear the appropriate FPSR exception bit on the FSAVE
state frame, then execute FRESTORE. If the exception status bit is not cleared on the state
frame, the same exception occurs again.
Alternatively, instead of executing FSAVE, an exception handler could simply clear
appropriate FPSR exception bits, optionally alter FPCR, and then return from the
exception. Note that exceptions are never taken on FMOVE to or from the status and control
registers and FMOVEM to or from the floating-point data registers.
At the completion of the exception handler, the RTE instruction must be executed to return
to normal instruction flow.
4.3.8 Branch/Set on Unordered (BSUN)
A BSUN results from performing an IEEE nonaware conditional test associated with the
FBcc instruction when an unordered condition is present. Any pending floating-point
exception is first handled by a pre-instruction exception, after which the conditional
instruction restarts. The conditional predicate is evaluated and checked for a BSUN
exception before executing the conditional instruction. A BSUN exception occurs if the
conditional predicate is an IEEE non-aware branch and FPCC[NAN] is set. When this
condition is detected, FPSR[BSUN] is set.
Chapter 4. Floating-Point Unit (FPU)
For More Information On This Product,
Go to: www.freescale.com
4-21
Freescale Semiconductor, Inc.
FPU Programmer’s Model
Table 4-12. BSUN Exception Enabled/Disabled Results
Freescale Semiconductor, Inc...
Condition BSUN
Description
Exception
disabled
0
The floating-point condition is evaluated as if it were the equivalent IEEE-aware conditional
predicate. No exceptions are taken.
Exception
Enabled
1
The processor takes a floating-point pre-instruction exception.
The BSUN exception is unique in that the exception is taken before the conditional predicate is
evaluated. If the user BSUN exception handler fails to update the PC to the instruction after the
excepting instruction when returning, the exception executes again. Any of the following actions
prevent taking the exception again:
• Clearing FPSR[NAN]
• Disabling FPCR[BSUN]
• Incrementing the stored PC in the stack bypasses the conditional instruction. This applies to
situations where fall-through is desired. Note that to accurately calculate the PC increment
requires knowledge of the size of the bypassed conditional instruction.
4.3.9 Input Not-A-Number (INAN)
The INAN exception is a mechanism for handling a user-defined, non-IEEE data type. If
either input operand is a NAN, FPSR[INAN] is set. By enabling this exception, the user can
override the default action taken for NAN operands. Because FMOVEM, FMOVE FPCR,
and FSAVE instructions do not modify status bits, they cannot generate exceptions.
Therefore, these instructions are useful for manipulating INANs. See Table 4-13.
Table 4-13. INAN Exception Enabled/Disabled Results
Condition
INAN
Description
Exception
disabled
0
If the destination data format is single- or double-precision, a NAN is generated with a mantissa
of all ones and a sign of zero transferred to the destination. If the destination data format is B, W,
or L, a constant of all ones is written to the destination.
Exception
enabled
1
The result written to the destination is the same as the exception disabled case unless the
exception occurs on a FMOVE OUT, in which case the destination is unaffected.
4.3.10 Input Denormalized Number (IDE)
The input denorm bit, FPCR[IDE], provides software support for denormalized operands.
When the IDE exception is disabled, the operand is treated as zero, FPSR[INEX] is set, and
the operation proceeds. When the IDE exception is enabled and an operand is
denormalized, an IDE exception is taken but FPSR[INEX] is not set to allow the handler to
set it appropriately. See Table 4-14.
Note that the FPU never generates denormalized numbers. If necessary, software can create
them in the underflow exception handler.
4-22
ColdFire CF4e Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
FPU Programmer’s Model
Table 4-14. IDE Exception Enabled/Disabled Results
Condition
IDE
Description
Exception
disabled
0
Any denormalized operand is treated as zero, FPSR[INEX] is set, and the operation proceeds.
Exception
enabled
1
The result written to the destination is the same as the exception disabled case unless the
exception occurs on a FMOVE OUT, in which case the destination is unaffected. FPSR[INEX] is
not set to allow the handler to set it appropriately.
Freescale Semiconductor, Inc...
4.3.11 Operand Error (OPERR)
The operand error exception encompasses problems arising in a variety of operations,
including errors too infrequent or trivial to merit a specific exceptional condition. Basically,
an operand error occurs when an operation has no mathematical interpretation for the given
operands. Table 4-15 lists possible operand errors. When one occurs, FPSR[OPERR] is set.
Table 4-15. Possible Operand Errors
Instruction
Condition Causing Operand Error
FADD
[(+∞) + (-∞)] or [(-∞) + (+∞)]
FDIV
(0 ÷ 0) or (∞ ÷ ∞)
FMOVE OUT (to B, W, or L)
Integer overflow, source is NAN or ±∞
FMUL
One operand is 0 and the other is ±∞
FSQRT
Source is < 0 or -∞
FSUB
[(+∞) - (+∞)] or [(-∞) - (-∞)]
Table 4-16 describes results when the exception is enabled and disabled.
Table 4-16. OPERR Exception Enabled/Disabled Results
Condition OPERR
Description
Exception
disabled
0
When the destination is a floating-point data register, the result is a double-precision NAN,
with its mantissa set to all ones and the sign set to zero (positive).
For a FMOVE OUT instruction with the format S or D, an OPERR exception is impossible. With
the format B, W, or L, an OPERR exception is possible only on a conversion to integer
overflow, or if the source is either an infinity or a NAN. On integer overflow and infinity source
cases, the largest positive or negative integer that can fit in the specified destination size (B,
W, or L) is stored. In the NAN source case, a constant of all ones is written to the destination.
Exception
enabled
1
The result written to the destination is the same as for the exception disabled case unless the
exception occurred on a FMOVE OUT, in which case the destination is unaffected. If desired,
the user OPERR handler can overwrite the default result.
4.3.12 Overflow (OVFL)
An overflow exception is detected for arithmetic operations in which the destination is a
floating-point data register or memory when the intermediate result’s exponent is greater
than or equal to the maximum exponent value of the selected rounding precision. Overflow
Chapter 4. Floating-Point Unit (FPU)
For More Information On This Product,
Go to: www.freescale.com
4-23
Freescale Semiconductor, Inc.
FPU Programmer’s Model
occurs only when the destination is S- or D-precision format; overflows for other formats
are handled as operand errors. At the end of any operation that could potentially overflow,
the intermediate result is checked for underflow, rounded, and then checked for overflow
before it is stored to the destination. If overflow occurs, FPSR[OVFL,INEX] are set.
Even if the intermediate result is small enough to be represented as a double-precision
number, an overflow can occur if the magnitude of the intermediate result exceeds the range
of the selected rounding precision format. See Table 4-17.
Table 4-17. OVFL Exception Enabled/Disabled Results
Freescale Semiconductor, Inc...
Condition OVFL
Description
Exception
disabled
0
The values stored in the destination based on the rounding mode defined in FPCR[MODE].
RN Infinity, with the sign of the intermediate result.
RZ Largest magnitude number, with the sign of the intermediate result.
RM For positive overflow, largest positive normalized number
For negative overflow, -∞.
RP For positive overflow, +∞
For negative overflow, largest negative normalized number.
Exception
enabled
1
The result written to the destination is the same as for the exception disabled case unless the
exception occurred on a FMOVE OUT, in which case the destination is unaffected. If desired, the
user OVFL handler can overwrite the default result.
4.3.13 Underflow (UNFL)
An underflow exception occurs when the intermediate result of an arithmetic instruction is
too small to be represented as a normalized number in a floating-point register or memory
using the selected rounding precision, that is, when the intermediate result exponent is less
than or equal to the minimum exponent value of the selected rounding precision. Underflow
can only occur when the destination format is single or double precision. When the
destination is byte, word, or longword, the conversion underflows to zero without causing
an underflow or an operand error. At the end of any operation that could underflow, the
intermediate result is checked for underflow, rounded, and checked for overflow before it
is stored in the destination. FPSR[UNFL] is set if underflow occurs. If the underflow
exception is disabled, FPSR[INEX] is also set.
Even if the intermediate result is large enough to be represented as a double-precision
number, an underflow can occur if the magnitude of the intermediate result is too small to
be represented in the selected rounding precision. Table 4-18 shows results when the
exception is enabled or disabled.
4-24
ColdFire CF4e Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
FPU Programmer’s Model
Table 4-18. UNFL Exception Enabled/Disabled Results
Freescale Semiconductor, Inc...
Condition UNFL
Description
Exception
disabled
0
The stored result is defined below. The UNFL exception also sets FPSR[INEX] if the UNFL
exception is disabled.
RN Zero, with the sign of the intermediate result.
RZ Zero, with the sign of the intermediate result.
RM For positive underflow, + 0
For negative underflow, smallest negative normalized number.
RP For positive underflow, smallest positive normalized number
For negative underflow, - 0
Exception
enabled
1
The result written to the destination is the same as for the exception disabled case unless the
exception occurs on a FMOVE OUT, in which case the destination is unaffected. If desired, the
user UNFL handler can overwrite the default result. The UNFL exception does not set
FPSR[INEX] if the UNFL exception is enabled so the exception handler can set FPSR[INEX]
based on results it generates.
4.3.14 Divide-by-Zero (DZ)
Attempting to use a zero divisor for a divide instruction causes a divide-by-zero exception.
When a divide-by-zero is detected, FPSR[DZ] is set. Table 4-18 shows results when the
exception is enabled or disabled.
Table 4-19. DZ Exception Enabled/Disabled Results
Condition DZ
Description
Exception
disabled
0
The destination floating-point data register is written with infinity with the sign set to the exclusive
OR of the signs of the input operands.
Exception
enabled
1
The destination floating-point data register is written as in the exception is disabled case.
4.3.15 Inexact Result (INEX)
An INEX exception condition exists when the infinitely precise mantissa of a floating-point
intermediate result has more significant bits than can be represented exactly in the selected
rounding precision or in the destination format. If this condition occurs, FPSR[INEX] is set
and the infinitely-precise result is rounded according to Table 4-20.
Table 4-20. Inexact Rounding Mode Values
Mode
Result
RN
The representable value nearest the infinitely-precise intermediate value is the result. If the two nearest
representable values are equally near, the one whose lsb is 0 (even) is the result. This is sometimes called
round-to-nearest-even.
RZ
The result is the value closest to and no greater in magnitude than the infinitely-precise intermediate result.
This is sometimes called chop-mode, because the effect is to clear bits to the right of the rounding point.
RM
The result is the value closest to and no greater than the infinitely-precise intermediate result (possibly -∞).
RP
The result is the value closest to and no less than the infinitely-precise intermediate result (possibly +∞).
Chapter 4. Floating-Point Unit (FPU)
For More Information On This Product,
Go to: www.freescale.com
4-25
Freescale Semiconductor, Inc.
FPU Programmer’s Model
FPSR[INEX] is also set for any of the following conditions:
•
If an input operand is a denormalized number and the IDE exception is disabled
•
An overflowed result
•
An underflowed result with the underflow exception disabled
Table 4-18 shows results when the exception is enabled or disabled.
Table 4-21. INEX Exception Enabled/Disabled Results
Freescale Semiconductor, Inc...
Condition INEX
Description
Exception
disabled
0
The result is rounded and then written to the destination.
Exception
enabled
1
The result written to the destination is the same as for the exception disabled case unless the
exception occurred on a FMOVE OUT, in which case the destination is unaffected. If desired, the
user INEX handler can overwrite the default result.
4.3.16 Floating-Point State Frames
Floating-point arithmetic exception handlers should have FSAVE as the first floating-point
instruction; otherwise, encountering another floating-point arithmetic instruction will
cause the exception to be reported again. After FSAVE executes, the handler should use
FMOVEM to access floating-point data registers because it cannot generate further
exceptions or change the FPSR.
Note that if no intervention is needed, instead of FSAVE, the handler can simply clear the
appropriate FPCR and FPSR bits and then return from the exception.
Because the FPCR and FPSR are written in the FSAVE frame, a context switch needs only
execute FSAVE and FMOVEM for data registers. The new process needs to load data
registers by using a FMOVEM/FRESTORE sequence, then it can continue.
FSAVE operations always write a 4-longword floating-point state frame that holds a 64-bit
exception operand. Figure 4-13 shows FSAVE frame contents.
31
24 23
19 18
16
15
Format word
Frame Format
0
Control Register (FPCR)
0000_0
Vector
Exception operand upper 32 bits
Exception operand lower 32 bits
Status register (FPSR)
Figure 4-13. Floating-Point State Frame Contents
Table 4-22 describes format word fields.
4-26
ColdFire CF4e Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
FPU Programmer’s Model
Table 4-22. Format Word Field Descriptions
Bits
Name
Description
Freescale Semiconductor, Inc...
31–24 Frame format Defines the format of the frame.
0x00 Null Frame (NULL)
0x05 Idle Frame (IDLE)
0xE5 Exception Frame (EXCP)
23–19
—
18–16
Vector
Zeros
Exception vector
000 BSUN
001 INEX
010 DZ
011 UNFL
100 OPERR
101 OVFL
110 INAN
111 IDE
When FSAVE executes, the floating-point frame reflects the FPU state at the time of the
FSAVE. Internally, the FPU can be in the NULL, IDLE, or EXCP states. Upon reset, the
FPU is in NULL state, in which all floating-point registers contain NANs and the FPCR,
FPSR, and FPIAR contain zeros. The FPU remains in NULL state until execution of an
implemented floating-point instruction (except FSAVE). At this point, the FPU transitions
from NULL to an IDLE state. A FRESTORE of NULL returns the FPU to NULL state.
EXCP state is entered as a result of a floating-point exception or an unsupported data type
exception. The vector field identifies exception types associated with the EXCP state. This
field and the exception vector taken are determined directly from the exception control
(FPCR) and status (FPSR) bits. An FSAVE instruction always clears FPCR after saving its
state. Thus, after an FSAVE, a handler does not generate further floating-point exceptions
unless the handler re-enables the exceptions. FRESTORE returns FPCR and FPSR to their
previous state before entering the handler, as stored in the state frame. A handler could
alter the state frame to restore the FPU (using FRESTORE) into a different state than that
saved by using FSAVE.
Normally, an exception handler executes FSAVE, processes the exception, clears the
exception bit in the FSAVE state frame status word, and executes FRESTORE. If
appropriate exception bits set in the status word are not cleared, the same exception is
taken again. If multiple exception bits are set in the status word, each should be processed,
cleared, and restored by their respective handlers. In this way, all exceptions are processed
in priority order.
If it is not necessary to handle multiple exceptions, the exception model can be simplified
(after any processing) by the handler manually loading FPCR and FPSR and then
discarding the state frame before executing an RTE. Given that state frames are 4
longwords, it may be quicker to discard the state frame by incrementing the address
pointer (often the system stack pointer, A7) by 16.
Chapter 4. Floating-Point Unit (FPU)
For More Information On This Product,
Go to: www.freescale.com
4-27
Freescale Semiconductor, Inc.
Instructions
The exception operand, contained in longwords 2 and 3 of the FSAVE frame, is always the
value of the destination operand before the operation which caused the exception
commenced. Thus, for dyadic register-to-register operations, the exception operand
contains the value of the destination register before it was overwritten by the operation
which caused the exception. This operand can be retrieved by an exception handler that
needs both original operands in order to process the exception.
4.4 Instructions
Freescale Semiconductor, Inc...
This section includes an instruction set summary, execution times, and differences
between ColdFire and 68K FPU programming models. For detailed instruction
descriptions, see the PRM.
4.4.1 Floating-Point Instruction Overview
ColdFire instructions are 16, 32, or 48 bits long. The general definition of a floating-point
operation and effective addressing mode require 32 bits; some addressing modes require
another 16-bit extension word. Table 4-23 shows the minimum size instruction formats.
The first word is the opword; the second is extension word 1.
Table 4-23. Floating-Point Instruction Formats
Mnemonic
Instruction Code
FABS
1 1 1 1 0 0 1 0 0 0 EA MODE EA REG
0 R/M 0 SRC SPEC DEST REG
OPMODE
FADD
1 1 1 1 0 0 1 0 0 0 EA MODE EA REG
0 R/M 0 SRC SPEC DEST REG
OPMODE
FBcc
1 1 1 1 0 0 1 0 1 SZ COND PREDICATE
16b displacement or MS Word of 32b
LS Word of 32b Displacement
FCMP
1 1 1 1 0 0 1 0 0 0 EA MODE EA REG
0 R/M 0 SRC SPEC DEST REG 0 1 1 1 0 0 0
FDIV
1 1 1 1 0 0 1 0 0 0 EA MODE EA REG
0 R/M 0 SRC SPEC DEST REG
FINT
1 1 1 1 0 0 1 0 0 0 EA MODE EA REG
0 R/M 0 SRC SPEC DEST REG 0 0 0 0 0 0 1
FINTRZ
1 1 1 1 0 0 1 0 0 0 EA MODE EA REG
0 R/M 0 SRC SPEC DEST REG 0 0 0 0 0 1 1
FMOVE
1 1 1 1 0 0 1 0 0 0 EA MODE EA REG
0 R/M 0 SRC SPEC DEST REG
1 1 1 1 0 0 1 0 0 0 EA MODE EA REG
0
1
1 1 1 1 0 0 1 0 0 0 EA MODE EA REG
1
0 dr REG SEL 0 0
FMOVEM
1 1 1 1 0 0 1 0 0 0 EA MODE EA REG
1
1 dr 1 0
FMUL
1 1 1 1 0 0 1 0 0 0 EA MODE EA REG
0 R/M 0 SRC SPEC DEST REG
OPMODE
FNEG
1 1 1 1 0 0 1 0 0 0 EA MODE EA REG
0 R/M 0 SRC SPEC DEST REG
OPMODE
FNOP
1 1 1 1 0 0 1 0 1 0 0 0 0 0 0 0 0
0
OPMODE
OPMODE
1 DEST FMT SRC REG 0 0 0 0 0 0 0
0 0 0
0 0 0 0 0 0 0 0
0 0 0
0 0 0
REGISTER LIST
0 0 0 0 0 0 0 0
FRESTORE 1 1 1 1 0 0 1 1 0 1 EA MODE EA REG
FSAVE
1 1 1 1 0 0 1 1 0 0 EA MODE EA REG
FSQRT
1 1 1 1 0 0 1 0 0 0 EA MODE EA REG
4-28
0 R/M 0 SRC SPEC DEST REG
ColdFire CF4e Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
OPMODE
Freescale Semiconductor, Inc.
Instructions
Table 4-23. Floating-Point Instruction Formats (Continued)
Mnemonic
Instruction Code
FSUB
1 1 1 1 0 0 1 0 0 0 EA MODE EA REG
0 R/M 0 SRC SPEC DEST REG
FTST
1 1 1 1 0 0 1 0 0 0 EA MODE EA REG
0 R/M 0 SRC SPEC DEST REG 0 1 1 1 0 1 0
OPMODE
Table 4-24 defines the terminology used in Table 4-23.
Table 4-24. Instruction Format Terminology
Freescale Semiconductor, Inc...
Term
Definition
Instructions
Instructions appear in memory as sequential, 16-bit values, and are read in the above table left to
right. An instruction can have from 1 to 3 16-bit words. A shaded block indicates this word is never
used and is not present.
EA MODE
EA REG
Defines the effective address for an operand located external to the FPU. For most FPU instructions,
this field defines the location of an external source operand; for FP store operations, it specifies the
destination location.
R/M
If R/M = 0, an FPU data register is one source operand, otherwise the source operand is specified by
the EA {MODE, REG} fields.
SRC SPEC
Defines the format (byte, word, longword, single-, or double-precision) of an external operand.
DEST REG
Specifies the destination FPU data register.
COND
Defines the condition to be evaluated (EQ, NE, and so on) during the execution of the FPU
PREDICATE conditional branch instruction.
OPMODE
Defines the exact operation to be performed by the FPU.
SZ
Defines the length of the PC-relative displacement for the FPU conditional branch instruction. If SZ =
0, the displacement is 16 bits, otherwise a 32-bit displacement is used.
dr
Specifies direction of the MOVE transfer. As a 0, it moves from memory to the FP; as 1, it moves from
the FP to memory.
REGISTER
LIST
REG SEL
Defines FPU data registers to be moved during the execution of the FMOVEM instruction.
Indicates the FPU control register to be moved during execution of an FMOVE control register
instruction.
Chapter 4. Floating-Point Unit (FPU)
For More Information On This Product,
Go to: www.freescale.com
4-29
Freescale Semiconductor, Inc.
Instructions
4.4.2 Floating-Point Instruction Execution Times
Table 4-25 shows the ColdFire execution times for the floating-point instructions in terms
of processor core clock cycles. Each timing entry is presented as C(r/w).
•
Freescale Semiconductor, Inc...
•
•
C = The number of processor clock cycles including all applicable operand reads
and writes plus all internal core cycles required to complete instruction execution
r = The number of operand reads
w = The number of operand writes
NOTE:
Timing assumptions are the same as those for the ColdFire
ISA. See the ColdFire Microprocessor Family Programmer’s
Reference Manual.
Table 4-25. Floating-Point Instruction Execution Times1,
2, 3
Effective Address <ea>
Opcode
Format
FPn
Dn
(An)
(An)+
-(An)
(d16,An)
(d16,PC)
FABS
<ea>y,FPx
1(0/0)
1(0/0)
1(1/0)
1(1/0)
1(1/0)
1(1/0)
1(1/0)
FADD
<ea>y,FPx
4(0/0)
4(0/0)
4(1/0)
4(1/0)
4(1/0)
4(1/0)
4(1/0)
FBcc
<label>
—
—
—
—
—
—
2(0/0) if correct,
9(0/0) if incorrect
FCMP
<ea>y,FPx
4(0/0)
4(0/0)
4(1/0)
4(1/0)
4(1/0)
4(1/0)
4(1/0)
FDIV
<ea>y,FPx
23(0/0)
23(0/0)
23(1/0)
23(1/0)
23(1/0)
23(1/0)
23(1/0)
FINT
<ea>y,FPx
4(0/0)
4(0/0)
4(1/0)
4(1/0)
4(1/0)
4(1/0)
4(1/0)
FINTRZ
<ea>y,FPx
4(0/0)
4(0/0)
4(1/0)
4(1/0)
4(1/0)
4(1/0)
4(1/0)
FMOVE
<ea>y,FPx
1(0/0)
1(0/0)
1(1/0)
1(1/0)
1(1/0)
1(1/0)
1(1/0)
FPy,<ea>x
—
2(0/1)
2(0/1)
2(0/1)
2(0/1)
2(0/1)
—
<ea>y,FP*R
—
6(0/0)
6(1/0)
6(1/0)
6(1/0)
6(1/0)
6(1/0)
FP*R,<ea>x
—
1(0/0)
1(0/1)
1(0/1)
1(0/1)
1(0/1)
—
<ea>y,#list
—
—
2n(2n/0)
—
—
2n(2n/0)
2n(2n/0)
#list,<ea>x
—
—
1+2n(0/2n)
—
—
1+2n(0/2n)
—
FMUL
<ea>y,FPx
4(0/0)
4(0/0)
4(1/0)
4(1/0)
4(1/0)
4(1/0)
4(1/0)
FNEG
<ea>y,FPx
1(0/0)
1(0/0)
1(1/0)
1(1/0)
1(1/0)
1(1/0)
1(1/0)
—
—
—
—
—
—
2(0/0)
FMOVEM
4
FNOP
FRESTORE
<ea>y
—
—
6(4/0)
—
—
6(4/0)
6(4/0)
FSAVE
<ea>x
—
—
7(0/4)
—
—
7(0/4)
—
FSQRT
<ea>y,FPx
56(0/0)
56(0/0)
56(1/0)
56(1/0)
56(1/0)
56(1/0)
56(1/0)
FSUB
<ea>y,FPx
4(0/0)
4(0/0)
4(1/0)
4(1/0)
4(1/0)
4(1/0)
4(1/0)
FTST
<ea>y,FPx
1(0/0)
1(0/0)
1(1/0)
1(1/0)
1(1/0)
1(1/0)
1(1/0)
4-30
ColdFire CF4e Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
Instructions
1
Add 1(1/0) for an external read operand of double-precision format for all instructions except FMOVEM, and
1(0/1) for FMOVE FPy,<ea>x when the destination is double-precision.
2 If the external operand is an integer format (byte, word, longword), there is a 4 cycle conversion time which
must be added to the basic execution time.
3 If any exceptions are enabled, the execution time for FMOVE FPy,<ea>x increases by one cycle. If the BSUN
exception is enabled, the execution time for FBcc increases by one cycle.
4 For FMOVEM, n refers to the number of registers being moved.
Freescale Semiconductor, Inc...
The ColdFire architecture supports concurrent execution of integer and floating-point
instructions. The latencies in this table define the execution time needed by the FPU. After
a multi-cycle FPU instruction is issued, subsequent integer instructions can execute
concurrently with the FPU execution. For this sequence, the floating-point instruction
occupies only one OEP cycle.
4.4.3 Key Differences between ColdFire and MC680x0 FPU
Programming Models
This section is intended for compiler developers and developers porting assembly
language routines from 68K to ColdFire. It highlights major differences between the
ColdFire FPU instruction set architecture (ISA) and the equivalent 68K family ISA, using
the MC68060 as the reference. The internal FPU datapath width is the most obvious
difference. ColdFire uses 64-bit double-precision and the 68K Family uses 80-bit
extended precision. Other differences pertain to supported addressing modes, both across
all FPU instructions as well as specific opcodes. Table 4-26 lists key differences. Because
all ColdFire implementations support instruction sizes of 48 bits or less, 68K operations
requiring larger instruction lengths cannot be supported.
.
Table 4-26. Key Programming Model Differences
Feature
68K
ColdFire
80 bits
64 bits
Support for fpGEN d8(An,Xi),FPx
Yes
No
Support for fpGEN xxx.{w,l},FPx
Yes
No
Support for fpGEN d8(PC,Xi),FPx
Yes
No
Support for fpGEN #xxx,FPx
Yes
No
Support for fmovem (Ay)+,#list
Yes
No
Support for fmovem #list,-(Ax)
Yes
No
Support for fmovem FP Control Registers
Yes
No
Internal datapath width
Some differences affect function activation and return. 68K subroutines typically began
with FMOVEM #list,-(a7) to save registers on the system stack, with each register
occupying 3 longwords. In ColdFire, each register occupies 2 longwords and the stack
pointer must be adjusted before the FMOVEM instruction. A similar sequence generally
occurs at the end of the function, preparing to return control to the calling routine.
Chapter 4. Floating-Point Unit (FPU)
For More Information On This Product,
Go to: www.freescale.com
4-31
Freescale Semiconductor, Inc.
Instructions
The examples in Table 4-27, Table 4-28, and Table 4-29 show a 68K operation and the
equivalent ColdFire sequence.
Table 4-27. 68K/ColdFire Operation Sequence 11
68K
fmovem.x #list,-(a7)
lea -8*n(a7),a7;allocate stack space
fmovem.d #list,(a7) ;save FPU registers
fmovem.x (a7)+,#list
fmovem.d (a7),#list ;restore FPU registers
lea 8*n(a7),a7 ;deallocate stack space
1
Freescale Semiconductor, Inc...
ColdFire Equivalent
n is the number of FP registers to be saved/restored.
If the subroutine includes LINK and UNLK instructions, the stack space needed for FPU
register storage can be factored into these operations and LEA instructions are not
required.
The 68K FPU supports loads and stores of multiple control registers (FPCR, FPSR, and
FPIAR) with one instruction. For ColdFire, only one can be moved at a time.
For instructions that require an unsupported addressing mode, the operand address can be
formed with a LEA instruction immediately before the FPU operation. See Table 4-28.
Table 4-28. 68K/ColdFire Operation Sequence 2
68K
ColdFire Equivalent
fadd.s label,fp2
lea label,a0;form pointer to data
fadd.s (a0),fp2
fmul.d (d8,a1,d7),fp5
lea (d8,a1,d7),a0;form pointer to data
fmul.d (a0),fp5
fcmp.l (d8,pc,d2),fp3
lea (d8,pc,d2),a0;form pointer to data
fcmp.l (a0),fp3
The 68K FPU allows floating-point instructions to directly specify immediate values; the
ColdFire FPU does not support these types of immediate constants. It is recommended
that floating-point immediate values be moved into a table of constants that can be
referenced using PC-relative addressing or as an offset from another address pointer. See
Table 4-29.
Table 4-29. 68K/ColdFire Operation Sequence 3
68K
ColdFire Equivalent
fadd.l #imm1,fp3
fadd.l (imm1_label,pc),fp3
fsub.s #imm2,fp4
fsub.s (imm2_label,pc),fp3
4-32
ColdFire CF4e Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
Instructions
Table 4-29. 68K/ColdFire Operation Sequence 3 (Continued)
68K
Freescale Semiconductor, Inc...
fdiv.d #imm3,fp5
ColdFire Equivalent
fdiv.d (imm3_label,pc),fp3
align 4
imm1_label:
long imm1 ;integer longword
imm2_label:
long imm2 ;single-precision
imm3_label:
long imm3_upper,
imm3_lower ;double-precision
Finally, ColdFire and the 68K differ in how exceptions are made pending. In the ColdFire
exception model, asserting both an FPSR exception indicator bit and the corresponding
FPCR enable bit makes an exception pending. Thus, a pending exception state can be
created by loading FPSR and/or FPCR. On the 68K, this type of pending exception is not
possible.
Analysis of compiled floating-point applications indicates these differences account for
most of the changes between 68K-compatible text and the equivalent ColdFire program.
Chapter 4. Floating-Point Unit (FPU)
For More Information On This Product,
Go to: www.freescale.com
4-33
Freescale Semiconductor, Inc.
Freescale Semiconductor, Inc...
Instructions
4-34
ColdFire CF4e Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
Freescale Semiconductor, Inc...
Chapter 5
Enhanced Multiply-Accumulate Unit
(EMAC)
This chapter describes the functionality, microarchitecture, and performance of the
enhanced multiply-accumulate (EMAC) unit in the ColdFire family of processors.
5.1 Multiply-Accumulate Unit
This document details the functionality, microarchitecture, and performance of the
hardware multiply-accumulate (MAC) unit in the ColdFire family of processors.
Motorola has incorporated a RISC-based processor design for peak performance and a
simplified version of the M68000 Family variable-length instruction set for maximum code
density. The result is a family of 32-bit microprocessors optimized for embedded
applications requiring high performance in a small core size.
The ColdFire performance road map defines a series of microarchitecture versions that
couple with improved process technology to offer increasing levels of performance.
The MAC design centers on the notion of providing a limited set of DSP operations
currently used in embedded code, while supporting the integer multiply instructions of the
baseline ColdFire architecture.
The MAC provides functionality in three related areas:
•
•
•
Signed and unsigned integer multiplies
Multiply-accumulate operations supporting signed and unsigned integer operands as
well as signed, fixed-point, fractional operands
Miscellaneous register operations
The ColdFire family supports two MAC implementations with different performance levels
and capabilities for differing silicon costs. The original MAC is a three-stage execution
pipeline, optimized for 16-bit operands and featuring a 16x16 multiply array with a single
32-bit accumulator. The enhanced MAC (EMAC) features a four-stage pipeline optimized
for 32-bit operands, with a fully pipelined 32x32 multiply array and four 48-bit
accumulators. Either can be attached to any version (V2, V3, or V4) ColdFire core as
dictated by application requirements.
Chapter 5. Enhanced Multiply-Accumulate Unit (EMAC)
For More Information On This Product,
Go to: www.freescale.com
5-1
Freescale Semiconductor, Inc.
An Introduction to the MAC
The first ColdFire MAC supported signed and unsigned integer operands and was
optimized for 16x16 operations, based on a variety of target applications including servo
control and image compression. As ColdFire-based systems proliferated, the desire for
more precision on input operands increased. The result was an improved ColdFire MAC
with user-programmable control to optionally enable use of fractional input operands.
EMAC improvements target three primary areas:
•
•
Freescale Semiconductor, Inc...
•
Improved performance of 32x32 multiply operations.
Addition of three more accumulators to minimize MAC pipeline stalls caused by
exchanges between the accumulator and the pipeline’s general-purpose registers.
A 48-bit accumulation data path to allow the use of a 40-bit product plus the addition
of 8 extension bits to increase the dynamic number range when implementing signal
processing algorithms.
The three areas of functionality are addressed in detail in following sections. The logic
required to support this functionality is contained in a MAC module, as shown in
Figure 5-1.
Operand Y
Operand X
X
Shift 0,1,-1
+/-
Accumulator(s)
Figure 5-1. Multiply-Accumulate Functionality Diagram
5.2 An Introduction to the MAC
The MAC is an extension of the basic multiplier found in most microprocessors. It is
implemented either in hardware or in an iterative routine within an architecture and
supports signal processing algorithms in an acceptable number of cycles, given application
constraints. For example, small digital filters can tolerate some variance in an algorithm’s
execution time, but larger, more complicated algorithms such as orthogonal transforms may
have more demanding speed requirements beyond the scope of any processor architecture
and may require full DSP implementation.
The 68K/ColdFire architecture was not designed for high-speed signal processing, and a
large DSP engine would be excessive in an embedded environment. In striking a balance
between speed, size, and functionality, the ColdFire MAC is optimized for a small set of
operations that involve multiplication and cumulative additions. Specifically, the multiplier
5-2
ColdFire CF4e Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
General Operation
array is optimized for single-cycle pipelined operations with a possible accumulation after
product generation. This functionality is common in many signal processing applications.
The ColdFire core architecture also has been modified to allow an operand to be fetched in
parallel with a multiply, increasing overall performance for certain DSP operations.
Consider a typical filtering operation where the filter is defined as in Figure 5-2.
y(i) =
N–1
N–1
k=1
k=0
∑ a ( k )y ( i – k ) + ∑ b ( k )x ( i – k )
Freescale Semiconductor, Inc...
Figure 5-2. Infinite Impulse Response (IIR) Filter
Here, the output y(i) is determined by past output values or past input values. This is the
general form of an infinite impulse response (IIR) filter. A finite impulse response (FIR)
filter can be obtained by setting coefficients a(k) to zero. In either case, the operations
involved in computing such a filter are multiplies and product summing. To show this point,
reduce the above equation to a simple, four-tap FIR filter, shown in Figure 5-3, in which the
accumulated sum is a sum of past data values and coefficients.
3
y(i) =
∑ b ( k )x ( i – k )
= b ( 0 )x ( i ) + b ( 1 )x ( i – 1 ) + b ( 2 )x ( i – 2 ) + b ( 3 )x ( i – 3 )
k=0
Figure 5-3. Four-Tap FIR Filter
5.3 General Operation
The MAC speeds execution of ColdFire integer multiply instructions (MULS and MULU)
and provides additional functionality for multiply-accumulate operations. By executing
MULS and MULU in the MAC, execution times are minimized and deterministic
compared to the 2-bit/cycle algorithm with early termination that the OEP normally uses if
no MAC hardware is present.
The added MAC instructions to the ColdFire ISA provide for the multiplication of two
numbers, followed by the addition or subtraction of the product to/from the value in an
accumulator. Optionally, the product may be shifted left or right by 1 bit before addition or
subtraction. Hardware support for saturation arithmetic can be enabled to minimize
software overhead when dealing with potential overflow conditions. Multiply-accumulate
operations support 16- or 32-bit input operands of the following formats:
•
•
•
Signed integers
Unsigned integers
Signed, fixed-point, fractional numbers
The EMAC is optimized for single-cycle, pipelined 32x32 multiplications. For word- and
Chapter 5. Enhanced Multiply-Accumulate Unit (EMAC)
For More Information On This Product,
Go to: www.freescale.com
5-3
Freescale Semiconductor, Inc.
General Operation
longword-sized integer input operands, the low-order 40 bits of the product are formed and
used with the destination accumulator. For fractional operands, the entire 64-bit product is
calculated and either truncated or rounded to the most-significant 40-bit result using the
round-to-nearest (even) method before it is combined with the destination accumulator.
For all operations, the resulting 40-bit product is extended to a 48-bit value (using
sign-extension for signed integer and fractional operands, zero-fill for unsigned integer
operands) before being combined with the 48-bit destination accumulator.
Freescale Semiconductor, Inc...
Figure 5-4 and Figure 5-5 show relative alignment of input operands, the full 64-bit
product, the resulting 40-bit product used for accumulation, and 48-bit accumulator
formats.
X
Product
Extended Product
OperandY
32
OperandX
32
40
23
8
“0”
40
+
Accumulator
8
Extension Byte Upper [7:0]
8
40
Accumulator [31:0]
Figure 5-4. Fractional Alignment
5-4
ColdFire CF4e Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Extension Byte Lower [7:0]
Freescale Semiconductor, Inc.
General Operation
X
Product
Extended Product
OperandY
32
OperandX
32
8
32
8
8
32
8
8
32
24
Freescale Semiconductor, Inc...
+
Accumulator
Extension Byte Upper [7:0]
Accumulator [31:0]
Extension Byte Lower [7:0]
Figure 5-5. Signed and Unsigned Integer Alignment
Thus, the 48-bit accumulator definition is a function of the EMAC operating mode. Given
that each 48-bit accumulator is the concatenation of 16-bit ACCextx contents and 32-bit
ACCx contents, the specific definitions are as follows:
if MACSR[6:5] ==
Complete
if MACSR[6:5] ==
Complete
if MACSR[6:5] ==
Complete
00/* signed integer mode */
Accumulator[47:0] = {ACCextx[15:0], ACCx[31:0]}
-1/* signed fractional mode */
Accumulator [47:0] = {ACCextx[15:8], ACCx[31:0], ACCextx[7:0]}
10/* unsigned integer mode */
Accumulator[47:0] = {ACCextx[15:0], ACCx[31:0]}
The four accumulators are represented as an array, ACCx, where x selects the register.
Although the multiplier array is implemented in a four-stage pipeline, all arithmetic MAC
instructions have an effective issue rate of 1 cycle, regardless of input operand size or type.
All arithmetic operations use register-based input operands, and summed values are stored
internally in an accumulator. Thus, an additional move instruction is needed to store data in
a general-purpose register. One feature new to MAC instructions is the ability to choose the
upper or lower word of a register as a 16-bit input operand. This is useful in filtering
operations if one data register is loaded with the input data and another is loaded with the
coefficient. Two 16-bit multiply accumulates can be performed without fetching additional
operands between instructions by alternating the word choice during the calculations.
The EMAC has four accumulator registers versus the MAC’s single accumulator. The
additional registers improve the performance of some algorithms by minimizing pipeline
stalls needed to store an accumulator value back to general-purpose registers. Many
algorithms require multiple calculations on a given data set. By applying different
accumulators to these calculations, it is often possible to store one accumulator without any
stalls while performing operations involving a different destination accumulator.
Chapter 5. Enhanced Multiply-Accumulate Unit (EMAC)
For More Information On This Product,
Go to: www.freescale.com
5-5
Freescale Semiconductor, Inc.
Memory Map/Register Set
The need to move large amounts of data presents an obstacle to obtaining high throughput
rates in DSP engines. New and existing ColdFire instructions can accommodate these
requirements. A MOVEM instruction can move large blocks of data efficiently by
generating line-sized burst references. The ability to simultaneously load an operand from
memory into a register and execute a MAC instruction makes some DSP operations such
as filtering and convolution more manageable.
Freescale Semiconductor, Inc...
The programming model includes a 16-bit mask register (MASK), which can optionally be
used to generate an operand address during MAC + MOVE instructions. The application of
this register with auto-increment addressing mode supports efficient implementation of
circular data queues for memory operands.
The additional MACSR contains a 4-bit operational mode field and condition flags.
Operational mode bits control the overflow/saturation mode, whether operands are signed
or unsigned, whether operands are treated as integers or fractions, and how rounding is
performed. Negative, zero, and multiple overflow condition flags are also provided.
5.4 Memory Map/Register Set
The EMAC provides the following program-visible registers:
•
•
•
•
Four 32-bit accumulators (ACCx = ACC0, ACC1, ACC2, and ACC3)
Eight 8-bit accumulator extensions (two per accumulator), packaged as two 32-bit
values for load and store operations (ACCext01 and ACCext23)
One 16-bit mask register (MASK)
One 32-bit MAC status register (MACSR) including four indicator bits signaling
product or accumulation overflow (one for each accumulator: PAV0–PAV3)
These registers are shown in Figure 5-6.
31
0
MACSR
ACC0
ACC1
ACC2
ACC3
ACCext01
ACCext23
MASK
MAC status register
MAC accumulator 0
MAC accumulator 1
MAC accumulator 2
MAC accumulator 3
Extensions for ACC0 and ACC1
Extensions for ACC2 and ACC3
MAC mask register
Figure 5-6. EMAC Register Set
5.4.1 MAC Status Register (MACSR)
MACSR functionality is organized as follows:
•
•
5-6
MACSR[11–8] contains one product/accumulation overflow flag per accumulator.
MACSR[7–4] defines the operating configuration of the MAC unit.
ColdFire CF4e Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
Memory Map/Register Set
•
MACSR[3–0] contains indicator flags from the last MAC instruction execution.
Bit 31
12
11
10
9
8
Prod/acc overflow flags
Field
—
7
6
PAV3 PAV2 PAV1 PAV0 OMC S/U
Reset
5
4
3
2
Operational Mode
F/I
R/T
1
0
V
EV
PAV
N
Z
All zeros
R/W
R/W
Figure 5-7. MAC Status Register (MACSR)
Table 5-1 describes MACSR fields.
Freescale Semiconductor, Inc...
Table 5-1. MACSR Field Descriptions
Bits
Name
31–12
—
11–8
PAVx
Description
Reserved, should be cleared
Product/accumulation overflow flags. Contains four flags, one per accumulator, indicating if past
MAC or MSAC instructions generated an overflow during product calculation or from the 48-bit
accumulation. When a MAC or MSAC instruction is executed, the PAVx flag associated with the
destination accumulator is used to form the general overflow flag, MACSR[V]. Once set, each flag
remains set until V is cleared by a MOV.L , MACSR instruction or the accumulator is loaded
directly.
Chapter 5. Enhanced Multiply-Accumulate Unit (EMAC)
For More Information On This Product,
Go to: www.freescale.com
5-7
Freescale Semiconductor, Inc.
Memory Map/Register Set
Table 5-1. MACSR Field Descriptions (Continued)
Bits
Name
Description
Freescale Semiconductor, Inc...
7–4
5-8
Operational Mode Field
7
OMC
Overflow/saturation mode. Used to enable or disable saturation mode on overflow. If set,
the accumulator is set to the appropriate constant on any operation which overflows the
accumulator. Once saturated, the accumulator remains unaffected by any other MAC or
MSAC instructions until either the overflow bit is cleared or the accumulator is directly
loaded.
6
S/U
Signed/unsigned operations.
In integer mode:
S/U determines whether operations performed are signed or unsigned. It also determines
the accumulator value during saturation, if enabled.
0 Signed numbers. On overflow, if OMC is enabled, an accumulator saturates to the most
positive (0x7FFF_FFFF) or the most negative (0x8000_0000) number, depending on
both the instruction and the value of the product that overflowed.
1 Unsigned numbers. On overflow, if OMC is enabled, an accumulator saturates to the
smallest value (0x0000_0000) or the largest value (0xFFFF_FFFF), depending on the
instruction.
In fractional mode:
S/U controls rounding while storing an accumulator to a general-purpose register.
0 Move accumulator without rounding to a 16-bit value. Accumulator is moved to a
general-purpose register as a 32-bit value.
1 The accumulator is rounded to a 16-bit value using the round-to-nearest (even) method
when it is moved to a general-purpose register. See Section 5.4.1.1, “Fractional
Operation Mode.” The resulting 16-bit value is stored in the lower word of the destination
register. The upper word is zero-filled. The accumulator value is not affected by this
rounding procedure.
5
F/I
Fraction/integer mode Determines whether input operands are treated as fractions or
integers.
0 Integers can be represented in either signed or unsigned notation, depending on the
value of S/U.
1 Fractions are represented in signed, fixed-point, two’s complement notation. Values
range from -1 to 1- 2-15 for 16-bit fractions and -1 to 1 - 2-31 for 32-bit fractions. See
Table 5-2.
4
R/T
Round/truncate mode. Controls the rounding procedure used with fractional MAC, MOV.L
ACCx,Rx, or MSAC.L instructions.
0 Truncate. The product’s lsbs are dropped before it is combined with the accumulator.
Additionally, when a store accumulator instruction is executed (MOV.L ACCx,Rx), the 8
lsbs of the 48-bit accumulator logic are simply truncated.
1 Round-to-nearest (even). The 64-bit product of two 32-bit, fractional operands is
rounded to the nearest 40-bit value. If the low-order 24 bits equal 0x80_0000, the upper
40 bits are rounded to the nearest even (lsb = 0) value.See Section 5.4.1.1, “Fractional
Operation Mode.” Additionally, when a store accumulator instruction is executed (MOV.L
ACCx,Rx), the lsbs of the 48-bit accumulator logic are used to round the resulting 16- or
32-bit value. If MACSR[S/U] = 0 and MACSR[R/T] = 1, the low-order 8 bits are used to
round the resulting 32-bit fraction. If MACSR[S/U] = 1, the low-order 24 bits are used to
round the resulting 16-bit fraction.
ColdFire CF4e Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
Memory Map/Register Set
Freescale Semiconductor, Inc...
Table 5-1. MACSR Field Descriptions (Continued)
Bits
Name
Description
3–0
—
Flags
3
N
Negative. Set if the msb of the result is set, cleared otherwise. N is affected only by MAC,
MSAC, and load operations and not by MULS and MULU instructions.
2
Z
Zero. Set if the result equals zero, cleared otherwise. This bit is affected only by MAC,
MSAC, and load operations and not by MULS and MULU instructions.
1
V
Overflow. Set if an arithmetic overflow occurs on a MAC or MSAC instruction indicating
that the result cannot be represented in the limited width of the EMAC. V is set only if a
product overflow occurs or the accumulation overflows the 48-bit structure. V is evaluated
on each MAC or MSAC operation and uses the appropriate PAVx flag in the next-state V
evaluation.
0
EV
Extension overflow. Signals that the last MAC or MSAC instruction overflowed the 32 lsbs
in integer mode or the 40 lsbs in fractional mode of the destination accumulator. However,
the result is still accurately represented in the combined 48-bit accumulator structure.
Although an overflow has occurred, the correct result, sign, and magnitude are contained
in the 48-bit accumulator. Subsequent MAC or MSAC operations may return the
accumulator to a valid 32/40-bit result.
Table 5-2 summarizes the interaction of the S/U, F/I, and R/T control bits (MACSR[6–4]).
Table 5-2. Summary of S/U, F/I, and R/T Control Bits
S/U F/I R/T
Operational Modes
0
0
x
Signed, integer
0
1
0
Signed, fractional
Truncate on MAC.L and MSAC.L
No round on accumulator stores
0
1
1
Signed, fractional,
Round on MAC.L and MSAC.L
Round-to-32-bits on accumulator stores
1
0
x
Unsigned, integer
1
1
0
Signed, fractional,
Truncate on MAC.L and MSAC.L
Round-to-16-bits on accumulator stores
1
1
1
Signed, fractional,
Round on MAC.L and MSAC.L,
Round-to-16-bits on accumulator stores
5.4.1.1 Fractional Operation Mode
This section describes behavior when fractional mode is used (MACSR[F/I] is set).
5.4.1.1.1 Rounding
While the processor is in fractional mode, there are two operations during which rounding
can occur.
Chapter 5. Enhanced Multiply-Accumulate Unit (EMAC)
For More Information On This Product,
Go to: www.freescale.com
5-9
Freescale Semiconductor, Inc.
Memory Map/Register Set
•
Freescale Semiconductor, Inc...
•
A store accumulator instruction is executed (MOV.L ACCx,Rx). The lsbs of the
48-bit accumulator logic are used to round the resulting 16- or 32-bit value. If
MACSR[S/U] = 0, the low-order 8 bits are used to round the resulting 32-bit
fraction. If MACSR[S/U] = 1, the low-order 24 bits are used to round the resulting
16-bit fraction.
A MAC (or MSAC) instruction with 32-bit operands is executed. If MACSR[R/T]
is zero, multiplying two 32-bit numbers creates a 64-bit product that is truncated to
the upper 40-bits; otherwise, it is rounded using round-to-nearest (even) method.
To understand the round-to-nearest-even method, consider the following example involving
the rounding of a 32-bit number, R0, to a 16-bit number. Using this method, the 32-bit
number is rounded to the closest 16-bit number possible. Let the high-order 16 bits of R0
be named R0.U and the low-order 16 bits be R0.L.
•
•
•
If R0.L is less than 0x8000, the result is truncated to the value of R0.U.
If R0.L is greater than 0x8000, the upper word is incremented (rounded up).
If R0.L is 0x8000, R0 is half-way between two 16-bit numbers. In this case,
rounding is based on the lsb of R0.U, so the result is always even (lsb = 0).
— If the lsb of R0.U = 1 and R0.L = 0x8000, the number is rounded up.
— If the lsb of R0.U = 0 and R0.L =0x8000, the number is rounded down.
This method minimizes rounding bias and creates as statistically correct an answer
as possible.
The rounding algorithm is summarized in the following pseudocode:
if R0.L < 0x8000
then Result = R0.U
else if R0.L > 0x8000
then Result = R0.U + 1
else if LSB of R0.U = 0
then Result = R0.U
else Result = R0.U + 1
/* R0.L = 0x8000 */
The round-to-nearest-even technique is also known as convergent rounding.
5.4.1.1.2 Saving and Restoring the EMAC Programming Model
The presence of rounding logic in the output datapath of the EMAC requires that special
care be taken during the save and restore of the EMAC programming model. In particular,
any result rounding modes must be disabled during the save/restore process so the exact
bit-wise contents of the EMAC registers are accessed. Consider the following memory
structure containing the EMAC programming model:
struct
5-10
macState {
int acc0;
int acc1;
int acc2;
int acc3;
int accext01;
int accext02;
int mask;
ColdFire CF4e Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
Memory Map/Register Set
int macsr;
} macState;
Freescale Semiconductor, Inc...
The following assembly language routine shows the proper sequence for a correct EMAC
state save. This code assumes all Dn and An registers are available for use and the memory
location of the state save is defined by A7.
EMAC_state_save:
move.l macsr,d7
clr.l
d0
move.l d0,macsr
move.l acc0,d0
move.l acc1,d1
move.l acc2,d2
move.l acc3,d3
move.l accext01,d4
move.l accext23,d5
move.l mask,d6
movem.l #0x00ff,(a7)
;
;
;
;
save the macsr
zero the register to ...
disable rounding in the macsr
save the accumulators
; save the accumulator extensions
; save the address mask
; move the state to memory
The following code performs the EMAC state restore:
EMAC_state_restore:movem.l (a7),#0x00ff; restore the state from memory
move.l #0,macsr
; disable rounding in the macsr
move.l d0,acc0
; restore the accumulators
move.l d1,acc1
move.l d2,acc2
move.l d3,acc3
move.l d4,accext01
; restore the accumulator extensions
move.l d5,accext23
move.l d6,mask
; restore the address mask
move.l d7,macsr
; restore the macsr
By executing this type of sequence, the exact state of the EMAC programming model can
be correctly saved and restored.
5.4.1.1.3 MULS/MULU
MULS and MULU are unaffected by fractional mode operation; operands are still assumed
to be integers.
5.4.1.1.4 Scale Factor in MAC or MSAC instructions
The scale factor is ignored while the MAC is in fractional mode.
5.4.2 Mask Register (MASK)
The 32-bit mask register (MASK) implements the low-order 16 bits, to minimize
complications about loading and storing only 16 bits and the associated alignment
requirements. When the MASK is loaded, the low-order 16 bits of the source operand are
actually loaded into the register. When it is stored, the upper 16 bits are forced to all ones.
This register performs a simple AND with the address. That is, the processor calculates the
normal operand address and, if enabled, that address is then ANDed with {0xFFFF,
MASK[15:0]} to form the final address. So, MASK register bits are cleared, constraining
the address to a certain region. This is used primarily to implement circular queues in
conjunction with the (Ay) + addressing mode.
Chapter 5. Enhanced Multiply-Accumulate Unit (EMAC)
For More Information On This Product,
Go to: www.freescale.com
5-11
Freescale Semiconductor, Inc.
EMAC Instruction Set Summary
This feature minimizes the addressing support required for filtering, convolution, or any
routine that implements a data array as a circular queue. For MAC + MOVE operations, the
MASK contents can optionally be included in all memory effective address calculations.
The syntax is as follows:
MAC.sz
Ry,RxSF,<ea>y&,Rw
The & operator enables the use of MASK and causes bit 5 of the extension word to be set.
The exact algorithm for the use of MASK is as follows:
Freescale Semiconductor, Inc...
if extension word, bit [5] = 1, the MASK bit, then
if <ea> = (An)
oa = An & {0xFFFF, MASK}
if <ea> = (An)+
oa = An
An = (An + 4) & {0xFFFF, MASK}
if <ea> =-(An)
oa = (An - 4) & {0xFFFF, MASK}
An = (An - 4) & {0xFFFF, MASK}
if <ea> = (d16,An)
oa = (An + se_d16) & {0xFFFF0x, MASK}
Here, oa is the calculated operand address and se_d16 is a sign-extended 16-bit
displacement. For auto-addressing modes of post-increment and pre-decrement, the
calculation of the updated An value is also shown.
Use of the post-increment addressing mode, (An)+ with MASK is suggested for circular
queue implementations.
5.5 EMAC Instruction Set Summary
Figure 5-3 summarizes EMAC unit instructions.
Table 5-3. EMAC Instruction Summary
Command
Mnemonic
Description
Multiply Signed
MULS <ea>y,Dx
Multiplies two signed operands yielding a signed result
Multiply Unsigned
MULU <ea>y,Dx
Multiplies two unsigned operands yielding an unsigned result
Multiply Accumulate
MAC Ry,RxSF,ACCx
MSAC Ry,RxSF,ACCx
Multiplies two operands and adds/subtracts the product
to/from an accumulator
Multiply Accumulate
with Load
MAC Ry,Rx,<ea>y,Rw,ACCx Multiplies two operands and combines the product to an
MSAC Ry,Rx,<ea>y,Rw,ACCx accumulator while loading a register with the memory
operand
Load Accumulator
MOV.L {Ry,#imm},ACCx
Loads an accumulator with a 32-bit operand
Store Accumulator
MOV.L ACCx,Rx
Writes the contents of an accumulator to a CPU register
Copy Accumulator
MOV.L ACCy,ACCx
Copies a 48-bit accumulator
Load MACSR
MOV.L {Ry,#imm},MACSR
Writes a value to MACSR
Store MACSR
MOV.L MACSR,Rx
Write the contents of MACSR to a CPU register
5-12
ColdFire CF4e Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
EMAC Instruction Set Summary
Table 5-3. EMAC Instruction Summary
Freescale Semiconductor, Inc...
Command
Mnemonic
Description
Store MACSR to CCR
MOV.L MACSR,CCR
Write the contents of MACSR to the CCR
Load MAC Mask Reg
MOV.L {Ry,#imm},MASK
Writes a value to the MASK register
Store MAC Mask Reg
MOV.L MASK,Rx
Writes the contents of the MASK to a CPU register
Load AccExtensions01
MOV.L {Ry,#imm},ACCext01
Loads the accumulator 0,1 extension bytes with a 32-bit
operand
Load AccExtensions23
MOV.L {Ry,#imm},ACCext23
Loads the accumulator 2,3 extension bytes with a 32-bit
operand
Store AccExtensions01 MOV.L ACCext01,Rx
Writes the contents of accumulator 0,1 extension bytes into a
CPU register
Store AccExtensions23 MOV.L ACCext23,Rx
Writes the contents of accumulator 2,3 extension bytes into a
CPU register
5.5.1 Data Representation
MACSR[6–5] selects one of the following three modes, where each mode defines a unique
operand type.
•
•
•
Two’s complement signed integer: In this format, an N-bit operand value lies in the
range -2(N-1) < operand < 2(N-1) - 1. The binary point is right of the lsb.
Unsigned integer: In this format, an N-bit operand value lies in the range 0 < operand
< 2N - 1. The binary point is right of the lsb.
Two’s complement, signed fractional: In an N-bit number, the first bit is the sign bit.
The remaining bits signify the first N-1 bits after the binary point. Given an N-bit
number, aN-1aN-2aN-3... a2a1a0, its value is given by the equation in Figure 5-8.
N–2
value = – ( 1 ⋅ a N – 1 ) +
∑2
(i + 1 – N)
⋅ ai
i=0
Figure 5-8. Two’s Complement, Signed Fractional Equation
This format can represent numbers in the range -1 < operand < 1 - 2(N-1).
For words and longwords, the largest negative number that can be represented is -1, whose
internal representation is 0x8000 and 0x8000_0000, respectively. The largest positive word
is 0x7FFF or (1 - 2-15); the most positive longword is 0x7FFF_FFFF or (1 - 2-31).
5.5.2 MAC Opcodes
MAC opcodes are mapped into line A and are described in the PRM (see URL
http://www.motorola.com/ColdFire/).
Chapter 5. Enhanced Multiply-Accumulate Unit (EMAC)
For More Information On This Product,
Go to: www.freescale.com
5-13
Freescale Semiconductor, Inc.
EMAC Instruction Set Summary
Note the following:
•
•
Freescale Semiconductor, Inc...
•
•
Unless noted otherwise, the setting of MACSR[N,Z] is based on the result of the
final operation involving the product and the accumulator.
The overflow (V) flag is handled differently. It is set if the complete product cannot
be represented as a 40-bit value (this applies to 32x32 integer operations only) or if
the combination of the product with an accumulator cannot be represented in the
given number of bits. The EMAC design includes an additional
product/accumulation overflow bit for each accumulator that are treated as sticky
indicators and are used to calculate the V bit on each MAC or MSAC instruction.
See Section 5.4.1, “MAC Status Register (MACSR).”
For the MAC design, the assembler syntax of the MAC (multiply and add to
accumulator) and MSAC (multiply and subtract from accumulator) instructions
does not include a reference to the single accumulator. For the EMAC, it is expected
that assemblers support this syntax and that no explicit reference to an accumulator
is interpreted as a reference to ACC0. These assemblers would also support syntaxes
where the destination accumulator is explicitly defined.
The optional 1-bit shift of the product is specified using the notation {<< | >>} SF,
where <<1 indicates a left shift and >>1 indicates a right shift. The shift is performed
before the product is added to or subtracted from the accumulator. Without this
operator, the product is not shifted. If the EMAC is in fractional mode (MACSR[F/I]
is set), SF is ignored and no shift is performed. Because a product can overflow, the
following guidelines are followed:
— For unsigned word and longword operations, a zero is shifted into the product on
right shifts.
— For signed, word operations, the sign bit is shifted into the product on right shifts
unless the product is zero. For signed, longword operations, the sign bit is shifted
into the product unless an overflow occurs or the product is zero, in which case
a zero is shifted in.
— For all left shifts, a zero is inserted into the lsb position.
The following pseudocode explains basic MAC or MSAC instruction functionality. This
example is presented as a case statement covering the three basic operating modes with
signed integers, unsigned integers, and signed fractionals. Throughout this example, a
comma-separated list in curly brackets, {}, indicates a concatenation operation.
switch (MACSR[6:5])
/* MACSR[S/U, F/I] */
{
case 0:
/* signed integers */
if (MACSR.OMC == 0 || MACSR.PAVx == 0)
then {
MACSR.PAVx = 0
/* select the input operands */
if (sz == word)
then {if (U/Ly == 1)
then operandY[31:0] = {sign-extended Ry[31], Ry[31:16]}
else operandY[31:0] = {sign-extended Ry[15], Ry[15:0]}
if (U/Lx == 1)
5-14
ColdFire CF4e Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
EMAC Instruction Set Summary
then operandX[31:0] = {sign-extended Rx[31], Rx[31:16]}
else operandX[31:0] = {sign-extended Rx[15], Rx[15:0]}
}
else {operandY[31:0] = Ry[31:0]
operandX[31:0] = Rx[31:0]
}
Freescale Semiconductor, Inc...
/* perform the multiply */
product[63:0] = operandY[31:0] * operandX[31:0]
/* check for product overflow */
if ((product[63:39] != 0x0000_00_0) && (product[63:39] != 0xffff_ff_1))
then {
/* product overflow */
MACSR.PAVx = 1
MACSR.V = 1
if (inst == MSAC && MACSR.OMC == 1)
then if (product[63] == 1)
then result[47:0] = 0x0000_7fff_ffff
else result[47:0] = 0xffff_8000_0000
else if (MACSR.OMC == 1)
then /* overflowed MAC,
saturationMode enabled */
if (product[63] == 1)
then result[47:0] = 0xffff_8000_0000
else result[47:0] = 0x0000_7fff_ffff
}
/* sign-extend to 48 bits before performing any scaling */
product[47:40] = {8{product[39]}}
/* sign-extend */
/* scale product before combining with accumulator */
switch (SF)
/* 2-bit scale factor */
{
case 0:
/* no scaling specified */
break;
case 1:
/* SF = “<< 1” */
product[40:0] = {product[39:0], 0}
break;
case 2:
/* reserved encoding */
break;
case 3:
/* SF = “>> 1” */
product[39:0] = {product[39], product[39:1]}
break;
}
if (MACSR.PAVx == 0)
then {if (inst == MSAC)
then result[47:0] = ACCx[47:0] - product[47:0]
else result[47:0] = ACCx[47:0] + product[47:0]
}
/* check for accumulation overflow */
if (accumulationOverflow == 1)
then {MACSR.PAVx = 1
MACSR.V = 1
if (MACSR.OMC == 1)
then /* accumulation overflow,
saturationMode enabled */
if (result[47] == 1)
then result[47:0] = 0x0000_7fff_ffff
else result[47:0] = 0xffff_8000_0000
}
/* transfer the result to the accumulator */
ACCx[47:0] = result[47:0]
}
MACSR.V = MACSR.PAVx
MACSR.N = ACCx[47]
if (ACCx[47:0] == 0x0000_0000_0000)
then MACSR.Z = 1
else MACSR.Z = 0
if ((ACCx[47:31] == 0x0000_0) || (ACCx[47:31] == 0xffff_1))
then MACSR.EV = 0
Chapter 5. Enhanced Multiply-Accumulate Unit (EMAC)
For More Information On This Product,
Go to: www.freescale.com
5-15
Freescale Semiconductor, Inc.
Freescale Semiconductor, Inc...
EMAC Instruction Set Summary
else MACSR.EV = 1
break;
case 1,3:
/* signed fractionals */
if (MACSR.OMC == 0 || MACSR.PAVx == 0)
then {
MACSR.PAVx = 0
if (sz == word)
then {if (U/Ly == 1)
then operandY[31:0] = {Ry[31:16], 0x0000}
else operandY[31:0] = {Ry[15:0], 0x0000}
if (U/Lx == 1)
then operandX[31:0] = {Rx[31:16], 0x0000}
else operandX[31:0] = {Rx[15:0], 0x0000}
}
else {operandY[31:0] = Ry[31:0]
operandX[31:0] = Rx[31:0]
}
/* perform the multiply */
product[63:0] = (operandY[31:0] * operandX[31:0]) << 1
/* check for product rounding */
if (MACSR.R/T == 1)
then { /* perform convergent rounding */
if (product[23:0] > 0x80_0000)
then product[63:24] = product[63:24] + 1
else if ((product[23:0] == 0x80_0000) && (product[24] == 1))
then product[63:24] = product[63:24] + 1
}
/* sign-extend to 48 bits and combine with accumulator */
/* check for the -1 * -1 overflow case */
if ((operandY[31:0] == 0x8000_0000) && (operandX[31:0] == 0x8000_0000))
then product[71:64] = 0x00
/* zero-fill */
else product[71:64] = {8{product[63]}}
/* sign-extend */
if (inst == MSAC)
then result[47:0] = ACCx[47:0] - product[71:24]
else result[47:0] = ACCx[47:0] + product[71:24]
/* check for accumulation overflow */
if (accumulationOverflow == 1)
then {MACSR.PAVx = 1
MACSR.V = 1
if (MACSR.OMC == 1)
then /* accumulation overflow,
saturationMode enabled */
if (result[47] == 1)
then result[47:0] = 0x007f_ffff_ff00
else result[47:0] = 0xff80_0000_0000
}
/* transfer the result to the accumulator */
ACCx[47:0] = result[47:0]
}
MACSR.V = MACSR.PAVx
MACSR.N = ACCx[47]
if (ACCx[47:0] == 0x0000_0000_0000)
then MACSR.Z = 1
else MACSR.Z = 0
if ((ACCx[47:39] == 0x00_0) || (ACCx[47:39] == 0xff_1))
then MACSR.EV = 0
else MACSR.EV = 1
break;
case 2:
/* unsigned integers */
if (MACSR.OMC == 0 || MACSR.PAVx == 0)
then {
MACSR.PAVx = 0
/* select the input operands */
if (sz == word)
then {if (U/Ly == 1)
then operandY[31:0] = {0x0000, Ry[31:16]}
else operandY[31:0] = {0x0000, Ry[15:0]}
if (U/Lx == 1)
then operandX[31:0] = {0x0000, Rx[31:16]}
else operandX[31:0] = {0x0000, Rx[15:0]}
}
else {operandY[31:0] = Ry[31:0]
5-16
ColdFire CF4e Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
EMAC Instruction Set Summary
operandX[31:0] = Rx[31:0]
}
/* perform the multiply */
product[63:0] = operandY[31:0] * operandX[31:0]
/* check for product overflow */
if (product[63:40] != 0x0000_00)
then {
/* product overflow */
MACSR.PAVx = 1
MACSR.V = 1
if (inst == MSAC && MACSR.OMC == 1)
then result[47:0] = 0x0000_0000_0000
else if (MACSR.OMC == 1)
then /* overflowed MAC,
saturationMode enabled */
result[47:0] = 0xffff_ffff_ffff
}
Freescale Semiconductor, Inc...
/* zero-fill to 48 bits before performing any scaling */
product[47:40] = 0
/* zero-fill upper byte */
/* scale product before combining with accumulator */
switch (SF)
/* 2-bit scale factor */
{
case 0:
/* no scaling specified */
break;
case 1:
/* SF = “<< 1” */
product[40:0] = {product[39:0], 0}
break;
case 2:
/* reserved encoding */
break;
case 3:
/* SF = “>> 1” */
product[39:0] = {0, product[39:1]}
break;
}
/* combine with accumulator */
if (MACSR.PAVx == 0)
then {if (inst == MSAC)
then result[47:0] = ACCx[47:0] - product[47:0]
else result[47:0] = ACCx[47:0] + product[47:0]
}
/* check for accumulation overflow */
if (accumulationOverflow == 1)
then {MACSR.PAVx = 1
MACSR.V = 1
if (inst == MSAC && MACSR.OMC == 1)
then result[47:0] = 0x0000_0000_0000
else if (MACSR.OMC == 1)
then /* overflowed MAC,
saturationMode enabled */
result[47:0] = 0xffff_ffff_ffff
}
/* transfer the result to the accumulator */
ACCx[47:0] = result[47:0]
}
MACSR.V = MACSR.PAVx
MACSR.N = ACCx[47]
if (ACCx[47:0] == 0x0000_0000_0000)
then MACSR.Z = 1
else MACSR.Z = 0
if (ACCx[47:32] == 0x0000)
then MACSR.EV = 0
else MACSR.EV = 1
break;
}
Chapter 5. Enhanced Multiply-Accumulate Unit (EMAC)
For More Information On This Product,
Go to: www.freescale.com
5-17
Freescale Semiconductor, Inc.
Freescale Semiconductor, Inc...
EMAC Instruction Set Summary
5-18
ColdFire CF4e Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
Freescale Semiconductor, Inc...
Chapter 6
Instruction Pipeline and Timing
This chapter describes performance features of the V4 ColdFire processor pipeline
structure. These features are common for all V4 standard products as well as custom
products using the CF4e core. Relevant information on the CF4e EMAC, FPU, and MMU
are also included. It is intended as a guide for developing compilers or optimizing assembly
language application code.
This chapter describes the basic V4 pipeline strategy, contrasting it with Version 2 and 3
designs. It provides performance-related details of the instruction fetch and operand
execution pipelines (IFP and OEP). Appendixes describe performance parameters for the
complete ColdFire instruction set.
6.1 Basic V4 Pipeline Strategy
All ColdFire generations include two independent, decoupled pipelines. The instruction
fetch pipeline (IFP) prefetches the instruction stream, examines it to predict changes of
flow, partially decodes instructions, and packages fetched data into instructions for the
OEP. The instruction stream is then gated into the operand execution pipeline (OEP), which
decodes instructions, fetches required operands, and executes required functions. The IFP
and OEP are decoupled by a FIFO instruction buffer. The IFP can prefetch instructions
before the OEP needs them, minimizing the wait for instructions. This organization has
proven very efficient implementation as dictated by the variable-length ColdFire instruction
set.
Version 2 ColdFire processor design minimizes overall size while maintaining a reasonable
performance level. As a result, the V2 OEP is a simple two-stage design, where instructions
requiring operand memory read references are sent through both stages twice. With this and
the significant K-Bus enhancements in the V3 design, V4 minimizes instruction latency by
unrolling the OEP to support single-cycle latency for most opcodes.
Table 6-1 highlights these differences and presents a subset of execution times for the basic
instructions across the ColdFire architecture versions.
Chapter 6. Instruction Pipeline and Timing
For More Information On This Product,
Go to: www.freescale.com
6-1
Freescale Semiconductor, Inc.
Basic V4 Pipeline Strategy
Table 6-1. CFxCore Processor Execution Latency
Number of Processor Cycles
Freescale Semiconductor, Inc...
Instruction
Operation
V2
V3
V4
<op> Ry,Rx
Register-to-register
1
1
1
mov.l <mem>y,Rx
32-bit load
2
3
1
mov.b <mem>y,Rx
8-bit load
3
4
1
mov.w <mem>y,Rx
16-bit load
3
4
1
mov.* Ry,<mem>x
store
1
1
1
mov.l <mem>y,<mem>x
Memory-to-memory
2
3
2
<op> <mem>y,Rx
Embedded load
3
4
1
<op> Ry,<mem>x
Read-modify-write
3
4
1
bsr, jsr label
Subroutine call
3
1
1
rts
Subroutine return
5
8
2
bra label
Branch always
2
1
1
bcc label
(forward, not taken)
(forward, taken)
(backward, not taken)
(backward, taken)
(predicted correctly)
(predicted incorrectly)
Conditional branch
1
5
0
8
1
3
3
2
Unrolling the OEP has two major ramifications on the processor complex:
•
•
To support both pipeline structures, significantly more bandwidth is needed than
earlier ColdFire versions. Both V2 and V3 instruction fetch and operand requests
share the K-Bus. For V4, a unified structure does not provide enough bandwidth for
instruction fetches, so a split bus, or Harvard architecture, is used so that separate
instruction and data memories can be accessed concurrently. The basic two-stage
pipeline V3 K-Bus structure is retained, with one K-Bus connecting the IFP to
instruction memory and the other connecting the OEP to data memory.
By increasing the number of OEP stages, coupled with a V3-style IFP, the combined
pipeline depth requires architectural enhancements to handle branch instructions.
Specifically, the V3 branch acceleration scheme alone does not achieve desired
performance levels. V4 processors implement a two-level adaptive prediction
scheme with a small, direct-mapped branch cache using V3-style branch
acceleration to minimize mispredicted branches.
The resulting pipeline structure is shown in Figure 6-1.
6-2
ColdFire CF4e Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
Basic V4 Pipeline Strategy
IFP
J
IAG
Branch
Cache
KC1
IC1
KC2
IC2
Branch
Accel.
Instruction
Memory
Physical
KC1
IED
Freescale Semiconductor, Inc...
IB
Memory
Management
Unit
(MMU)
M-Bus
K2M
Physical
KC1
OEP
DS
J
OAG
KC2
OC2
EX
Data
Memory
KC1
OC1
Mis
EMAC
FPU
DA
BDM
DSCLK DSI
DSDO
DDATA
PSTDDATA PSTCLK
Figure 6-1. CF4e ColdFire Processor Complex Block Diagram
The IFP and OEP consist of the following stages:
•
Four-stage IFP
— IAG (instruction address generation)
— IC1 (instruction fetch cycle 1)
— IC2 (instruction fetch cycle 2)
— IED (instruction early decode)
Chapter 6. Instruction Pipeline and Timing
For More Information On This Product,
Go to: www.freescale.com
6-3
Freescale Semiconductor, Inc.
Instruction Fetch Pipeline (IFP)
Freescale Semiconductor, Inc...
•
— IB (instruction buffer)—(optional) Prefetched instructions can pass directly into
the OEP instruction registers, hiding IB from software in most cases.
Five-stage OEP with two optional write stages
— DS (decode and select)
— OAG (address generation)
— OC1 (operand fetch cycle 1)
— OC2 (operand fetch cycle 2)
— EX (execute)
— DA (write data available (operand write operations only)
— ST (store data (operand write operations only)
In summary, the OEP implements a five-stage design with limited superscalar instruction
issue capabilities to provide a cost-effective solution for the V4e core.
6.2 Instruction Fetch Pipeline (IFP)
The IFP generates instruction addresses and fetches. Because the fetch and execution
pipelines are decoupled by the eight-instruction FIFO instruction buffer (IB), the IFP can
prefetch instructions before the OEP needs them, minimizing stalls.
The IED stage provides early decoding, which effectively implements a hardware lookup
table. First, the 32 bits of fetched instruction data are separated into 16-bit parcels, the
minimum size for ColdFire instructions. Next, each parcel is used to index a hardware table
that provides a vector to decode fields that provide information such as instruction length,
data memory reference type, necessary register resources, and control information needed
early in the OEP DS stage. Finally, the instruction and its early decode information are
loaded into the instruction buffer or directly into the OEP. The early decode information
becomes the extended opword as it enters the OEP.
The primary IFP/OEP interface includes 48 bits of instruction (16-bit opword and two
optional 16-bit extension words) along with the extended opword containing the decode
vector. The IFP also supplies another 16-bit opword and its extended opword for the next
sequential instruction to the OEP to support the limited superscalar dispatch capabilities.
In addition to the prefetch function, the IFP improves the performance of change-of-flow
operations through the following:
• 8-entry, direct-mapped branch cache unit (BCU). Associates branch instruction
addresses with the target address for taken conditional branch instructions (Bcc).
Each entry includes a 2-bit, four-state branch prediction value that predicts a Bcc
instruction to be strongly or weakly taken or not taken. Branch folding of the branch
cache entry allows zero-cycle Bcc execution times for correctly predicted taken
branches. To maximize effectiveness of the small direct-mapped branch cache, a
hashed address indexes into the cache. The hashed address is generated as follows:
hashedBcuAddress[2:0] = IfpAddr[7:5] XOR IfpAddr[4:2]
6-4
ColdFire CF4e Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
Instruction Fetch Pipeline (IFP)
•
•
128-entry, direct-mapped prediction table unit (PTU).
Predicts Bcc instructions that miss in the branch cache. If predicted as taken by the
PTU, the Bcc is accelerated in the same manner as that used in the Version 3
processor. This acceleration is implemented in the IED stage of the prefetch pipeline
and consists of the required hardware to calculate the target instruction address
which is then fed back into the IFP's IAG stage. This mechanism is also used for
certain unconditional change-of-flow instructions. Decoupling the IFP and OEP
usually yields a 1-cycle execution time for correctly predicted accelerated branches.
Again, a hashed address is generated to index into the prediction table. This hashed
address is defined as follows:
Freescale Semiconductor, Inc...
hashedPtuAddress[6:0] = IfpAddr[15:9] XOR IfpAddr[8:2]
•
:
4-entry LIFO hardware return stack. Accelerates subroutine return instructions.
Because an RTS can return control to any number of target addresses, RTS opcodes
do not benefit from traditional branch cache structures, so the four-entry stack
greatly improves performance of these instructions. This stack is invisible to
application software. When a subroutine call is executed, the IFP pushes the return
program counter (PC) onto the stack. When a subroutine return is encountered in the
prefetch stream, the top of the LIFO stack is popped off (if valid) and used to
establish a new prefetch stream. The OEP subsequently verifies that the predicted
target address matches the return address on top of the memory-based system stack.
If the address differs, the processor aborts processing and reestablishes control at the
address defined by the memory stack. Table 6-2 lists RTS execution times.
Table 6-2. V4 RTS Execution Times
Execution Time
Condition
2 (1/0)
Predicted and correct
8 (1/0)
Not predicted
9 (1/0)
Predicted but incorrect
V4 core performance has been evaluated across a large suite of compiled (no assembly
language optimizations) embedded benchmarks, from which the following IFP
branch-related performance parameters have been measured:
•
•
•
•
64% of all Bcc instructions are folded so they execute in 0 cycles
87% of the predictions provided by the BCU + PTU on Bcc instructions are correct
Conditional branches typically account for 11% of the dynamic pathlength
99% of all RTS opcodes are predicted correctly by the hardware return stack
The decoupled IFP and OEP and Harvard architecture of the V4 core efficiently handles the
variable-length ColdFire instruction set. Performance measurements indicate that a Base
CPI degradation factor of 0.06 cycles per instruction is caused by the OEP waiting for
opwords or extension words to be supplied by the IFP. In some cases, this factor can be
Chapter 6. Instruction Pipeline and Timing
For More Information On This Product,
Go to: www.freescale.com
6-5
Freescale Semiconductor, Inc.
Operand Execution Pipeline (OEP)
reduced by forcing branch target instructions to be aligned on 0-modulo-4 addresses at the
cost of increased code size. In all cases, the 16-bit TPF instruction (0x51FC) should be used
for text fill, rather than a NOP instruction (0x4E71). NOP synchronizes the pipeline as it
begins execution, producing a 6-cycle minimum latency versus the 1-cycle TPF opcode.
6.3 Operand Execution Pipeline (OEP)
Freescale Semiconductor, Inc...
The two instruction registers in the decode stage (DS) of the OEP are loaded from the FIFO
instruction buffer or are bypassed directly from the instruction early decode (IED). The
OEP consists of two, traditional two-stage RISC compute engines with a dual-ported
register file access feeding an arithmetic logic unit (ALU).
The compute engine at the top of the OEP (the address ALU) is used typically for operand
address calculations; the execution ALU at the bottom is used for instruction execution. The
resulting structure provides almost 24 Mbytes/MHz of bandwidth to the two compute
engines and supports single-cycle execution speeds for most instructions, including all load
and store operations and most embedded-load operations. The V4 OEP supports the
ColdFire Revision B instruction set, which adds a few new instructions to improve
performance and code density.
The OEP also implements the following advanced performance features:
•
•
•
Stalls are minimized by dynamically basing the choice between the address ALU or
execution ALU for instruction execution on the pipeline state.
The address ALU and register renaming resources together can execute heavily used
opcodes and forward results to subsequent instructions with no pipeline stalls.
Instruction folding involving MOVE instructions allows two instructions to be
issued in one cycle. The resulting microarchitecture approaches full superscalar
performance at a much lower silicon cost.
Unrolling the OEP into five stages improves V4 performance. The resulting structure is
termed ‘limited superscalar’ because of certain, heavily used instruction constructs that
support multiple-instruction dispatch. In particular, the notion of instruction folding where
two consecutive operations are combined into a single issue effectively creates zero-cycle
latency for some instructions.
6.3.1 V4 OEP Conceptual Pipeline Model
The basic compute engine for the V4 ColdFire processor consists of a two-stage
pipeline—a register file with dual read ports feeding an arithmetic/logic unit (ALU). This
compute engine follows the traditional RISC model and is a three-terminal device: two
input operands and a result. Because the ColdFire ISA is not a pure load/store model, the
OEP consists of two of these compute engines, one for operand address generation in the
DS/OAG stages and one for instruction located in the OC2/EX stages. The resulting port
list defines the following set of resources associated with each compute engine:
6-6
ColdFire CF4e Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
Operand Execution Pipeline (OEP)
module (OagComputeEngine)
input (baseRegister
indexRegister)
output (addressResult);
module (ExComputeEngine)
input (operand1,
operand2)
output (executeResult);
Thus, each instruction has these six resources (four input operands and two results)
associated with its execution:
Freescale Semiconductor, Inc...
Instruction Resources = f(baseRegister,indexRegister, addressResult,
operand1,
operand2,
executeResult);
Although OagComputeEngine is most often used to calculate operand addresses for
memory-referencing instructions, its use is not limited to address generation. The entire
ColdFire ISA can be grouped into several broad categories:
•
•
•
Memory-referencing instructions that use the OagComputeEngine to generate the
operand address and execute in the ExComputeEngine
Register-to-register instructions execute in the ExComputeEngine
Register-to-register opcodes that execute in the OagComputeEngine
NOTE:
To support precise exceptions, results are not committed to
program-visible registers until an instruction completes the EX
stage even if it executes in the OagComputeEngine.
Register renaming makes executing instructions in OagComputeEngine advantageous
because results are available to subsequent instructions immediately. Instructions executed
in the OagComputeEngine include the following:
•
•
•
lea<ea>y,Ax
mov.l #<data>,Rx
moveq #<data>,Dx
Register renaming resources are the cascaded pipeline registers with the _oc1, _oc2, and
_ex labels in Figure 6-2. Note that the contents of these resources can be dynamically
forwarded to an operation in the DS stage with no pipeline stalls.
Chapter 6. Instruction Pipeline and Timing
For More Information On This Product,
Go to: www.freescale.com
6-7
Freescale Semiconductor, Inc.
Operand Execution Pipeline (OEP)
Inst Registers
Register File
DS
OAG
Compute
Engine
OAG
Freescale Semiconductor, Inc...
Address
Result
_oc1
Memory
Read
Data
OC1
Register File
_oc2
OC2
_ex
EX
Compute
Engine
EX
Execute
Result
Register File
DA
Memory Write Data
Figure 6-2. OAGComputeEngine Register Renaming Resources
V4 adds a fourth category ({register, immediate}-to-register instructions) to the three basic
opcode classifications. These instructions can be executed in either compute engine. This
capability, called dynamic execution relocation, improves performance by moving
execution to the OagComputeEngine whenever possible, maximizing use of renaming
hardware to eliminate pipeline stalls. This category includes mov.l Ry,Rx and most
register-based arithmetic operations (for example, op.l {Ry |#imm},Rx). This pipeline
technology, developed for the MC68060, minimizes pipeline register-busy stalls. As each
instruction enters the OEP DS stage, the IFP’s early-decode logic supplies the extended
opword that specifies required instruction resources. Control logic checks whether required
input register resources have writes pending from the ExComputeEngine in the OEP’s
6-8
ColdFire CF4e Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
Operand Execution Pipeline (OEP)
OAG, OC1, or OC2 stage. If no ExComputeEngine write from these stages is pending, the
execution of the opcode may be relocated from the EX stage to the OAG stage.
Freescale Semiconductor, Inc...
The last four OEP stages provide a strict scheme for prioritizing pending register updates.
A clear understanding of this scheme is crucial because there are often multiple updates for
a single destination register in the pipeline state at any time. Prioritization is as follows:
1.
2.
3.
4.
5.
6.
7.
8.
ExComputeEngine update, OAG stage (highest priority)
OagComputeEngine update,OAG stage
ExComputeEngine update, OC1 stage
OagComputeEngine update, OC1 stage
ExComputeEngine update, OC2 stage
OagComputeEngine update, OC2 stage
ExComputeEngine update, EX stage
OagComputeEngine update, EX stage (lowest priority)
The OEP implements a 2- x 4-stage scoreboard for tracking pending register updates from
the two compute engines. DS stage control logic uses this scoreboard to determine if
dynamic execution relocation can be performed and to generate pipeline stalls on
register-busy conditions, described in Section 6.3.3, “Sequence-Related OEP Stalls.”
V4 processor core performance measurements produce the following:
•
•
•
12% of the instructions always execute in the OagComputeEngine.
30% of the instructions can be executed in either compute engine, and 15% of the
total instructions are relocated to the OagComputeEngine.
18% of the instructions include auto-addressing mode updates {(An)+, -(An)}
performed by the OagComputeEngine. By summing these three classes, the
OagComputeEngine is used on 45% of the dynamic instructions.
Section 6.4, “Instruction Execution Locations,” gives a complete specification of the OEP
compute engine execution location for every ColdFire instruction.
6.3.2 Instruction Folding and the Limited Superscalar OEP
The V4 branch cache supports zero-cycle execution of correctly predicted taken conditional
branch instructions. The V4 OEP also supports instruction folding on other heavily used
constructs. In particular, MOVE is an ideal candidate for instruction folding for two
reasons. First, two-operand ColdFire constructs mean that simple assignment operations
using MOVE opcodes are used extensively. Second, because a MOVE simply involves an
assignment but no other computation operation, it can be combined with other opcodes for
dual-issue opportunities using both OEP compute engines. This instruction folding
provides limited superscalar dispatch.
Chapter 6. Instruction Pipeline and Timing
For More Information On This Product,
Go to: www.freescale.com
6-9
Freescale Semiconductor, Inc.
Operand Execution Pipeline (OEP)
Using the nomenclature from the superscalar MC68060, two sequential instructions are
loaded into the OEP when an opcode is requested from the IFP. The OEP is implemented
as the primary OEP (pOEP) plus the DS stage for the secondary OEP (sOEP). The sOEP
instruction dispatch criteria are evaluated in the DS pipeline stage; if successful, the
secondary instruction is issued to the OAG stage. V4 sOEP instructions are restricted to 16
bits with no operand memory references and are executed in the ExComputeEngine.
V4 instruction pairs are grouped into the three following broad categories. By supporting
superscalar dispatch for heavily-used instruction constructs, the design approaches the
performance of a full, dual-pipeline OEP at a much lower silicon cost.
Freescale Semiconductor, Inc...
Group 1: Zero-cycle loads
pOEP inst = mov.l {Ry | <mem>y},Rx
sOEP inst = op.l {Rw | #qimm},Rx
(#qimm represents a 3-bit quick immediate operand)
For this pair, a MOVE shares a destination register with a secondary instruction and the full
capabilities of the ExComputeEngine’s three-terminal structure can be exploited. The
executeResult is a function of (operand1, operand2). The combined instructions are issued
into the OAG stage as op.l {Ry | <mem>y},{Rw | #qimm},Rx to be executed by the OEP
EX stage.
Group 2: Zero-cycle stores
pOEP inst = store.* Ry,<mem>x
sOEP inst = mov.l
Rw,Rz
movq.l #imm,Rz
or,
For this pair, a store operation (mov.{b,w,l} or clr.{b,w,l}) is combined with a simple
register load. The Ex compute engine store unit executes the operand write. This function
is tied directly to the operand1_ex register, providing the required post-alignment
multiplexing logic. The sOEP instruction is issued to the barrel shifter (BSU), that performs
a passOperand2 operation. ExComputeEngine processes both operations concurrently.
Group 3: Zero-cycle address results
pOEP inst =
sOEP inst =
lea
mov.l
movq.l
clr.l
mov3q.l
op.l
mov.l
movq.l
cmp.l
<ea>y,Rx
#imm,Rx
#imm,Rx
Rx
#qimm,Rx
{Rw | #qimm},Rz
Rw,Rz
#imm,Rz
Rw,Rz
or,
or,
or,
or,
or,
or,
or,
This pair combines a pOEP instruction executed by the OagComputeEngine (the
AddressResult) with a sOEP instruction executed in the ExComputeEngine.
Measurements show that folding instructions to create zero-cycle moves improves overall
processor performance by 10% on compiled code. Combining this with improvements
6-10
ColdFire CF4e Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
Operand Execution Pipeline (OEP)
provided by zero-cycle Bcc instructions, 33% of the dynamic instruction stream is executed
as pairs in the OEP.
Superscalar dispatch can be disabled for performance analysis. Setting CACR[17] disables
Bcc instruction folding. Setting CACR[0] disables OEP instruction folding on zero-cycle
move pairs. Both are enabled at reset.
Freescale Semiconductor, Inc...
6.3.3 Sequence-Related OEP Stalls
The most common sequence-related OEP stall is the change/use (register-busy) register
stall, which occurs when an instruction modifies a register in the ExComputeEngine that a
subsequent instruction needs as an OagComputeEngine input. The subsequent instruction
stalls in the DS stage until the register is updated. Consider the following:
lsl.l
mov.l
d1,d0
(d8,a0,d0.l*4),d1
In this sequence, shown in Figure 6-3, the 2-cycle mov.l instruction stalls waiting for lsl.l
to update the d0 index register. The mov.l instruction requires a second pipeline cycle to
calculate the three-component indexed operand address.
Processor clock
3-cycle regBusy stall
DS
lsl
mov
mov
lsl
OAG
mov
lsl
OC1
mov
lsl
OC2
mov
lsl
EX
Register d0
Old
mov
New
Figure 6-3. Sequence-Related OEP Sequence Stall
In Figure 6-3, mov.l cannot begin its second cycle of indexed operand address generation
until the preceding instruction update d0, causing a worst-case, 3-cycle stall.
The worst-case change/use register-busy stalls are summarized as follows:
•
•
If an instruction changes an ExComputeEngine base register (An) and the next
instruction uses that register as an OagComputeEngine input, a 3-cycle stall occurs.
If mov.l <mem>y,Ax loads a base register and the next instruction uses that register
as an OagComputeEngine input, a 2-cycle stall occurs. This sequence, common in
pointer manipulation, is optimized by adding a OC2-to-DS forwarding datapath.
Chapter 6. Instruction Pipeline and Timing
For More Information On This Product,
Go to: www.freescale.com
6-11
Freescale Semiconductor, Inc.
Operand Execution Pipeline (OEP)
•
•
If an instruction changes an ExComputeEngine register that the next instruction uses
as an index register as an OagComputeEngine input with a scale factor of 1 (Xi.l), a
2-cycle pipeline stall occurs.
If an instruction changes a register from either OagComputeEngine or
ExComputeEngine and the next instruction uses that register as an index register
with a scale factor other than 1 (for example, Xi.l*{2,4,8}) as an input to
OagComputeEngine, a 3-cycle stall occurs. The case is shown in Figure 6-3.
Freescale Semiconductor, Inc...
The first three stalls are minimized by using ExComputeEngine-to-DS stage forwarding
logic shown on the OEP block diagrams. For register-busy conditions involving index
registers with scale factors not equal to one, the register file must be updated at the
completion of the EX stage before the stalled instruction can continue.
Intervening instructions can reduce or eliminate the stall, as in the following sequence:
add.l
mov.l
(d16,a7),a0
(a0),d0
This sequence represents the first type of change/use pipeline stall (three cycles) on register
A0. If instruction scheduling can be applied, the stall can be reduced or eliminated.
add.l
op1
op2
mov.l
(d16,a7),a0
Ry,Rx
Rw,Rz
(a0),d0
By inserting single-cycle instructions, op1 and op2, mov.l stalls only 1 cycle because the
3-cycle hazard is reduced by the 2 cycles during which op1 and op2 execute. A third
single-cycle instruction would eliminate the stall.
The instructions in Table 6-3 prevent stalls by using register renaming logic to make
destination register results available to subsequent instructions. These instructions are
unconditionally executed in OagComputeEngine.
Table 6-3. Instructions that Make Results Available to Subsequent Instructions
Instruction
<op>
(Ay)+,Rx
<op>
-(Ay),Rx
<op>
Ry,(Ax)+
<op>
Ry,-(Ax)
clr.l dx
lea
<ea>y,Ax
mov.l #imm,Rx
mov.w #imm,Ax
mov3q.l #qimm,Rx
moveq #imm,Dx
6-12
ColdFire CF4e Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
Operand Execution Pipeline (OEP)
Freescale Semiconductor, Inc...
NOTE:
Ry indicates a Dy or Ay source register. Rx indicates a Dx or
Ax destination register available either to the
OagComputeEngine on subsequent instructions as a base
register with no stall or as an index register with a scale factor
of 1 and no stall. If the destination register is then used as an
index with a different scale factor, the 3-cycle stall described
above occurs
V4 CPU measurements indicate a 0.12-cycle-per-instruction degradation factor associated
with change/use stalls across the embedded benchmark suite. Approximately 6% of the
dynamic instruction stream encounter this type of stall. The average stall is about 2 cycles
per register-busy.
6.3.4 EMAC-Specific OEP Sequence Stalls
The ColdFire family supports two multiply-accumulate implementations that provide
different levels of performance and capability for differing silicon costs. The EMAC
features a four-stage execution pipeline, optimized for 32-bit operands with a
fully-pipelined 32 x 32 multiply array and four 48-bit accumulators. A MAC or EMAC can
be attached to any version ColdFire core as determined by application requirements.
The EMAC execution pipeline overlaps the EX stage of the OEP; that is, the first stage of
the EMAC pipeline is the last stage of the basic OEP. EMAC units are designed for
sustained, fully-pipelined operation on accumulator load, copy, and multiply-accumulate
instructions. However, instructions that store contents of the multiply-accumulate
programming model can generate OEP stalls that expose the EMAC execution pipeline
depth, as in the following
mac.w
mov.l
Ry,Rx,Acc0
Acc0,Rz
The mov.l instruction that stores the accumulator to an integer register (Rz) stalls until the
program-visible copy of the accumulator is available. Figure 6-4 shows EMAC timing.
Chapter 6. Instruction Pipeline and Timing
For More Information On This Product,
Go to: www.freescale.com
6-13
Freescale Semiconductor, Inc.
Operand Execution Pipeline (OEP)
Three-cycle
regBusy stall
DS
OAG
OC1
OC2
EX
mov
mov
mac
mov
mac
mov
mac
mov
mac
mov
mac
EMAC EX1
mac
Freescale Semiconductor, Inc...
EMAC EX2
mov
mac
EMAC EX3
mac
EMAC EX4
mac
accumulator 0
old
new
Figure 6-4. EMAC-Specific OEP Sequence Stall
In Figure 6-4, the OEP stalls the store-accumulator instruction for 3 cycles: the depth of the
EMAC pipeline minus 1. The minus 1 factor is needed because the OEP and EMAC
pipelines overlap by a cycle, the EX stage. As the store-accumulator instruction reaches the
EX stage where the operation is performed, the just-updated accumulator 0 value is
available.
As with change/use stalls between accumulators and general-purpose registers, introducing
intervening instructions that do not reference the busy register can reduce or eliminate
sequence-related store-MAC instruction stalls. In fact, a major benefit of the EMAC is the
addition of three accumulators to minimize stalls caused by exchanges between the
accumulator(s) and the general-purpose registers.
6.3.5 FPU-Specific OEP Sequence Stalls
One FPU-related read-after-write register data hazard merits mention involving OEP
register file operands sourced to the FPU (for example, fpGEN Ry,FPx instruction). If
the source Ry register is modified by a previous instruction, the FPU operation stalls for 1
cycle, so an executeResult-to-OC2 forwarding datapath is not required. This restriction
allows the register operand sourced directly from the OEP register file to the FPU relatively
early in the OC2 cycle. The stall occurs in OAG. Consider the following sequence:
sub.l <ea>y,d0
fadd.l d0,fp0
The source operand d0 is modified immediately before the FPU uses it. The fadd.l detects
the pending register update in its DS stage, forcing a 1-cycle stall in OAG. This stall delays
the OC2 stage for fadd.l until the updated d0 value is available from the OEP register file.
6-14
ColdFire CF4e Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
Operand Execution Pipeline (OEP)
The following example shows the basic operation of the CF4e OEP with the FPU. Consider
the following code example taken from a popular floating-point benchmark:
rInnerproduct (result, a, b, row, column)
float *result, a[rowsize+1][rowsize+1], b[rowsize+1][rowsize+1];
int
row, column;
/* computes the inner product of A[row,*] and B[*,column] */
{
int i;
*result = 0.0;
for (i = 1; i <= rowsize; i++)
*result = *result + a[row][i] * b[i][column];
}
Freescale Semiconductor, Inc...
The inner for loop generates the following compiled code:
for_loop:
fmov.s
fmul.s
fadd.s
subq.l
lea
fmov.s
bpl.b
(a5),fp0
(a1)+,fp0
(a4),fp0
#1,d7
(const,a5),a5
fp0,(a4)
for_loop
;
;
;
;
;
;
;
fp0 = b[i][column]
fp0 = a[row][i] * b[i][column]
fp0 = result + a[][] * b[][]
decrement loop counter
adjust pointer for b[i][column]
store result
if done, exit, else continue loop
Due to concurrent OEP and FPU instruction execution, the visible execution time for this
loop is less than the simple summation of individual execution times.
Table 6-4. FPU Execution Example
Instruction Instruction Latency (CPU cycles) Apparent Latency (CPU cycles)
Comments
fmov.s
1
1
FP load
fmul.s
4
4
FP multiply
fadd.s
4
4
FP add
subq.l
1
0
Hidden in OEP
lea
1
0
Hidden in OEP
fmov.s
2
2
FP store
bpl.b
1
0
Hidden via instruction folding
TOTAL
14
11
Figure 6-5 shows the OEP/FPU pipeline diagrams for the example in Table 6-4.
Chapter 6. Instruction Pipeline and Timing
For More Information On This Product,
Go to: www.freescale.com
6-15
Freescale Semiconductor, Inc.
Operand Execution Pipeline (OEP)
Cycle
OEP DS
OEP OAG
0
1
fmov fmul
2
3
4
6
fadd subq subq subq subq
fmov fmul
fadd fadd
OEP OC1
fmov fmul
OEP OC2
fmov
fadd
fadd
7
lea
subq
8
fmul
9
10
fmov fmov fmov
lea
fadd subq
fmov fmov fmov
lea
fadd subq
fmov fmov
lea
OEP EX
fmov fmul
fadd subq
FPU EX1
fmov fmul
fadd
fmul
FPU EX2
Freescale Semiconductor, Inc...
5
FPU EX3
lea
fmov
fmov
fmov fmov
fadd
fadd
fmul
FPU EX4
fmov fmov
fmul
fadd
Figure 6-5. for_loop Example
6.3.6 Operand Memory Sequence-Related Stalls
Operand reads and writes occur in different OEP stages; operand reads occur in OC1 and
operand writes occur in ST. Accordingly, the read-after-write hazard in a given local
memory can cause a 1-cycle stall. Specifically, if a memory read occupies OC1 and an
operand write is in ST, the read stalls for a cycle to allow the write to complete because the
memory array can only perform one operation per cycle. Performance measurements on
this type of stall show a relatively minor degradation factor of 0.04 cycles per instruction.
NOTE:
Read and write accesses must both be to the same local
memory. For example, both access the data cache.
6.3.7 V4 OEP Summary
Measurements across the entire embedded suite into an ideal, infinitely large local memory
show base CPI metrics ranging from 0.75 to 2.16 cycles per instruction, where the larger
number includes benchmarks with significant use of multi-cycle arithmetic operations such
as integer multiply and divide. The geometric mean across the entire suite is 1.34 cycles per
instruction for the V4 core. Section 6.5, “Instruction Execution Times,” lists instruction
execution times. Figure 6-6 shows a block diagram of the Ex execute engines of the V4
OEP.
Register elements in Figure 6-6 are shown as boxes with double lines showing boundaries
between pipeline stages. Data multiplexers where various source operands are connected at
the top are shown as ellipses; selected data is shown on the bottom.
6-16
ColdFire CF4e Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
Operand Execution Pipeline (OEP)
Memory
Read
Data
Register File
FP-Register File
Freescale Semiconductor, Inc...
O
C
2
BSU
E
X
ALU
FP
FADD FMUL FDIV Misc.
Three-Stage
Multiplier
Acc-Register File
FPU
EMAC
Memory
Write
Data
Figure 6-6. CF4e Ex Execute Engines within the OEP
The last two OEP stages, operand fetch cycle 2 (OC2) and execute (EX), overlap the first
stages of the FPU and EMAC pipelines. The OEP and FPU follow a traditional RISC
execute engine model with a dual-ported register file feeding an ALU engine. In the CF4e
pipeline, this function is partitioned across stages OC2 and EX.
The OEP has the two register-file read ports feeding into muxes that select source operands
from data memory, the register file, or a fed-forward result from the preceding instruction.
The two source operands are registered and sent to the execute engine, which consists of a
basic ALU and BSU. The appropriate result is selected from all the potential sources (ALU,
BSU, FPU, and EMAC) and routed back into the register file or sent to memory as write
data for a store operation.
Chapter 6. Instruction Pipeline and Timing
For More Information On This Product,
Go to: www.freescale.com
6-17
Freescale Semiconductor, Inc.
Instruction Execution Locations
Freescale Semiconductor, Inc...
FPU hardware structure is similar to the OEP. Two source operands are selected from a
variety of potential sources: memory read data, register operands from the OEP, and
operands read from the FPU register file or the fed-forwarded results from the previous
instruction. Recall the 68K/ColdFire floating-point instruction set architecture has eight
unique floating-point data registers in the FPU’s register file. Once selected, source
operands are loaded into the two operand registers at the end of OC2 and then broadcast to
the FPU’s four internal engines: FADD, FMUL, FDIV, and miscellaneous unit. FADD
executes all add, subtract, and compare instructions; FMUL performs all multiplication;
FDIV performs division, square root, and move operations; and the miscellaneous module
executes all other operations. If the destination is a FPU register, the appropriate output
result bus is selected and fed back to the FPU register file; if the destination is an integer
register or memory, results are registered and then sent to the OEP.
The EMAC’s structure differs slightly because the basic instruction (Acc = Acc + Ry*Rx)
format differs from constructs executed by the OEP and FPU. For the EMAC, the two
source operands (Ry,Rx) are selected in the OEP and sent to the EMAC. As execution
begins, the two operands enter a three-stage, 32x32 multiplier. The product is formed at the
end of the third stage. It is then combined with the accumulator in stage four, when the
accumulator is read from a register file and combined with the product; the result is then
written back. Stores of any accumulator access a second read port and the appropriate value
is sent to the OEP to be loaded into the destination location in the integer register file.
6.4 Instruction Execution Locations
Table 6-5 shows the OEP compute engine execution location for V4 instructions.
Table 6-5. V4 ColdFire Compute Engine Location
Instruction1
<ea>
Oag
= Rn
Ex
add.l <ea>y,Rx
Either
x
add.l Dy,<ea>x
<ea>
Oag
= Mem
Ex
Either
<ea>
Oag
= Imm
Ex
x
x
x
addi.l #imm,Dx
x
addq.l #imm,<ea>x
addx.l Dy,Dx
x
x
x
x
x
and.l <ea>y,Dx
and.l Dy,<ea>x
x
x
andi.l #imm,Dx
x
asl.l <ea>y,Dx
x
x
asr.l <ea>y,Dx
x
x
bcc.{b,w,l}
x
bchg {Dy|#imm},<ea>x
x
x
bclr {Dy|#imm},<ea>x
x
x
6-18
Either
ColdFire CF4e Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
Instruction Execution Locations
Table 6-5. V4 ColdFire Compute Engine Location (Continued)
Instruction1
<ea>
Oag
= Rn
Ex
Either
bra.{b,w,l}
= Mem
Ex
x
<ea>
Oag
= Imm
Ex
Either
x
bsr.{b,w,l}
x
btst {Dy|#imm},<ea>x
x
x
clr.b <ea>x
x
x
clr.w <ea>x
x
x
clr.l <ea>x
Either
x
bset {Dy|#imm},<ea>x
Freescale Semiconductor, Inc...
<ea>
Oag
x
x
cmp.b <ea>y,Rx
x
x
x
cmp.w <ea>y,Rx
x
x
x
cmp.l <ea>y,Rx
x
x
x
cmpi.b #imm,Dx
x
cmpi.w #imm,Dx
x
cmpi.l #imm,Dx
x
cpushl <ea>y
x
divs.w <ea>y,Dx
x
x
divs.l <ea>y,Dx
x
x
divu.w <ea>y,Dx
x
x
divu.l <ea>y,Dx
x
x
eor.l Dy,<ea>x
x
x
x
x
eori.l #imm,Dx
x
ext.{w,l} Dx
x
extb.l Dx
x
halt
illegal
x
intouch
jmp
x
jsr
x
lea <ea>y,Ax
x
link.w Ay,#imm
x
lsl.l <ea>y,Dx
x
x
lsr.l <ea>y,Dx
x
x
mac.w <ea>y,Rx,Accx
x
x
mac.l <ea>y,Rx,Accx
x
x
msac.w <ea>y,Rx,Accx
x
x
msac.l <ea>y,Rx,Accx
x
x
Chapter 6. Instruction Pipeline and Timing
For More Information On This Product,
Go to: www.freescale.com
6-19
Freescale Semiconductor, Inc.
Instruction Execution Locations
Table 6-5. V4 ColdFire Compute Engine Location (Continued)
Instruction1
Freescale Semiconductor, Inc...
mov3q.l #imm,<ea>x
<ea>
Oag
= Rn
Ex
Either
x
<ea>
Oag
= Mem
Ex
Either
<ea>
Oag
= Imm
Ex
x
movclr.l Accy,Rx
x
move.b <ea>y,Dx
x
x
move.b Dy,<ea>x
x
move.b <mem>y,<mem>x
x
move.w <ea>y,Dx
x
x
move.w <ea>y,Ax
x
x
move.w Ry,<ea>x
x
move.w <ea>y,<ea>x
x
move.l <ea>y,Rx
x
x
move.l Ry,<ea>x
x
move.l <ea>y,<ea>x
x
x
x
x
x
move.l <ea>y,Accx
x
x
move.l <ea>y,Mac.CR
x
x
move.l Accy,Rx
x
move.l Mac.CR,Rx
x
move.w CCR,Dx
x
move.w Dy,CCR
x
move.w SR,Dx
x
move.w <ea>y,SR
x
x
movec Ry,Rc
x
movem.l <ea>y,#list
x
movem.l #list,<ea>x
x
moveq #imm,Dx
x
muls.w <ea>y,Dx
x
x
muls.l <ea>y,Dx
x
x
mulu.w <ea>y,Dx
x
x
mulu.l <ea>y,Dx
x
x
mvs.{b,w} <ea>y,Dx
x
x
x
mvz.{b,w} <ea>y,Dx
x
x
x
neg.l Dx
negx.l Dx
Either
x
x
x
x
nop
not.l Dx
x
or.l <ea>y,Dx
x
6-20
x
ColdFire CF4e Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
x
Freescale Semiconductor, Inc.
Instruction Execution Times
Table 6-5. V4 ColdFire Compute Engine Location (Continued)
Instruction1
<ea>
Oag
= Rn
Ex
Either
<ea>
Oag
= Mem
Ex
or.l Dy,<ea.>x
Either
<ea>
Oag
= Imm
Ex
x
ori.l #imm,Dx
Freescale Semiconductor, Inc...
Either
x
pea <ea>y
x
pulse
x
rems.l <ea>y,Dx
x
x
remu.l <ea>y,Dx
x
x
rte
x
rts
x
sats.l Dx
x
scc Dx
x
stop #imm
x
sub.l <ea>y,Rx
x
sub.l Dy,<ea>x
x
x
x
subi.l #imm,Dx
x
subq.l #imm,<ea>x
x
subx.l Dy,Dx
x
swap Dx
x
tas <ea>x
x
x
tpf
tpf.{w,l}
trap #imm
x
tst.{b,w,l} <ea>x
unlk Ax
x
x
x
wddata.{b,w,l} <ea>y
x
wdebug <ea>y
x
1
x
All EMAC and FPU instructions are executed in the Ex pipeline stage.
6.5 Instruction Execution Times
The timing data in this section assumes the following:
•
Execution times for individual instructions make no assumptions concerning the
OEP’s ability to dispatch multiple instructions in one machine cycle. For sequences
where instruction pairs are issued, the execution time of the first instruction defines
the execution time of pair; the second instruction effectively executes in zero cycles.
Chapter 6. Instruction Pipeline and Timing
For More Information On This Product,
Go to: www.freescale.com
6-21
Freescale Semiconductor, Inc.
Instruction Execution Times
•
•
The OEP is loaded with the opword and all required extension words at the
beginning of each instruction execution. This implies that the OEP spends no time
waiting for the IFP to supply opwords or extension words.
The OEP experiences no sequence-related pipeline stalls. For the V4, the most
common example of this type of stall occurs when a register is modified in the EX
engine and a subsequent instruction generates an address that uses the previously
modified register. The second instruction stalls in the OEP until the previous
instruction updates the register. For example:
muls.l
move.l
#<data>,d0
(a0,d0.l*4),d1
Freescale Semiconductor, Inc...
move.l waits 3 cycles for the muls.l to update d0. If consecutive instructions update
a register and use that register as a base of index value with a scale factor of 1
(Xi.l*1) in an address calculation, a 2-cycle pipeline stall occurs. If the destination
register is used as an index register with any other scale factor (Xi.l*2, Xi.l*4), a
3-cycle stall occurs. Table 6-3 lists instructions optimized to prevent such stalls.
NOTE:
Address register results from postincrement and predecrement
modes are available to subsequent instructions without stalls.
•
•
The OEP can complete all memory accesses without memory causing any stalls.
Thus, these timings assume an infinite, zero-wait state memory attached to the core.
Operand accesses are assumed to be aligned as follows:
— 16-bit operands are aligned on 0-modulo-2 addresses
— 32-bit operands are aligned on 0-modulo-4 addresses
Operands that do not meet these guidelines are misaligned. Table 6-6 shows how the
core decomposes a misaligned operand reference into a series of aligned accesses.
Table 6-6. Misaligned Operand References
1
Additional C(R/W)1
A[1:0]
Size
Bus Operations
x1
Word
Byte, Byte
2(1/0) if read
1(0/1) if write
x1
Long
Byte, Word, Byte
3(2/0) if read
2(0/2) if write
10
Long
Word, Word
2(1/0) if read
1(0/1) if write
Each timing entry is presented as C(r/w), described as follows:
C is the number of processor clock cycles, including all applicable operand fetches and writes, as well as all
internal core cycles required to complete the instruction execution.
r/w is the number of operand reads (r) and writes (w) required by the instruction. An operation performing a
read-modify write function is denoted as (1/1).
6-22
ColdFire CF4e Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
Instruction Execution Times
6.5.1 MOVE Instruction Execution Times
The following tables show execution times for the MOVE.{B,W,L} instructions. Table 6-9
shows the timing for the other generic move operations.
Freescale Semiconductor, Inc...
NOTE:
In these tables, times using PC-relative effective addressing
modes are the same as using An-relative mode.
ET with {<ea> = (d16,PC)}
equals ET with {<ea> = (d16,An)}
ET with {<ea> = (d8,PC,Xi*SF)}
equals ET with {<ea> = (d8,An,Xi*SF)}
The (xxx).wl nomenclature refers to both forms of absolute
addressing, (xxx).w and (xxx).l.
Table 6-7 lists execution times for MOVE.{B,W} instructions.
Table 6-7. Move Byte and Word Execution Times
Destination
Source
Rx
(Ax)
(Ax)+
-(Ax)
(d16,Ax)
(d8,Ax,Xi*SF)
(xxx).wl
Dy
1(0/0)
1(0/1)
1(0/1)
1(0/1)
1(0/1)
2(0/1)
1(0/1)
Ay
1(0/0)
1(0/1)
1(0/1)
1(0/1)
1(0/1)
2(0/1)
1(0/1)
(Ay)
1(1/0)
2(1/1)
2(1/1)
2(1/1)
2(1/1)
3(1/1)
2(1/1)
(Ay)+
1(1/0)
2(1/1)
2(1/1)
2(1/1)
2(1/1)
3(1/1)
2(1/1)
-(Ay)
1(1/0)
21/1)
2(1/1)
2(1/1)
2(1/1)
3(1/1)
2(1/1)
(d16,Ay)
1(1/0)
2(1/1)
2(1/1)
2(1/1)
2(1/1)
—
—
(d8,Ay,Xi*SF)
2(1/0)
3(1/1)
3(1/1)
3(1/1)
—
—
—
(xxx).w
1(1/0)
2(1/1)
2(1/1)
2(1/1)
—
—
—
(xxx).l
1(1/0)
2(1/1)
2(1/1)
2(1/1)
—
—
—
(d16,PC)
1(1/0)
2(1/1)
2(1/1)
2(1/1)
2(1/1)
—
—
(d8,PC,Xi*SF)
2(1/0)
3(1/1)
3(1/1)
3(1/1)
—
—
—
#<xxx>
1(0/0)
1(0/1)
1(0/1)
1(0/1)
1(0/1)
—
—
Table 6-8 lists timings for MOVE.L.
Table 6-8. Move Long Execution Times
Destination
Source
Rx
(Ax)
(Ax)+
-(Ax)
(d16,Ax)
(d8,Ax,Xi*SF)
(xxx).wl
Dy
1(0/0)
1(0/1)
1(0/1)
1(0/1)
1(0/1)
2(0/1)
1(0/1)
Ay
1(0/0)
1(0/1)
1(0/1)
1(0/1)
1(0/1)
2(0/1)
1(0/1)
(Ay)
1(1/0)
2(1/1)
2(1/1)
2(1/1)
2(1/1)
3(1/1)
2(1/1)
(Ay)+
1(1/0)
2(1/1)
2(1/1)
2(1/1)
2(1/1)
3(1/1)
2(1/1)
Chapter 6. Instruction Pipeline and Timing
For More Information On This Product,
Go to: www.freescale.com
6-23
Freescale Semiconductor, Inc.
Instruction Execution Times
Table 6-8. Move Long Execution Times (Continued)
Destination
Freescale Semiconductor, Inc...
Source
Rx
(Ax)
(Ax)+
-(Ax)
(d16,Ax)
(d8,Ax,Xi*SF)
(xxx).wl
-(Ay)
1(1/0)
2(1/1)
2(1/1)
2(1/1)
2(1/1)
3(1/1)
2(1/1)
(d16,Ay)
1(1/0)
2(1/1)
2(1/1)
2(1/1)
2(1/1)
—
—
(d8,Ay,Xi*SF)
2(1/0)
3(1/1)
3(1/1)
3(1/1)
—
—
—
(xxx).w
1(1/0)
2(1/1)
2(1/1)
2(1/1)
—
—
—
(xxx).l
1(1/0)
2(1/1)
2(1/1)
2(1/1)
—
—
—
(d16,PC)
1(1/0)
2(1/1)
2(1/1)
2(1/1)
2(1/1)
—
—
(d8,PC,Xi*SF)
2(1/0)
3(1/1)
3(1/1)
3(1/1)
—
—
—
#<xxx>
1(0/0)
1(0/1)
1(0/1)
1(0/1)
—
—
—
Table 6-9 gives timings for MOVE.L instructions accessing program-visible MAC
registers, along with other MOVE.L timings. Execution times for moving ACC or MACSR
contents into a destination location represent the best-case scenario when the store
instruction is executed and no load, MAC, or MSAC instructions are in the MAC execution
pipeline. In general, these store operations take only 1 cycle to execute, but if preceded
immediately by a load, MAC, or MSAC instruction, the MAC pipeline depth is exposed and
execution time is 3 cycles.
Table 6-9 and Table 6-11 apply only to the MAC. Table 6-15 lists EMAC execution times.
Table 6-9. MAC and Miscellaneous Move Execution Times
Effective Address
Opcode
<ea>
Rn
(An)
(An)+
-(An)
(d16,An)
(d8,An,Xi*SF)
(xxx).wl
#<xxx>
move.l
<ea>,ACC
1(0/0)
—
—
—
—
—
—
1(0/0)
move.l
<ea>,MACSR
6(0/0)
—
—
—
—
—
—
6(0/0)
move.l
<ea>,MASK
5(0/0)
—
—
—
—
—
—
5(0/0)
move.l
ACC,Rx
1(0/0)
—
—
—
—
—
—
—
move.l
MACSR,CCR
1(0/0)
—
—
—
—
—
—
—
move.l
MACSR,Rx
1(0/0)
—
—
—
—
—
—
—
move.l
MASK,Rx
1(0/0)
—
—
—
—
—
—
—
moveq
#imm,Dx
—
—
—
—
—
—
—
1(0/0)
mov3q
#imm,<ea>
1(0/0)
1(1/0)
1(1/0)
1(1/0)
1(1/0)
2(1/0)
1(1/0)
—
mvs
<ea>,Dx
1(0/0)
1(1/0)
1(1/0)
1(1/0)
1(1/0)
2(1/0)
1(1/0)
1(0/0)
mvz
<ea>,Dx
1(0/0)
1(1/0)
1(1/0)
1(1/0)
1(1/0)
2(1/0)
1(1/0)
1(0/0)
6.5.2 Execution Timings—One-Operand Instructions
Table 6-10 shows standard timings for single-operand instructions.
6-24
ColdFire CF4e Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
Instruction Execution Times
Table 6-10. One-Operand Instruction Execution Times
Effective Address
Freescale Semiconductor, Inc...
Opcode
<ea>
Rn
(An)
(An)+
-(An)
(d16,An)
(d8,An,Xi*SF)
(xxx).wl
#xxx
clr.b
<ea>
1(0/0)
1(0/1)
1(0/1)
1(0/1)
1(0/1)
2(0/1)
1(0/1)
—
clr.w
<ea>
1(0/0)
1(0/1)
1(0/1)
1(0/1)
1(0/1)
2(0/1)
1(0/1)
—
clr.l
<ea>
1(0/0)
1(0/1)
1(0/1)
1(0/1)
1(0/1)
2(0/1)
1(0/1)
—
ext.w
Dx
1(0/0)
—
—
—
—
—
—
—
ext.l
Dx
1(0/0)
—
—
—
—
—
—
—
extb.l
Dx
1(0/0)
—
—
—
—
—
—
—
neg.l
Dx
1(0/0)
—
—
—
—
—
—
—
negx.l
Dx
1(0/0)
—
—
—
—
—
—
—
not.l
Dx
1(0/0)
—
—
—
—
—
—
—
sats.l
Dx
1(0/0)
—
—
—
—
—
—
—
scc
Dx
1(0/0)
—
—
—
—
—
—
—
swap
Dx
1(0/0)
—
—
—
—
—
—
—
tas
<ea>
1(1/1)
1(1/1)
1(1/1)
1(1/1)
1(1/1)
2(1/1)
1(1/1)
—
tst.b
<ea>
1(0/0)
1(1/0)
1(1/0)
1(1/0)
1(1/0)
2(1/0)
1(1/0)
1(0/0)
tst.w
<ea>
1(0/0)
1(1/0)
1(1/0)
1(1/0)
1(1/0)
2(1/0)
1(1/0)
1(0/0)
tst.l
<ea>
1(0/0)
1(1/0)
1(1/0)
1(1/0)
1(1/0)
2(1/0)
1(1/0)
1(0/0)
6.5.3 Execution Timings—Two-Operand Instructions
Table 6-11 shows standard timings for double operand instructions.
Table 6-11. Two-Operand Instruction Execution Times
Effective Address
Opcode
<ea>
Rn
(An)
(An)+
-(An)
(d16,An)
(d8,An,Xi*SF)
(xxx).wl
#<xxx>
add.l
<ea>,Rx
1(0/0)
1(1/0)
1(1/0)
1(1/0)
1(1/0)
2(1/0)
1(1/0)
1(0/0)
add.l
Dy,<ea>
—
1(1/1)
1(1/1)
1(1/1)
1(1/1)
2(1/1)
1(1/1)
—
addi.l
#imm,Dx
1(0/0)
—
—
—
—
—
—
—
addq.l
#imm,<ea>
1(0/0)
1(1/1)
1(1/1)
1(1/1)
1(1/1)
2(1/1)
1(1/1)
—
addx.l
Dy,Dx
1(0/0)
—
—
—
—
—
—
—
and.l
<ea>,Rx
1(0/0)
1(1/0)
1(1/0)
1(1/0)
1(1/0)
2(1/0)
1(1/0)
1(0/0)
and.l
Dy,<ea>
—
1(1/1)
1(1/1)
1(1/1)
1(1/1)
2(1/1)
1(1/1)
—
andi.l
#imm,Dx
1(0/0)
—
—
—
—
—
—
—
asl.l
<ea>,Dx
1(0/0)
—
—
—
—
—
—
1(0/0)
asr.l
<ea>,Dx
1(0/0)
—
—
—
—
—
—
1(0/0)
bchg
Dy,<ea>
2(0/0)
2(1/1)
2(1/1)
2(1/1)
2(1/1)
3(1/1)
2(1/1)
—
Chapter 6. Instruction Pipeline and Timing
For More Information On This Product,
Go to: www.freescale.com
6-25
Freescale Semiconductor, Inc.
Instruction Execution Times
Table 6-11. Two-Operand Instruction Execution Times (Continued)
Effective Address
Freescale Semiconductor, Inc...
Opcode
<ea>
Rn
(An)
(An)+
-(An)
(d16,An)
(d8,An,Xi*SF)
(xxx).wl
#<xxx>
bchg
#imm,<ea>
2(0/0)
2(1/1)
2(1/1)
2(1/1)
2(1/1)
—
—
—
bclr
Dy,<ea>
2(0/0)
2(1/1)
2(1/1)
2(1/1)
2(1/1)
3(1/1)
2(1/1)
—
bclr
#imm,<ea>
2(0/0)
2(1/1)
2(1/1)
2(1/1)
2(1/1)
—
—
—
bset
Dy,<ea>
2(0/0)
2(1/1)
2(1/1)
2(1/1)
2(1/1)
3(1/1)
2(1/1)
—
bset
#imm,<ea>
2(0/0)
2(1/1)
2(1/1)
2(1/1)
2(1/1)
—
—
—
btst
Dy,<ea>
1(0/0)
1(1/0)
1(1/0)
1(1/0)
1(1/0)
2(1/0)
1(1/0)
—
btst
#imm,<ea>
1(0/0)
1(1/0)
1(1/0)
1(1/0)
1(1/0)
—
—
—
cmp.b
<ea>,Rx
1(0/0)
1(1/0)
1(1/0)
1(1/0)
1(1/0)
2(1/0)
1(1/0)
1(0/0)
cmp.w
<ea>,Rx
1(0/0)
1(1/0)
1(1/0)
1(1/0)
1(1/0)
2(1/0)
1(1/0)
1(0/0)
cmp.l
<ea>,Rx
1(0/0)
1(1/0)
1(1/0)
1(1/0)
1(1/0)
2(1/0)
1(1/0)
1(0/0)
cmpi.b
#imm,Dx
1(0/0)
—
—
—
—
—
—
—
cmpi.w
#imm,Dx
1(0/0)
—
—
—
—
—
—
—
cmpi.l
#imm,Dx
1(0/0)
—
—
—
—
—
—
—
divs.w
<ea>,Dx
20(0/0)
20(1/0)
20(1/0)
20(1/0)
20(1/0)
21(1/0)
20(1/0)
20(0/0)
divu.w
<ea>,Dx
20(0/0)
20(1/0)
20(1/0)
20(1/0)
20(1/0)
21(1/0)
20(1/0)
20(0/0)
divs.l
<ea>,Dx
35(0/0)
35(1/0)
35(1/0)
35(1/0)
35(1/0)
—
—
—
divu.l
<ea>,Dx
35(0/0)
35(1/0)
35(1/0)
35(1/0)
35(1/0)
—
—
—
eor.l
Dy,<ea>
1(0/0)
1(1/1)
1(1/1)
1(1/1)
1(1/1)
2(1/1)
1(1/1)
—
eori.l
#imm,Dx
1(0/0)
—
—
—
—
—
—
—
lea
<ea>,Ax
—
1(0/0)
—
—
1(0/0)
2(0/0)
1(0/0)
—
lsl.l
<ea>,Dx
1(0/0)
—
—
—
—
—
—
1(0/0)
lsr.l
<ea>,Dx
1(0/0)
—
—
—
—
—
—
1(0/0)
mac.w
Ry,Rx
1(0/0)
—
—
—
—
—
—
—
mac.l
Ry,Rx
3(0/0)
—
—
—
—
—
—
—
msac.w
Ry,Rx
1(0/0)
—
—
—
—
—
—
—
msac.l
Ry,Rx
3(0/0)
—
—
—
—
—
—
—
mac.w
Ry,Rx,ea,Rw
—
1(1/0)
1(1/0)
1(1/0)
1(1/0)
—
—
—
mac.l
Ry,Rx,ea,Rw
—
3(1/0)
3(1/0)
3(1/0)
3(1/0)
—
—
—
msac.w
Ry,Rx,ea,Rw
—
1(1/0)
1(1/0)
1(1/0)
1(1/0)
—
—
—
msac.l
Ry,Rx,ea,Rw
—
3(1/0)
3(1/0)
3(1/0)
3(1/0)
—
—
—
muls.w
<ea>,Dx
3(0/0)
3(1/0)
3(1/0)
3(1/0)
3(1/0)
4(1/0)
3(1/0)
3(0/0)
mulu.w
<ea>,Dx
3(0/0)
3(1/0)
3(1/0)
3(1/0)
3(1/0)
4(1/0)
3(1/0)
3(0/0)
muls.l
<ea>,Dx
5(0/0)
5(1/0)
5(1/0)
5(1/0)
5(1/0)
—
—
—
mulu.l
<ea>,Dx
5(0/0)
5(1/0)
5(1/0)
5(1/0)
5(1/0)
—
—
—
or.l
<ea>,Rx
1(0/0)
1(1/0)
1(1/0)
1(1/0)
1(1/0)
2(1/0)
1(1/0)
1(0/0)
6-26
ColdFire CF4e Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
Instruction Execution Times
Table 6-11. Two-Operand Instruction Execution Times (Continued)
Effective Address
Freescale Semiconductor, Inc...
Opcode
<ea>
Rn
(An)
(An)+
-(An)
(d16,An)
(d8,An,Xi*SF)
(xxx).wl
#<xxx>
or.l
Dy,<ea>
—
1(1/1)
1(1/1)
1(1/1)
1(1/1)
2(1/1)
1(1/1)
—
or.l
#imm,Dx
1(0/0)
—
—
—
—
—
—
—
rems.l
<ea>,Dx
35(0/0)
35(1/0)
35(1/0)
35(1/0)
35(1/0)
—
—
—
remu.l
<ea>,Dx
35(0/0)
35(1/0)
35(1/0)
35(1/0)
35(1/0)
—
—
—
sub.l
<ea>,Rx
1(0/0)
1(1/0)
1(1/0)
1(1/0)
1(1/0)
2(1/0)
1(1/0)
1(0/0)
sub.l
Dy,<ea>
—
1(1/1)
1(1/1)
1(1/1)
1(1/1)
2(1/1)
1(1/1)
—
subi.l
#imm,Dx
1(0/0)
—
—
—
—
—
—
—
subq.l
#imm,<ea>
1(0/0)
1(1/1)
1(1/1)
1(1/1)
1(1/1)
2(1/1)
1(1/1)
—
subx.l
Dy,Dx
1(0/0)
—
—
—
—
—
—
—
6.5.4 Miscellaneous Instruction Execution Times
Table 6-12 lists timings for miscellaneous instructions.
Table 6-12. Miscellaneous Instruction Execution Times
Effective Address
Opcode
<ea>
Rn
(An)
(An)+
-(An)
(d16,An)
(d8,An,Xi*SF)
(xxx).wl
#<xxx>
—
—
—
—
—
—
cpushl
(Ax)
—
9(0/1)
intouch
(Ay)
—
19(1/0)
link.w
Ay,#imm
2(0/1)
—
—
—
—
—
—
—
move.w
CCR,Dx
1(0/0)
—
—
—
—
—
—
—
move.w
<ea>,CCR
1(0/0)
—
—
—
—
—
—
1(0/0)
move.w
SR,Dx
1(0/0)
—
—
—
—
—
—
—
move.w
<ea>,SR
4(0/0)
—
—
—
—
—
—
4(0/0)
Ry,Rc
20(0/1)
—
—
—
—
—
—
—
<ea>,&list
—
n(n/0)
—
—
n(n/0)
—
—
—
&list,<ea>
—
n(0/n)
—
—
n(0/n)
—
—
—
6(0/0)
—
—
—
—
—
—
—
2(0/1)3
1(0/1)
—
movec
movem.l
movem.l
1
nop
pea
<ea>
pulse
—
1(0/1)
—
—
1(0/1)2
1(0/0)
—
—
—
—
—
—
—
stop
#imm
—
—
—
—
—
—
—
6(0/0)4
trap
#imm
—
—
—
—
—
—
—
18(1/2)
tpf
1(0/0)
—
—
—
—
—
—
—
tpf.w
1(0/0)
—
—
—
—
—
—
—
tpf.l
1(0/0)
—
—
—
—
—
—
—
Chapter 6. Instruction Pipeline and Timing
For More Information On This Product,
Go to: www.freescale.com
6-27
Freescale Semiconductor, Inc.
Instruction Execution Times
Table 6-12. Miscellaneous Instruction Execution Times (Continued)
Effective Address
Opcode
<ea>
Rn
(An)
(An)+
-(An)
(d16,An)
(d8,An,Xi*SF)
(xxx).wl
#<xxx>
1(1/0)
—
—
—
—
—
—
—
unlk
Ax
wddata.l
<ea>
—
1(1/0)
1(1/0)
1(1/0)
1(1/0)
2(1/0)
1(1/0)
—
wdebug.l
<ea>
—
3(2/0)
—
—
3(2/0)
—
—
—
1
n is the number of registers moved by the MOVEM opcode.
PEA execution times are the same for (d16,PC).
3 PEA execution times are the same for (d8,PC,Xi*SF).
4 The execution time for STOP is the time required until the processor begins sampling continuously for interrupts.
Freescale Semiconductor, Inc...
2
6.5.5 Branch Instruction Execution Times
Table 6-13 shows general branch instruction timing.
Table 6-13. General Branch Instruction Execution Times
Effective Address
Opcode
<ea>
bra
bsr
Rn
(An)
(An)+
-(An)
(d16,An)
(d8,An,Xi*SF)
(xxx).wl
#<xxx>
—
—
—
—
1(0/1)1
—
—
—
—
1(0/1)1
—
—
—
6(0/0)
1(0/0)1
—
—
—
—
jmp
<ea>
—
5(0/0)
—
—
5(0/0)1
jsr
<ea>
—
5(0/1)
—
—
5(0/1)
6(0/1)
1(0/1)1
—
—
—
15(2/0)
—
—
—
—
—
—
2(1/0)2
—
—
—
—
—
rte
rts
—
9(1/0)3
8(1/0)4
1
Assumes branch acceleration. Depending on the pipeline status, execution times may vary from 1 to 3 cycles.
If predicted correctly by the hardware return stack.
3 If mispredicted by the hardware return stack.
4 If not predicted by the hardware return stack.
2
Table 6-14 shows timing for Bcc instructions.
Table 6-14. Bcc Instruction Execution Times
Opcode
bcc
Branch Cache
Correctly Predicts
Taken
Prediction Table
Correctly Predicts
Taken
Predicted
Correctly as
Not Taken
0(0/0)
1(0/0)
1(0/0)
Predicted Incorrectly
8(0/0)
6.5.6 EMAC Instruction Execution Times
Table 6-15 specifies instruction execution times associated with the enhanced
multiply-accumulate (EMAC) execute engine.
6-28
ColdFire CF4e Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
Instruction Execution Times
Table 6-15. EMAC Instruction Execution Times
Effective Address
Opcode
Freescale Semiconductor, Inc...
mac.l
<ea>y
Ry,Rx,ACCx
mac.l
Ry,Rx,<ea>,Rw,ACCx
mac.w
Ry,Rx,ACCx
Rn
(An)
(An)+
-(An)
(d16,An)
(d16,PC)
(d8,An,Xi*SF)
(d8,PC,Xi*SF)
xxx.wl
#xxx
1(0/0)
—
—
—
—
—
—
—
—
—
—
—
1(1/0)
1(1/0)
1(1/0)
1(1/0)1
1(0/0)
—
—
—
—
—
—
—
—
—
—
—
1(1/0)
1(1/0)
1(1/0)
1(1/0)1
<ea>y,ACCx
1(0/0)
—
—
—
—
—
—
1(0/0)
mov.l
ACCy,ACCx
1(0/0)
—
—
—
—
—
—
—
mov.l
<ea>y,MACSR
8(0/0)
—
—
—
—
—
—
8(0/0)
mov.l
<ea>y,MASK
7(0/0)
—
—
—
—
—
—
7(0/0)
mov.l
<ea>y,ACCext01
1(0/0)
—
—
—
—
—
—
1(0/0)
mov.l
<ea>y,ACCext23
1(0/0)
—
—
—
—
—
—
1(0/0)
mov.l
ACCx,<ea>x
1(0/0)2
—
—
—
—
—
—
—
mov.l
MACSR,<ea>x
1(0/0)
—
—
—
—
—
—
—
mov.l
MASK,<ea>x
1(0/0)
—
—
—
—
—
—
—
mov.l
ACCext01,<ea>x
1(0/0)
—
—
—
—
—
—
—
mov.l
ACCext23,<ea>x
1(0/0)
—
—
—
—
—
—
—
msac.l
Ry,Rx,ACCx
1(0/0)
—
—
—
—
—
—
—
—
—
—
mac.w
Ry,Rx,<ea>,Rw,ACCx
mov.l
—
1(1/0)
1(1/0)
1(1/0)
1(1/0)1
1(0/0)
—
—
—
—
—
—
—
—
1(1/0)
1(1/0)
1(1/0)
1(1/0)1
—
—
—
<ea>y,Dx
4(0/0)
4(1/0)
4(1/0)
4(1/0)
4(1/0)
—
—
—
muls.w
<ea>y,Dx
4(0/0)
4(1/0)
4(1/0)
4(1/0)
4(1/0)
5(1/0)
4(1/0)
4(0/0)
mulu.l
<ea>y,Dx
4(0/0)
4(1/0)
4(1/0)
4(1/0)
4(1/0)
—
—
—
mulu.w
<ea>y,Dx
4(0/0)
4(1/0)
4(1/0)
4(1/0)
4(1/0)
5(1/0)
4(1/0)
4(0/0)
msac.l
Ry,Rx,<ea>,Rw,ACCx
msac.w
Ry,Rx,ACCx
msac.w
Ry,Rx,<ea>,Rw,ACCx
muls.l
1
2
Effective address of (d16,PC) not supported.
Storing the accumulator requires 1 additional clock cycle when saturation is enabled, or fractional rounding is
performed (MACSR[7:4] = 1---, -11-, --11).
Execution times for moving the contents of the ACC, ACCext[01,23], MACSR, or MASK
into a destination location <ea>x in this table represent the best-case scenario when the
store is executed and no load, copy, MAC, or MSAC instructions are in the EMAC
execution pipeline. In general, these store operations require only a single cycle for
execution, but if preceded immediately by a load, copy, MAC, or MSAC instruction, the
depth of the EMAC pipeline is exposed and the execution time is 4 cycles.
Chapter 6. Instruction Pipeline and Timing
For More Information On This Product,
Go to: www.freescale.com
6-29
Freescale Semiconductor, Inc.
Instruction Execution Times
6.5.7 FPU Instruction Execution Times
Table 6-16 specifies the instruction execution times associated with the FPU execute
engine.
Table 6-16. FPU Instruction Execution Times1,
2
Effective Address <ea>
Freescale Semiconductor, Inc...
Opcode
Format
FPn
Dn
(An)
(An)+
-(An)
(d16,An)
(d16,PC)
fabs
<ea>y,FPx
1(0/0)
1(0/0)
1(1/0)
1(1/0)
1(1/0)
1(1/0)
1(1/0)
fadd
<ea>y,FPx
4(0/0)
4(0/0)
4(1/0)
4(1/0)
4(1/0)
4(1/0)
4(1/0)
fbcc
<label>
—
—
—
—
—
—
2(0/0) if correct,
9(0/0) if incorrect
fcmp
<ea>y,FPx
4(0/0)
4(0/0)
4(1/0)
4(1/0)
4(1/0)
4(1/0)
4(1/0)
fdiv
<ea>y,FPx
23(0/0) 23(0/0)
23(1/0)
23(1/0) 23(1/0)
23(1/0)
23(1/0)
fint
<ea>y,FPx
4(0/0)
4(0/0)
4(1/0)
4(1/0)
4(1/0)
4(1/0)
4(1/0)
fintrz
<ea>y,FPx
4(0/0)
4(0/0)
4(1/0)
4(1/0)
4(1/0)
4(1/0)
4(1/0)
fmove
<ea>y,FPx
1(0/0)
1(0/0)
1(1/0)
1(1/0)
1(1/0)
1(1/0)
1(1/0)
fmove
FPy,<ea>x
—
2(0/1)
2(0/1)
2(0/1)
2(0/1)
2(0/1)
—
fmove
<ea>y,FP*R
—
6(0/0)
6(1/0)
6(1/0)
6(1/0)
6(1/0)
6(1/0)
fmove
FP*R,<ea>x
—
1(0/0)
1(0/1)
1(0/1)
1(0/1)
1(0/1)
—
fmovem3
<ea>y,#list
—
—
2n(2n/0)
—
—
2n(2n/0)
2n(2n/0)
fmovem3, 4
#list,<ea>x
—
—
1+2n(0/2n)
—
—
1+2n(0/2n)
—
fmul
<ea>y,FPx
4(0/0)
4(0/0)
4(1/0)
4(1/0)
4(1/0)
4(1/0)
4(1/0)
fneg
<ea>y,FPx
1(0/0)
1(0/0)
1(1/0)
1(1/0)
1(1/0)
1(1/0)
1(1/0)
—
—
—
—
—
—
2(0/0)
fnop
frestore
<ea>y
—
—
6(4/0)
—
—
6(4/0)
6(4/0)
fsave
<ea>x
—
—
7(0/3)
—
—
7(0/3)
—
fsqrt
<ea>y,FPx
56(0/0) 56(0/0)
56(1/0)
56(1/0) 56(1/0)
56(1/0)
56(1/0)
fsub
<ea>y,FPx
4(0/0)
4(0/0)
4(1/0)
4(1/0)
4(1/0)
4(1/0)
4(1/0)
ftst
<ea>y,FPx
1(0/0)
1(0/0)
1(1/0)
1(1/0)
1(1/0)
1(1/0)
1(1/0)
1
Add 1(1/0) for an external read operand of double-precision format for all instructions except FMOVEM,
and 1(0/1) for FMOVE FPy,<ea>x when the destination is double-precision.
2 If the external operand is an integer format (byte, word, or longword), there is a 4-cycle conversion time
that must be added to the basic execution time.
3 For FMOVEM, n refers to the number of registers being moved.
4 If any exceptions are enabled, the execution time for FMOVE FPy,<ea>x increases by 1 cycle. If the
BSUN exception is enabled, the execution time for FBcc increases by one cycle.
6-30
ColdFire CF4e Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
Freescale Semiconductor, Inc...
Chapter 7
Exception Processing
This chapter describes CF4e exception processing, focusing on differences from previous
ColdFire versions. In particular, additional encodings have been added to the fault status
(FS) field in the exception stack frame to indicate exceptions related to translation
lookaside buffers (TLBs). This provides CF4e core designs with precise, recoverable faults
for all K-Bus references to support demand-paged memory accesses.
7.1 Overview
Exception processing for ColdFire processors is streamlined for performance. Differences
from previous ColdFire Family processors include the following:
•
•
An instruction restart model for translation (TLB miss) and access faults. This new
functionality extends the existing ColdFire access error fault vector and exception
stack frames.
Use of separate system stack pointers for user and supervisor modes.
Previous ColdFire processors use an instruction restart exception model but require
additional software support to recover from certain access errors.
Exception processing can be defined as the time from the detection of the fault condition
until the fetch of the first handler instruction has been initiated. It consists of the following
four major steps:
1. The processor makes an internal copy of the status register (SR) and then enters
supervisor mode by setting SR[S] and disabling trace mode by clearing SR[T]. The
occurrence of an interrupt exception also clears SR[M] and sets the interrupt priority
mask, SR[I] to the level of the current interrupt request.
2. The processor determines the exception vector number. For all faults except
interrupts, the processor bases this calculation on exception type. For interrupts, the
processor performs an interrupt acknowledge (IACK) bus cycle to obtain the vector
number from peripheral. The IACK cycle is mapped to a special acknowledge
address space with the interrupt level encoded in the address.
3. The processor saves the current context by creating an exception stack frame on the
system stack. As a result, the exception stack frame is created at a 0-modulo-4
address on top of the system stack pointed to by the supervisor stack pointer (SSP).
Chapter 7. Exception Processing
For More Information On This Product,
Go to: www.freescale.com
7-1
Freescale Semiconductor, Inc.
Overview
Freescale Semiconductor, Inc...
As shown in Figure 7-1, the CF4e processor uses the same fixed-length stack frame
as previous ColdFire Versions with additional fault status (FS) encodings to support
the MMU. In some exception types, the program counter (PC) in the exception stack
frame contains the address of the faulting instruction (fault); in others the PC
contains the next instruction to be executed (next). (Note that previous ColdFire
processors support a single stack pointer in the A7 address register.)
If the exception is caused by an FPU instruction, the PC contains the address of
either the next floating-point instruction (nextFP) if the exception is pre-instruction,
or the faulting instruction (fault) if the exception is post-instruction.
4. The processor acquires the address of the first instruction of the exception handler.
The instruction address is obtained by fetching a value from the exception table at
the address in the vector base register. The index into the table is calculated as
4 x vector_number. When the index value is generated, the vector table contents
determine the address of the first instruction of the desired handler. After the fetch
of the first opcode of the handler is initiated, exception processing terminates and
normal instruction processing continues in the handler.
The vector base register described in the ColdFire PRM, holds the base address of the
exception vector table in memory. The displacement of an exception vector is added to the
value in this register to access the vector table. VBR[19–0] are not implemented and are
assumed to be zero, forcing the vector table to be aligned on a 0-modulo-1-Mbyte
boundary.
ColdFire processors support a 1,024-byte vector table aligned on any 0-modulo-1 Mbyte
address boundary; see Table 7-1. The table contains 256 exception vectors, the first 64 of
which are defined by Motorola. The rest are user-defined interrupt vectors.
Table 7-1. Exception Vector Assignments
Vector Numbers Vector Offset (Hex)
7-2
Stacked Program Counter1
Assignment
0
000
—
Initial supervisor stack pointer
1
004
—
Initial program counter
2
008
Fault
Access error
3
00C
Fault
Address error
4
010
Fault
Illegal instruction
5
014
Fault
Divide by zero
6–7
018–01C
—
8
020
Fault
Privilege violation
9
024
Next
Trace
10
028
Fault
Unimplemented line-a opcode
11
02C
Fault
Unimplemented line-f opcode
12
030
Next
Non-PC breakpoint debug interrupt
Reserved
ColdFire CF4E Core User’s Guide
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
Supervisor/User Stack Pointers (A7 and OTHER_A7)
Table 7-1. Exception Vector Assignments (Continued)
Freescale Semiconductor, Inc...
Vector Numbers Vector Offset (Hex)
1
Stacked Program Counter1
Assignment
13
034
Next
PC breakpoint debug interrupt
14
038
Fault
Format error
15
03C
Next
Uninitialized interrupt
16–23
040–05C
—
24
060
Next
Spurious interrupt
25–31
064–07C
Next
Level 1–7 autovectored interrupts
32–47
080–0BC
Next
Trap #0–15 instructions
48
0C0
Fault
Floating-point branch on unordered
condition
49
0C4
NextFP or Fault
Floating-point inexact result
50
0C8
NextFP
Floating-point divide-by-zero
51
0CC
NextFP or Fault
Floating-point underflow
52
0D0
NextFP or Fault
Floating-point operand error
53
0D4
NextFP or Fault
Floating-point overflow
54
0D8
NextFP or Fault
Floating-point input not-a-number (NAN)
55
0DC
NextFP or Fault
Floating-point input denormalized
number
56–60
0E0–0F0
—
61
0F4
Fault
62–63
0F8–0FC
—
64–255
100–3FC
Next
Reserved
Reserved
Unsupported instruction
Reserved
User-defined interrupts
‘Fault’ refers to the PC of the faulting instruction. ‘Next’ refers to the PC of the instruction immediately after the
faulting instruction. NextFP’ refers to the PC of the next floating-point instruction.
ColdFire processors inhibit sampling for interrupts during the first instruction of all
exception handlers. This allows any handler to effectively disable interrupts, if necessary,
by raising the interrupt mask level in the SR.
7.2 Supervisor/User Stack Pointers
(A7 and OTHER_A7)
The CF4e architecture supports two unique stack pointer (A7) registers—the supervisor
stack pointer (SSP) and the user stack pointer (USP). This support provides the required
isolation between operating modes as dictated by the virtual memory management scheme
provided by the memory management unit (MMU). Note that only the SSP is used during
creation of the exception stack frame.
The hardware implementation of these two programmable-visible 32-bit registers does not
uniquely identify one as the SSP and the other as the USP. Rather, the hardware uses one
Chapter 7. Exception Processing
For More Information On This Product,
Go to: www.freescale.com
7-3
Freescale Semiconductor, Inc.
Exception Stack Frame Definition
32-bit register as the currently-active A7 and the other as OTHER_A7. Thus, the register
contents are a function of the processor operating mode:
if SR[S] = 1
then
Freescale Semiconductor, Inc...
else
A7 = Supervisor Stack Pointer
other_A7 = User Stack Pointer
A7 = User Stack Pointer
other_A7 = Supervisor Stack Pointer
The BDM programming model supports reads and writes to A7 and OTHER_A7 directly.
It is the responsibility of the external development system to determine the mapping of (A7
and OTHER_A7) to the two program-visible definitions (SSP and USP), based on the
setting of SR[S]. This functionality is enabled by setting by the dual stack pointer enable
bit CACR[DSPE]. If this bit is cleared, only the stack pointer, A7 (defined for previous
ColdFire versions), is available. DSPE is zero at reset.
If DSPE is set, the appropriate stack pointer register (SSP or USP) is accessed as a function
of the processor’s operating mode. To support dual stack pointers, the following two
privileged MC680x0 instructions to load/store the USP are added to the ColdFire
instruction set architecture:
mov.l Ay,USP # move to USP: opcode = 0x4E6(0xxx)
mov.l USP,Ax # move from USP: opcode = 0x4E6(1xxx)
The address register number is encoded in the low-order 3 bits of the opcode.
7.3 Exception Stack Frame Definition
The first longword of the exception stack frame, Figure 7-1, holds the 16-bit format/vector
word (F/V) and 16-bit status register. The second holds the 32-bit program counter address.
31
A7→
+ 0x04
28
FORMAT
27
26
FS[3–2]
25
18
VEC
17
16
15
FS[1–0]
0
Status Register
Program Counter [31:0]
Figure 7-1. Exception Stack Frame
Table 7-2 describes F/V fields. FS encodings added to support the CF4e MMU are noted.
7-4
ColdFire CF4E Core User’s Guide
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
Processor Exceptions
Table 7-2. Format/Vector Word
Bits
Field
Description
Freescale Semiconductor, Inc...
31–28 FORMAT Format field. Written with a value of {4,5,6,7} by the processor indicating a 2-longword frame
format. FORMAT records any longword stack pointer misalignment when the exception occurred.
1
27–26
FS[3–2]
25–18
VEC
17–16
FS[1–0]
A7 at Time of Exception, Bits[1:0]
A7 at First Instruction of Handler
FORMAT
00
Original A7—8
0100
01
Original A7—9
0101
10
Original A7—10
0110
11
Original A7—11
0111
Fault status. Defined for access and address errors and for interrupted debug service routines.
0000 Not an access or address error nor an interrupted debug service routine
0001 Reserved
0010 Interrupt during a debug service routine for faults other than access errors. 1
0011 Reserved
0100 Error (for example, protection fault) on instruction fetch
0101 TLB miss on opword of instruction fetch (New in CF4e)
0110 TLB miss on extension word of instruction fetch (New in CF4e)
0111 IFP access error while executing in emulator mode (New in CF4e)
1000 Error on data write
1001 Error on attempted write to write-protected space
1010 TLB miss on data write (New in CF4e)
1011 Reserved
1100 Error on data read
1101 Attempted read, read-modify-write of protected space (New in CF4e)
1110 TLB miss on data read, or read-modify-write (New in CF4e)
1111 OEP access error while executing in emulator mode (New in CF4e)
Vector number. Defines the exception type. It is calculated by the processor for internal faults and
is supplied by the peripheral for interrupts. See Table 7-1.
See bits 27–26.
This generally refers to taking an I/O interrupt during a debug service routine but also applies to other fault
types. If an access error occurs during a debug service routine, FS is set to 0111 if it is due to an instruction
fetch or to 1111 for a data access. This applies only to access errors with the MMU present. If an access error
occurs without an MMU, FS is set to 0010.
7.4 Processor Exceptions
Table 7-3 describes CF4e exceptions. Note that if a ColdFire processor encounters any fault
while processing another fault, it immediately halts execution with a catastrophic
fault-on-fault condition. A reset is required to force the processor to exit this halted state.
Chapter 7. Exception Processing
For More Information On This Product,
Go to: www.freescale.com
7-5
Freescale Semiconductor, Inc.
Processor Exceptions
Freescale Semiconductor, Inc...
Table 7-3. Exceptions
Type
Description
Access error
If the MMU is disabled, access errors are reported only in conjunction with an attempted store to
write-protected memory. Thus, access errors associated with instruction fetch or operand read
accesses are not possible. The Version 4 processor, unlike the Version 2 and 3 processors, updates
the condition code register if a write-protect error occurs during a CLR or MOV3Q operation to
memory.
K-Bus accesses that fault (that is, terminated with a K-Bus transfer error acknowledge) generate an
access error exception. MMU TLB misses and access violations use the same fault. If the MMU is
enabled, all TLB misses and protection violations generate an access error exception. To determine if
a fault is due to a TLB miss or another type of access error, new FS encodings (described in
Table 7-2) signal TLB misses on the following:
• Instruction fetch
• Instruction extension fetch
• Data read
• Data write
Address error
An address error is caused by an attempted execution transferring control to an odd instruction
address (that is, if bit 0 of the target address is set), an attempted use of a word-sized index register
(Xi.w) or by an attempted execution of an instruction with a full-format indexed addressing mode.
If an address error occurs on a JSR instruction, the Version 4 processor first pushes the return
address onto the stack and then calculates the target address.
On Version 2 and 3 processors, the target address is calculated then the return address is pushed on
stack. If an address error occurs on an RTS instruction, the Version 4 processor preserves the
original return PC and writes the exception stack frame above this value. On Version 2 and 3
processors, the faulting return PC is overwritten by the address error stack frame.
Illegal
instruction
The scope of illegal instruction detection is implementation-specific across the generations of
ColdFire cores. For the CF4e core, the complete 16-bit opcode is decoded and this exception is
generated if execution of an unsupported instruction is attempted. Additionally, attempting to execute
an illegal line A or line F opcode generates unique exception types: vectors 10 and 11, respectively.
ColdFire processors do not provide illegal instruction detection on extension words of any instruction,
including MOVEC. Attempting to execute an instruction with an illegal extension word causes
undefined results.
Divide-by-zero
Privilege
violation
Attempting to divide by zero causes an exception (vector 5, offset = 0x014).
Caused by attempted execution of a supervisor mode instruction while in user mode. The ColdFire
Programmer’s Reference Manual lists supervisor- and user-mode instructions.
Trace exception Trace mode, which allows instruction-by-instruction tracing, is enabled by setting SR[T].
If SR[T] is set, instruction completion (for all but the STOP instruction) signals a trace exception.The
STOP instruction has the following effects:
1 The instruction before the STOP executes and then generates a trace exception. In the exception
stack frame, the PC points to the STOP opcode.
2 When the trace handler is exited, the STOP instruction is executed, loading the SR with the
immediate operand from the instruction.
3 The processor then generates a trace exception. The PC in the exception stack frame points to the
instruction after STOP, and the SR reflects the value loaded in the previous step.
If the processor is not in trace mode and executes a STOP instruction where the immediate operand
sets SR[T], hardware loads the SR and generates a trace exception. The PC in the exception stack
frame points to the instruction after STOP, and the SR reflects the value loaded in step 2. Note that
because ColdFire processors do not support hardware stacking of multiple exceptions, it is the
responsibility of the operating system to check for trace mode after processing other exception types.
For example, when a TRAP instruction executes in trace mode, the processor initiates the TRAP
exception and passes control to the corresponding handler. If the system requires a trace exception,
the TRAP exception handler must check for this condition (SR[15] in the exception stack frame set)
and pass control to the trace handler before returning from the original exception.
7-6
ColdFire CF4E Core User’s Guide
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
Processor Exceptions
Table 7-3. Exceptions (Continued)
Type
Description
Unimplemented A line-a opcode results when bits 15–12 of the opword are 1010. This exception is generated by the
line-a opcode attempted execution of an undefined line-a opcode.
Freescale Semiconductor, Inc...
Unimplemented A line-f opcode results when bits 15–12 of the opword are 1111. This exception is generated under
line-f opcode
the following conditions:
• When attempting to execute an undefined line-f opcode.
• When attempting to execute an FPU instruction when the FPU has been disabled in the CACR.
Debug interrupt
The debug interrupt exception is caused by a hardware breakpoint register trigger. Rather than
generating an IACK cycle, the processor internally calculates the vector number (12 or 13, depending
on the type of breakpoint trigger). Additionally, SR[M,I] are unaffected by the interrupt.
Separate exception vectors are provided for PC breakpoints and for address/data breakpoints. In the
case of a two-level trigger, the last breakpoint determines the vector. The two unique entries occur
when a PC breakpoint generates the 0x034 vector. In case of a two-level trigger, the last breakpoint
event determines the vector. See Chapter 11, “Debug Support.”
Format error
When an RTE instruction executes, the processor first examines the 4-bit format field to validate the
frame type. For a ColdFire processor, attempted execution of an RTE where the format is not equal to
{4, 5, 6, 7} generates a format error. The exception stack frame for the format error is created without
disturbing the original exception frame and the stacked PC points to RTE. The selection of the format
value provides limited debug support for porting code from M68000 applications. On M68000 Family
processors, the SR was at the top of the stack. Bit 30 of the longword addressed by the system stack
pointer is typically zero. Attempting an RTE using this old format generates a format error on a
ColdFire processor. If the format field defines a valid type, the processor does the following:
1 Reloads the SR operand.
2 Fetches the second longword operand.
3 Adjusts the stack pointer by adding the format value to the auto-incremented address after the first
longword fetch.
4 Transfers control to the instruction address defined by the second longword operand in the stack
frame.
When the processor executes a FRESTORE instruction, if the restored FPU state frame contains a
non-supported value, execution is aborted and a format error exception is generated.
Trap
Executing a TRAP instruction always forces an exception and is useful for implementing system calls.
The trap instruction may be used to change from user to supervisor mode.
Interrupt
exception
Interrupt exception processing, with interrupt recognition and vector fetching, includes uninitialized
and spurious interrupts as well as those where the requesting device supplies the 8-bit interrupt
vector. Autovectoring can optionally be configured through the system interface module (SIM).
Reset exception Asserting the reset input signal (RSTI) causes a reset exception, which has the highest exception
priority and provides for system initialization and recovery from catastrophic failure. When assertion
of RSTI is recognized, current processing is aborted and cannot be recovered. The reset exception
places the processor in supervisor mode by setting SR[S] and disables tracing by clearing SR[T]. It
clears SR[M] and sets SR[I] to the highest level (0b111, priority level 7). Next, VBR is cleared.
Configuration registers controlling operation of all processor-local memories are invalidated, disabling
the memories.
Note: Implementation-specific supervisor registers are also affected at reset.
After RSTI is negated, the processor waits 16 cycles before beginning the reset exception process.
During this time, certain events are sampled, including the assertion of the debug breakpoint signal. If
the processor is not halted, it initiates the reset exception by performing two longword read bus
cycles. The longword at address 0 is loaded into the stack pointer and the longword at address 4 is
loaded into the PC. After the initial instruction is fetched from memory, program execution begins at
the address in the PC. If an access error or address error occurs before the first instruction executes,
the processor enters a fault-on-fault halted state.
Unsupported
instruction
exception
If the CF4e attempts to execute a valid instruction but the required optional hardware module is not
present in the OEP, a non-supported instruction exception is generated (vector 0x61). Control is then
passed to an exception handler that can then process the opcode as required by the system.
Chapter 7. Exception Processing
For More Information On This Product,
Go to: www.freescale.com
7-7
Freescale Semiconductor, Inc.
Precise Faults
7.5 Precise Faults
To support a demand-paged virtual memory environment, all memory references require
precise, recoverable faults. The ColdFire instruction restart mechanism ensures that a
faulted instruction restarts from the beginning of execution; that is, no internal state
information is saved when an exception occurs and none is restored when the handler ends.
Given the PC address defined in the exception stack frame, the processor reestablishes
program execution by transferring control to the given location as part of the RTE (return
from exception) instruction.
Freescale Semiconductor, Inc...
The instruction restart recovery model requires program-visible register changes made
during execution to be undone if that instruction subsequently faults.
The Version 4 (and later) OEP structure naturally supports this concept for most
instructions; program-visible registers are updated only in the final OEP stage when fault
collection is complete. If any type of exception occurs, pending register updates are
discarded.
For V4 cores and later, most single-cycle instructions already support precise faults and
instruction restart. Some complex instructions do not. Consider the following
memory-to-memory move:
mov.l
(Ay)+,(Ax)+
# copy 4 bytes from source to destination
On a Version 4 processor, this instruction takes one cycle to read the source operand (Ay)
and one to write the data into Ax. Both the source and destination address pointers are
updated as part of execution. Table 7-4 lists the operations performed in execute stage (EX).
Table 7-4. OEP EX Cycle Operations
EX Cycle
Operations
1
Read source operand from memory @ (Ay), update Ay, new Ay = old Ay + 4
2
Write operand into destination memory @ (Ax), update Ax, new Ax = old Ax + 4, update CCR
A fault detected with the destination memory write is reported during the second cycle. At
this point, operations performed in the first cycle are complete, so if the destination write
takes any type of access error, Ay is updated. After the access error handler executes and
the faulting instruction restarts, the processor’s operation is incorrect because the source
address register has an incorrect (post-incremented) value.
To recover the original state of the programming model for all instructions, the CF4eCpu
adds the needed hardware to support full register recovery. This hardware allows
program-visible registers to be restored to their original state for multi-cycle instructions so
that the instruction restart mechanism is supported. Memory-to-memory moves and move
multiple loads are representative of the complex instructions needing the special recovery
support.
The other major pipeline change affects the IFP. The IFP and OEP are decoupled by a FIFO
7-8
ColdFire CF4E Core User’s Guide
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
Precise Faults
Freescale Semiconductor, Inc...
instruction buffer. In the V4 IFP, each buffer entry includes 48 bits of instruction data
fetched from memory and 64 bits of early decode and branch prediction information. This
datapath is expanded slightly to include IFP fault status information. Thus, every IFP access
can be tagged in case an instruction fetch terminates with an error acknowledge.
NOTE:
For access errors signaled on instruction prefetches, an access
error exception is generated only if instruction execution is
attempted. If an instruction fetch access error exception is
generated and the FS field indicates the fault occurred on an
extension word, it may be necessary for the exception PC to be
rounded-up to the next page address to determine the faulting
instruction fetch address.
Chapter 7. Exception Processing
For More Information On This Product,
Go to: www.freescale.com
7-9
Freescale Semiconductor, Inc.
Freescale Semiconductor, Inc...
Precise Faults
7-10
ColdFire CF4E Core User’s Guide
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
Freescale Semiconductor, Inc...
Chapter 8
Local Memory
This chapter describes the implementation of the ColdFire CF4e local memory
specification. The V4 local memory specification implements a Harvard memory
architecture, including separate caches, ROM, RAM, and the necessary buses and registers
to support instruction and data memory. This chapter consists of the following major
sections.
•
•
•
Section 8.5, “SRAM Overview,” describes the on-chip static RAM (SRAM)
implementation. It covers general operations, configuration, and initialization. It
also provides information and examples showing how to minimize power
consumption when using the SRAM.
Section 8.6, “ROM Overview,” describes the on-chip ROM implementation. It
covers general operations, configuration, and initialization. It also provides
information and examples showing how to minimize power consumption when
using the ROM.
Section 8.7, “Cache Overview,” describes the cache implementation, including
organization, configuration, and coherency. It describes cache operations and how
the caches interface with other memory structures.
8.1 Local Memory Overview
Figure 8-1 is a generic block diagram of a CF4e core interface.
Chapter 8. Local Memory
For More Information On This Product,
Go to: www.freescale.com
8-1
Freescale Semiconductor, Inc.
Local Memory Overview
E-Bus
S-Bus
System
Integration
Module
Slave
Module
Slave
Module
Master
Module
M-Bus
CF4e Core Kmem
EMAC, FPU
Freescale Semiconductor, Inc...
CF4e Core
Instruction K-Bus
Data K-Bus
CF4e
CPU
ICACHE
Ctrl
Debug
KRAM0
Ctrl
DCACHE
Ctrl
ICACHE
Tag/Data
Arrays
K2M
KROM0
Ctrl
KRAM1
Ctrl
KRAM0
Mem
Array
KRAM1
Ctrl
KROM0
Mem
Array
Data K-Bus
ICACHE
Tag/Data
Arrays
KRAM1
Mem
Array
KROM1
Mem
Array
Figure 8-1. Generic CF4e Block Diagram
The system buses have the following hierarchy:
•
•
•
•
K-Bus—Instruction and operand; processor core and dedicated on-chip memories
M-Bus—Internal multi-master with centralized arbitration
S-Bus—Slave module bus controlled by the system integration module (SIM)
E-Bus—External interface bus
To maximize processor performance, RAM, ROM, and cache controllers reside on the
high-speed local bus. These controllers support a range of memory sizes, such that when
coupled with the use of compiled memory arrays, provide system designers with the ability
to configure the implementation with the optimum amount of local memory for a given
application.
The KROM controllers are each associated with a ROMBAR control register (KROM0
with ROMBAR0 and KROM1with ROMBAR1). A CF4e design may have 0–2 KROM
controllers. The controllers independently support array sizes of 512 bytes, and 1, 2, 4, 8,
16 or 32 Kbytes. These arrays can be configured by a bit in the appropriate ROMBAR
control register to be on either the instruction K-Bus or the data K-Bus. This allows the
KROM controllers and their associated arrays to be moved from one K-Bus to the other.
Input configuration signals provide the ability to automatically load the ROMBAR control
registers at reset so the read-only memories can be used as boot devices.
8-2
ColdFire CF4e Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
Local Memory Overview
The KROM controller has an integrated BIST controller to test the KROM array.
Figure 8-2 shows the cross-bar connections involving the KRAM and KROM memories.
Cache
Controller I
I JADDR
I KADDR
Freescale Semiconductor, Inc...
I KRDATA
I-KBus
KRAM0
Controller
CF4Cpu
KROM0
Controller
K2M
MADDR
+ Debug
+ Misalign
MWDATA
KRAM1
Controller
KROM1
Controller
MRDATA
O JADDR
O KADDR
O KWDATA
O KRDATA
O-KBus
Cache
Controller 0
Figure 8-2. Local Memory Block Diagram Showing Cache, KRAM, and KROM
Controllers
In each controller, the interface to the memory arrays is defined as a synchronous one. As
shown in the following figure, the input registers for capturing the reference address and
write data are specified to be internal to the memory module:
Chapter 8. Local Memory
For More Information On This Product,
Go to: www.freescale.com
8-3
Freescale Semiconductor, Inc.
Local Memory Overview
Core
Freescale Semiconductor, Inc...
ADDR
DBI
RAM
Array
DBO
Synchronous Memory
Figure 8-3. ColdFire Core Synchronous Memory Interface
The rectangular boxes with the double-bar at the top represent rising-edge, register storage
elements, and the following signals are defined: ADDR is the reference address, DBI is the
data bus input, and DBO is the data bus output.
As shown in Figure 8-4, all input signals have a setup and hold time with respect to the
rising edge of the clock. All outputs transition after a propagation delay from the rising edge
of the clock (clk).
su
hld
clk
su = setup time to rising edge of clk
Inputs
hld = hold time after rising edge of clk
tpd = propagation time after rising edge
of clk
Outputs
tpd
Figure 8-4. Synchronous Memory Timing Diagram
8-4
ColdFire CF4e Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
Two-Stage Pipelined Local Bus (K-Bus)
Memory outputs are held valid until the next rising edge of the clock.
A generic port list and synchronous memory functionality are shown in Figure 8-5.
.
Variable
A
CSB
Variable
Variable
DBI
DBO
Freescale Semiconductor, Inc...
RWB
CLK
Figure 8-5. Synchronous Memory Interface Block Diagram
In Figure 8-5, the memory address width is a function of the capacity of the local memory,
and the data bus widths (DBI and DBO) are a function of the type of synchronous memory
(cache, RAM, or ROM). The port names for the memory block are defined as follows: A is
the reference address, CSB is an active-low chip select, DBI is the data bus input, RWB is
the read/write control (read = 1, write = 0), CLK is the processor’s clock, and DBO is the
data bus output.
The corresponding functional truth table is shown in Table 8-1.
Table 8-1. Synchronous Memory Truth Table (Sampled at Positive Edge of CLK)
CSB
RWB
Operation
1
x
Idle (minimum power)
0
1
Read memory, DBO = Memory[A]
0
0
Write memory, Memory[A] = DBI
8.2 Two-Stage Pipelined Local Bus (K-Bus)
In the pipelined K-Bus design, consider a read operation. The first stage (KC1) is dedicated
to the actual memory access, while the second stage (KC2) supports data transmission back
to the processor. This structure provides an optimum time balance of the basic functions
associated with a K-Bus reference, because it effectively provides an entire machine cycle
for the memory array access.
The pipelined operation actually begins with a J cycle, where part of the reference address
and certain control signals are sent from the processor to the K-Bus memory controllers in
the cycle immediately preceding the KC1 stage. This transmission is necessary to allow
controllers/arrays to have a local registered copy of the time-critical portion of the reference
address.
Chapter 8. Local Memory
For More Information On This Product,
Go to: www.freescale.com
8-5
Freescale Semiconductor, Inc.
Two-Stage Pipelined Local Bus (K-Bus)
The KC1 access begins with the reference address contained in a register within the
memory arrays. The memory controller performs the actual access and registers the data
output for a read operation in a local data register in the controller. Thus, the entire
operation is contained within the controller and the compiled memory array. During the
KC2 stage, the read operand is selected from the appropriate source (cache, RAM, ROM,
or the K-to-M-Bus controller) and routed back onto the K-Bus where it eventually is
registered by the processor or debug module.
For operand write references, the data is sourced onto the K-Bus during the KC1 cycle, but
the actual memory array update is delayed until the KC2 cycle, so that the appropriate
memory unit can be identified.
Freescale Semiconductor, Inc...
To summarize, the basic pipelined K-Bus operations are shown below:
•
•
Read
— J: Send the low-order portion of the reference address plus certain control signals
to the memories
— KC1: Broadcast to all memories which may contain data, perform read access
— KC2: K2M selects appropriate memory as source, and routes data back to CPU
Write
— J: Send the low-order portion of the reference address plus certain control signals
to the memories
— KC1: K2M signals the appropriate memory as destination, so it can capture data
— KC2: Destination memory performs the actual write access
Given that the write strategy performs the operation during KC2, there are cases where
consecutive write/read accesses may incur a 1-cycle K-Bus pipeline stall to handle the
read-after-write hazard.
For cache misses or accesses that are not mapped into a K-Bus memory, the access proceeds
to the KC2 stage where it is stalled as an M-Bus transfer is initiated. As the M-Bus access
completes, the KC2 stall is negated and K-Bus operation is terminated.
The following block diagram presents the cache functions within the two-stage pipelined
K-Bus structure:
8-6
ColdFire CF4e Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
Interactions between Local Memory Modules
JADDR, J CONTROL
J
Tag
Array
Data
Array
KADDR, KWDATA, K-Bus Control
Freescale Semiconductor, Inc...
Access Type
Access Mode
Memory Unit Select
KC1
Compare
Cache
Hit
Cache
Read
Data
M-Bus Control
MRDATA
Fill
Buffer
Cache Busy
Cache Hit
KC2
To K2M
Ultimately to KRDATA
Figure 8-6. Version 4 Cache Block Diagram
8.3 Interactions between Local Memory Modules
Depending on configuration information, instruction fetches and data read accesses may be
sent simultaneously to the SRAM, ROM, and cache controllers. This approach is required
because the controllers are memory-mapped devices and the hit/miss determination is made
concurrently with the read data access. Power dissipation can be minimized by configuring
the ROM and SRAM base address registers (ROMBARs and RAMBARs) to mask unused
address spaces whenever possible.
Chapter 8. Local Memory
For More Information On This Product,
Go to: www.freescale.com
8-7
Freescale Semiconductor, Inc.
Local Memory Connection Specification
If the access address is mapped into the region defined by the SRAM (and this region is not
masked), it provides the data back to the processor and any cache or ROM data is discarded.
If the access address does not hit the SRAM, but is mapped into the region defined by the
ROM (and this region is not masked), the ROM provides the data back to the processor and
any cache data is discarded. Accesses from the SRAM and ROM modules are never cached.
The complete definition of the processor’s local bus priority scheme for read references is
as follows:
Freescale Semiconductor, Inc...
The internal processor memory hierarchy uses the following priority:
1.
2.
3.
4.
SRAM (highest)
ROM
Cache (if space is defined as cacheable)
External access (lowest)
8.4 Local Memory Connection Specification
This section describes the how memory devices are connected and how memory sizes are
configured.
8.4.1 K-Bus Memory Array Signal Connections
The processor supports the following six K-Bus processor local memory controllers:
•
The KRAM0 and KRAM1 (SRAM) memory controllers
•
the KROM0 and KROM1 (ROM) memory controllers
•
Instruction cache controller
•
Data cache controller.
Each controller supports a range of memory arrays, which must be synchronous SRAM or
ROM structures that use the same clock as the core. The following information details the
range of memory arrays supported and the necessary array connections.
8.4.1.1 KRAM Information
KRAM controllers use synchronous SRAM memory arrays external to the core. These
synchronous SRAMs must use the same clock as the core.
The KRAM controllers support a 32-bit array width (with byte write control) and array
sizes of 512 bytes and 1, 2, 4, 8, 16, 32, and 64 Kbytes. The controller uses the
kram{0,1}size[3:0] inputs to determine the connected array size, as shown in Table 8-2.
8-8
ColdFire CF4e Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
Local Memory Connection Specification
Table 8-2. KRAM Size
Freescale Semiconductor, Inc...
kram{0,1}size[3:0] Total Size
Configuration
0000
0 bytes
KRAM{0,1} disabled
0001
512 bytes
128 x 4 bytes
0010
1 Kbytes
256 X 4 bytes
0011
2 Kbytes
512 X 4 bytes
0100
4 Kbytes
1024 X 4 bytes
0101
8 Kbytes
2048 X 4 bytes
0110
16 Kbytes
4096 X 4 bytes
0111
32 Kbytes
8192 X 4 bytes
1000
64 Kbytes
16384 X 4 bytes
1001–1111
RFU
RFU
The signals in Table 8-3 connect a KRAM controller to its SRAM array.
Table 8-3. KRAM Memory Array Connections
Direction/Size Signal Name
Definition
output[15:2]
kram0addr
kram1addr
KRAM0 14-bit address
KRAM1 14-bit address
output[31:0]
kram0di
kram1di
KRAM0 32-bit data in
KRAM1 32-bit data in
output[3:0]
kram0web
kram1web
KRAM0 byte write enables (active-low)
KRAM1 byte write enables (active-low)
output
kram0csb
kram1csb
KRAM0 chip select (active-low)
KRAM1 chip select (active-low)
input[31:0]
kram0do
kram1do
KRAM0 32-bit data out
KRAM1 32-bit data out
KRAM memories are 32 bits wide. The kram0di[31:0] and kram1di[31:0] signals are the
array input data and kram0do[31:0] and kram1do[31:0] are the array output data for all
KRAM sizes.
The kram0addr[15:2] and kram1addr[15:2] signals are the array addresses. The KRAM
controller provides enough address bits for the largest-supported array sizes.
The 2 low-order address bits, which are not sourced from the KRAM controller to the
KRAM arrays directly, select the bytes in the 32-bit KRAM data array interface. For read
operations, the KRAM controller always fetches 32 bits and the controller sends this
information to the 32-bit K-Bus. The K-Bus data requester is responsible for using only the
bytes selected. The KRAM controller uses byte write enables to select bytes for write
operations.
Table 8-4 shows how the array address is connected for all supported KRAM array sizes.
Chapter 8. Local Memory
For More Information On This Product,
Go to: www.freescale.com
8-9
Freescale Semiconductor, Inc.
Local Memory Connection Specification
Table 8-4. KRAM0/KRAM1 Array Address Connection
Freescale Semiconductor, Inc...
kram{0,1}size[3:0] Total Size
Configuration
Array Address
Unused Address
0000
0 bytes
KRAM0 disabled
KRAM1 disabled
—
kram0addr[15:2]
kram1addr[15:2]
0001
512 bytes
128 X 4 bytes
128 X 4 bytes
kram0addr[8:2]
kram1addr[8:2]
kram0addr[15:9]
kram1addr[15:9]
0010
1 Kbytes
256 X 4 bytes
256 X 4 bytes
kram0addr[9:2]
kram1addr[9:2]
kram0addr[15:10]
kram1addr[15:10]
0011
2 Kbytes
512 X 4 bytes
512 X 4 bytes
kram0addr[10:2]
kram1addr[10:2]
kram0addr[15:11]
kram1addr[15:11]
0100
4 Kbytes
1024 X 4 bytes
1024 X 4 bytes
kram0addr[11:2]
kram1addr[11:2]
kram0addr[15:12]
kram1addr[15:12]
0101
8 Kbytes
2048 X 4 bytes
2048 X 4 bytes
kram0addr[12:2]
kram1addr[12:2]
kram0addr[15:13]
kram1addr[15:13]
0110
16 Kbytes
4096 X 4 bytes
4096 X 4 bytes
kram0addr[13:2]
kram1addr[13:2]
kram0addr[15:14]
kram1addr[15:14]
0111
32 Kbytes
8192 X 4 bytes
8192 X 4 bytes
kram0addr[14:2]
kram1addr[14:2]
kram0addr[15]
kram1addr[15]
1000
64 Kbytes
16384 X 4 bytes
16384 X 4 bytes
kram0addr[15:2]
kram1addr[15:2]
—
—
1001–1111
RFU
RFU
RFU
RFU
The kram0csb and kram1csb signals are the chip selects for KRAM0 and KRAM1. If a
chip-select signal is negated, all other KRAM signals are don’t cares. If multiple array
instances are used to implement the KRAM array configuration, the signal must be used as
the chip enable for all instances.
The kram0web[3:0] and kram1web[3:0] signals are the byte write enables. The byte write
enables correspond to the 4 data bytes in the 32-bit KRAM, as shown in Table 8-5.
Table 8-5. KRAM0/KRAM1 Byte Write Enables
Byte Write Enable Array Data in Bits Controlled
kram0web[0]
kram1web[0]
kram0di[31:24]
kram1di[31:24]
kram0web[1]
kram1web[1]
kram0di[23:16]
kram1di[23:16]
kram0web[2]
kram1web[2]
kram0di[15:8]
kram1di[15:8]
kram0web[3]
kram1web[3]
kram0di[7:0]
kram1di[7:0]
8.4.1.2 KROM Controller Information
The KROM controllers uses synchronous ROM memory arrays external to the core. These
arrays must use the same clock as the core.
8-10
ColdFire CF4e Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
Local Memory Connection Specification
The KROM controllers support an array width of 32 bits and array sizes of 512 bytes and
1, 2, 4, 8, 16, 32, and 64 Kbytes. The controller uses the krom{0,1}size[3:0] inputs to
determine the connected array size as shown in Table 8-6.
Freescale Semiconductor, Inc...
Table 8-6. KRAM0/KRAM1 Size
krom{0,1}size[3:0]
Total Size
Configuration
0000
0 bytes
KROM{0,1} disabled
0001
512 bytes
128 x 4 bytes
0010
1 Kbytes
256 X 4 bytes
0011
2 Kbytes
512 X 4 bytes
0100
4 Kbytes
1024 X 4 bytes
0101
8 Kbytes
2048 X 4 bytes
0110
16 Kbytes
4096 X 4 bytes
0111
32 Kbytes
8192 X 4 bytes
1000
64 Kbytes
16384 X 4 bytes
1001–1111
RFU
RFU
The signals in Table 8-7 connect a KRAM controller to its ROM array.
Table 8-7. KROM{0,1} Memory Array Connections
Direction/Size
Signal Name
Definition
output[15:2]
krom0addr
krom1addr
KROM0 15 bit address
KROM1 15 bit address
output
krom0csb
krom1csb
KROM0 chip select (active-low)
KROM1 chip select (active-low)
input[31:0]
krom0do
krom1do
KROM0 32 bit data out
KROM1 32 bit data out
KROM memories are 32 bits wide. The krom0do[31:0] and krom1do[31:0] signals are the
array output data for all sizes of KRAM0 and KRAM1.
The krom0addr[15:2] and krom1addr[15:2] signals are the array addresses. Each KROM
controller provides enough address bits for the largest supported array sizes.
The 2 low-order address bits, which are not sourced from the KROM controller to the
KROM arrays directly, select the bytes within the 32-bit KROM data array interface. For
read operations, the KROM controller always fetches 32 bits and sends this information to
the 32-bit K-Bus. The K-Bus data requester is responsible for using only the bytes selected.
The array address is connected as shown in Table 8-8 for all supported KROM array sizes.
Chapter 8. Local Memory
For More Information On This Product,
Go to: www.freescale.com
8-11
Freescale Semiconductor, Inc.
Local Memory Connection Specification
Table 8-8. KROM Array Address Connection
Freescale Semiconductor, Inc...
krom{0,1}size[3:0] Total Size
Configuration
Array Address
Unused Address
0000
0 bytes
KROM0 disabled
KROM1 disabled
—
krom0addr[15:2]
krom1addr[15:2]
0001
512 bytes
128 X 4 bytes
128 X 4 bytes
krom0addr[8:2]
krom1addr[8:2]
krom0addr[15:9]
krom1addr[15:9]
0010
1 Kbytes
256 X 4 bytes
256 X 4 bytes
krom0addr[9:2]
krom1addr[9:2]
krom0addr[15:10]
krom1addr[15:10]
0011
2 Kbytes
512 X 4 bytes
512 X 4 bytes
krom0addr[10:2] krom0addr[15:11]
krom1addr[10:2] krom1addr[15:11]
0100
4 Kbytes
1024 X 4 bytes
1024 X 4 bytes
krom0addr[11:2] krom0addr[15:12]
krom1addr[11:2] krom1addr[15:12]
0101
8 Kbytes
2048 X 4 bytes
2048 X 4 bytes
krom0addr[12:2] krom0addr[15:13]
krom1addr[12:2] krom1addr[15:13]
0110
16 Kbytes
4096 X 4 bytes
4096 X 4 bytes
krom0addr[13:2] krom0addr[15:14]
krom1addr[13:2] krom1addr[15:14]
0111
32 Kbytes
8192 X 4 bytes
8192 X 4 bytes
krom0addr[14:2]
krom1addr[14:2]
krom0addr[15]
krom1addr[15]
1000
64 Kbytes
16384 X 4 bytes
16384 X 4 bytes
krom0addr[15:2]
krom1addr[15:2]
—
—
1001–1111
RFU
RFU
RFU
RFU
The krom0csb and krom1csb signals are chip-selects for the KROMs. When a signal is
inactive, all other corresponding KROM signals are don’t cares. If multiple array instances
are used to implement the KROM array configuration, the signal must be used as the chip
enable for all instances.
8.4.1.3 Instruction Cache Information
The instruction cache controller uses synchronous SRAM memory arrays external to the
core for its memory array needs. These arrays must use the same clock as the core.
The instruction cache design contains a non-blocking, 4-way set-associative instruction
cache with a 16-byte line. Cache size is configurable with 2, 4, 8, 16, or 32 Kbyte capacities
available. The cache improves system performance by providing low-latency data to the
core instruction fetch pipeline, decoupling processor performance from system memory
response speeds and providing increased bus availability for alternate bus masters.
The non-blocking cache services read hits from the processor while a fill (caused by a cache
allocation) is in progress.
The instruction cache is virtual address indexed and physical address tagged (see
Chapter 10, “Memory Management Unit (MMU),” for detailed information). If the address
matches one of the cache entries, the access hits in the cache. For a read operation, the cache
supplies the data to the processor. If the access does not match one of the cache entries
8-12
ColdFire CF4e Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
Local Memory Connection Specification
(misses in the cache), the K2M (K-Bus to M-Bus) controller performs a transfer on the
M-Bus.
Freescale Semiconductor, Inc...
The four-way set-associative cache is organized as four levels (ways) of 32, 64, 128, 256,
or 512 sets (for 2-, 4-, 8-, 16-, or 32-Kbyte cache sizes, respectively), with each line
containing 16 bytes (four longwords) of storage.
Table 8-9 shows the various cache set counts, line counts, address bits, tag bits, etc., for
each available cache size. For all caches sizes, a 16-byte line is used (i.e., column G, line
size, is always 16 bytes and the in-line address is always A3–A0) and the level of
associativity is always 4 (that is, column F, number of levels, is always 4). The number of
sets (column E) is related to the number of bits in the set index by the expression number
of sets equals 2n where n is the number of bits in the set index. Any address bits A31–A0
not used in the set index or the in-line address are used for the tag address (column B).
Finally, the cache size can be calculated as: cache size = number of sets x number of levels
x line size.
Table 8-9. Instruction Cache Sizes and Configurations
A
B
C
D
E
F
G
Cache Size Tag Address Set Index In-line Address Number of Sets Number of Levels Line Size
2 Kbytes
A31–A09
A08–A04
A03–A00
32
4
16 bytes
4 Kbytes
A31–A10
A09–A04
A03–A00
64
4
16 bytes
8 Kbytes
A31–A11
A10–A04
A03–A00
128
4
16 bytes
16 Kbytes
A31–A12
A11–A04
A03–A00
256
4
16 bytes
32 Kbytes
A31–A13
A12–A04
A03–A00
512
4
16 bytes
Address bits A[12:4] (as needed for the selected cache size) provide an index to select a set.
Levels are selected according to the rules of set association.
Each line consists of an address tag (the upper 19–23 bits of the addresses needed for the
selected cache size), two status bits, and four longwords of data. The two status bits consist
of a valid bit (V) and a dirty bit (D) for the line. The dirty bit indicates the line was been
written or modified. The instruction cache never sets this bit during normal operation but
this bit must be implemented for correct array test operation. Address bits A3 and A2 select
the longword within the line.
Chapter 8. Local Memory
For More Information On This Product,
Go to: www.freescale.com
8-13
Freescale Semiconductor, Inc.
Local Memory Connection Specification
Level 0
Level 1
Level 2
Level 3
•
•
•
•
•
•
•
•
•
•
•
•
Set 0
Set 1
Line
Set 510
Set 511
Cache Line Format
Freescale Semiconductor, Inc...
TAG
V D
LW0
LW1
LW2
LW3
Where:
TAG—19-bit address tag
V—Valid bit for line
D—Dirty bit for line
LWn—Longword n (32-bit) data entry
Figure 8-7. Cache Organization and Line Format (32-Kbyte Cache Size Shown)
The controller uses the icsize[3:0] input to determine the connected array sizes, as shown
in Table 8-10.
Table 8-10. Instruction Cache Size
icsize[3:0]
Total Size Data Array Configuration Data Array Total Depth Tag Array
Configuration Tag Array
0000
0 bytes
Instruction cache disabled
0 rows
Instruction cache disabled
0001
0 bytes
Instruction cache disabled
0 rows
Instruction cache disabled
0010
0 bytes
Instruction cache disabled
0 rows
Instruction cache disabled
0011
2 Kbytes
4 x 128 X 4 bytes
32 rows
32 X 25 bits
0100
4 Kbytes
4 x 256 X 4 bytes
64 rows
64 X 24 bits
0101
8 Kbytes
4 x 512 X 4 bytes
128 rows
128 X 23 bits
0110
16 Kbytes
4 x 1024 X 4 bytes
256 rows
256 X 22 bits
0111
32 Kbytes
4 x 2048 X 4 bytes
512 rows
2512 X 21 bits
1000–1111
RFU
RFU
RFU
RFU
The signals in Table 8-11 connect the instruction cache controller to its SRAM array.
Table 8-11. Instruction Cache Memory Array Connections
Direction/Size
Signal Name
Output
nsientb
Next-state instruction cache tag enable
Output
nsiwrttb
Next-state instruction cache tag write
Output
nsiwlvt
[3:0]
Next-state instruction cache tag write level
Output
nsirowst
[9:0]
Next-state instruction cache tag address
Output
nsiaddrt
[31:9]
Next-state instruction cache tag data
Output
nsisw
8-14
Bus Width
Definition
Next-state instruction cache tag written bit
ColdFire CF4e Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
Local Memory Connection Specification
Freescale Semiconductor, Inc...
Table 8-11. Instruction Cache Memory Array Connections (Continued)
Direction/Size
Signal Name
Bus Width
Output
nsisv
Next-state instruction cache tag valid bit
Output
nsiendb
Next-state instruction cache data enable
Output
nsiwrtdb
[3:0]
Next-state instruction cache data write level
Output
nsiwtbyted
[3:0]
Next-state instruction cache data byte write
Output
nsirowsd
[11:0]
Next-state instruction cache data address
Output
nsicwrdata
[31:0]
Next-state instruction cache write data
Input
ictag3do
[31:9]
Instruction cache level 3 tag data output
Input
icw3do
Instruction cache level 3 written bit output
Input
icv3do
Instruction cache level 3 valid bit output
Input
ictag2do
Input
icw2do
Instruction cache level 2 written bit output
Input
icv2do
Instruction cache level 2 valid bit output
Input
ictag1do
Input
icw1do
Instruction cache level 1 written bit output
Input
icv1do
Instruction cache level 1 valid bit output
Input
ictag0do
Input
icw0do
Instruction cache level 0 written bit output
Input
icv0do
Instruction cache level 0 valid bit output
Input
iclvl3do
[31:0]
Instruction cache level 3 data output
Input
iclvl2do
[31:0]
Instruction cache level 2 data output
Input
iclvl1do
[31:0]
Instruction cache level 1 data output
Input
iclvl0do
[31:0]
Instruction cache level 0 data output
[31:9]
[31:9]
[31:9]
Definition
Instruction cache level 2 tag data output
Instruction cache level 1 tag data output
Instruction cache level 0 tag data output
All instruction cache data arrays are 32 bits wide. The nsicwrdata[31:0] signals are the data
array input data and iclvl0do[31:0], iclvl1do[31:0], clvl2do[31:0], and iclvl3do[31:0] are
the data array output data for all sizes and levels of instruction cache.
All instruction cache tag arrays are 24 bits wide. The nsiaddrt[31:9] signals are the tag array
input data and ictag3do[31:9], ictag2do[31:9], ictag1do[31:9], and ictag0do[31:9] are the
tag array output data for all sizes of instruction cache.
The nsirowst[9:0] signals the array address for the tag arrays. The nsirowsd[11:0] signals
are the array address for the data arrays. The instruction cache controller provides enough
address bits for the largest supported cache sizes.
The array address is connected as shown in Table 8-12 for all supported instruction cache
data array sizes:
Chapter 8. Local Memory
For More Information On This Product,
Go to: www.freescale.com
8-15
Freescale Semiconductor, Inc.
Local Memory Connection Specification
Freescale Semiconductor, Inc...
Table 8-12. Instruction Cache Data Array Address Connection
icsize[3:0]
Total Size
Configuration
Array Address
Unused Address
0000
0 bytes
Instruction cache disabled
—
nsirowsd[11:0]
0001
0 bytes
Instruction cache disabled
—
nsirowsd[11:0]
0010
0 bytes
Instruction cache disabled
—
nsirowsd[11:0]
0011
2 Kbytes
4 x 128 X 4 bytes
nsirowsd[6:0]
nsirowsd[11:7]
0100
4 Kbytes
4 x 256 X 4 bytes
nsirowsd[7:0]
nsirowsd[11:8]
0101
8 Kbytes
4 x 512 X 4 bytes
nsirowsd[8:0]
nsirowsd[11:9]
0110
16 Kbytes
4 x 512 X 4 bytes
nsirowsd[9:0]
nsirowsd[11:10]
0111
32 Kbytes
4 x 1024 X 4 bytes
nsirowsd[10:0]
nsirowsd[11]
1000–1111
RFU
RFU
RFU
RFU
The tag array has one entry for every four entries in the data array. Therefore, the tag array
does not use ichadd[3:2]. The array address is connected as shown in Table 8-13 for all
supported instruction cache tag array sizes.
Table 8-13. Instruction Cache Tag Array Address Connection
icsize[3:0]
Total Size
Configuration
Array Address Unused Address
0000
0 rows
Instruction cache disabled
—
nsirowst[9:0]
0001
0 rows
Instruction cache disabled
—
nsirowst[9:0]
0010
0 rows
Instruction cache disabled
—
nsirowst[9:0]
0011
32 rows
32 X 25 bits
nsirowst[4:0]
nsirowst[9:5]
0100
256 rows
256 X 24 bits
nsirowst[5:0]
nsirowst[9:6]
0101
512 rows
512 X 23 bits
nsirowst[6:0]
nsirowst[9:7]
0110
1024 rows
1024 X 22 bits
nsirowst[7:0]
nsirowst[9:8]
0111
2048 rows
2048 X 21 bits
nsirowst[8:0]
nsirowst[9]
1000–1111
RFU
RFU
RFU
RFU
The instruction cache controller provides enough tag array write data bits for the smallest
supported cache size. Each larger cache size needs one fewer (low order) tag bit.
Unnecessary tag bits may be implemented in the tag array (the cache controller always
writes and reads them as 0) or they may be tied to 0 at the core boundary. The tag array write
data is connected as shown in Table 8-14 for all supported cache tag array sizes.
Table 8-14. Instruction Cache Tag Array Write Data Connection
icsize[3:0]
Total Size
Configuration
0000
0 rows
Instruction cache disabled
—
nsiaddrt[31:9]:
nsisw:nsisv
0001
0 rows
Instruction cache disabled
—
nsiaddrt[31:9]
nsisw:nsisv
8-16
Array Write Data Unused Write Data (Must Be Tied to 0)
ColdFire CF4e Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
Local Memory Connection Specification
Freescale Semiconductor, Inc...
Table 8-14. Instruction Cache Tag Array Write Data Connection (Continued)
icsize[3:0]
Total Size
Configuration
Array Write Data Unused Write Data (Must Be Tied to 0)
0010
0 rows
Instruction cache disabled
—
nsiaddrt[31:9]
nsisw:nsisv
0011
32 rows
32 X 25 bits
nsiaddrt[31:9]:
nsisw:nsisv
----
0100
256 rows
256 X 24 bits
nsiaddrt[31:10]:
nsisw:nsisv
nsiaddrt[9]
0101
512 rows
512 X 23 bits
nsiaddrt[31:11]:
nsisw:nsisv
nsiaddrt[10:9]
0110
1024 rows
1024 X 22 bits
nsiaddrt[31:12]:
nsisw:nsisv
nsiaddrt[11:9]
0111
2048 rows
2048 X 21 bits
nsirowst[31:13]:
nsisw:nsisv
nsirowst[12:9]
1000–1111
RFU
RFU
RFU
RFU
The nsoendb signal is the chip select for the data array of the instruction cache. If nsoendb
is negated, all other instruction cache data array signals are don’t cares. If multiple array
instances are used to implement the instruction cache data array configuration, these signals
must be used as the chip enable for all instances.
The nsoentb signal is the chip select for the instruction cache tag array. If this signal is
negated, all other instruction cache tag array signals are don’t cares. If multiple array
instances are used to implement the instruction cache tag array configuration, this signal
must be used as the chip enable for all instances.
The nsowtbyted[3:0] signals are the write enables for the data array; nsowrttb is the write
enable for the tag array.
8.4.1.4 Data Cache Information
The data cache controller uses synchronous SRAM memory arrays external to the core for
its memory array needs. These synchronous SRAM memories must use the same clock as
the core.
The data cache design contains a non-blocking, 4-way set-associative data cache with a
16-byte line size. Cache size is configurable with 2-, 4-, 8-, 16-, or 32-Kbyte capacities
available. The cache improves system performance by providing low-latency data to the
operand fetch pipeline, decoupling processor performance from system memory response
speeds, and providing increased bus availability for alternate bus masters.
The nonblocking cache services read hits from the processor while a fill (caused by a cache
allocation) is in progress.
The data cache is virtual address indexed and physical address tagged (see Chapter 10,
“Memory Management Unit (MMU),” for detailed information). If the address matches one
Chapter 8. Local Memory
For More Information On This Product,
Go to: www.freescale.com
8-17
Freescale Semiconductor, Inc.
Local Memory Connection Specification
of the cache entries, the access hits in the cache. For a read operation, the cache supplies
the data to the processor. If the access does not match one of the cache entries (misses in
the cache) the K2M (K-Bus to M-Bus) controller performs a transfer on the M-Bus.
Freescale Semiconductor, Inc...
The four-way set-associative cache is organized as four levels (ways) of 32, 64, 128, 256,
or 512 sets (for 2-, 4-, 8-, 16-, or 32-Kbyte cache sizes, respectively), with each line
containing 16 bytes (four longwords) of storage.
Table 8-15 shows the various cache set counts, line counts, address bits, tag bits, etc. for
each available cache size. For all caches sizes, a 16-byte line size is used (i.e., column G,
line size, is always 16 bytes and the in-line address is always A3–A0) and the level of
associativity is always 4 (i.e., column F, number of levels, is always 4). The number of sets
(column E) is related to the number of bits in the set index by the expression number of sets
equals 2n where n is the number of bits in the set index. Any address bits A31–A0 not used
in the set index or the in-line address are used for the tag address (column B). Finally, the
cache size can be calculated as: cache size = number of sets x number of levels x line size.
Table 8-15. Data Cache Sizes and Configurations
A
B
C
D
E
F
G
Cache Size Tag Address Set Index In-line Address Number of Sets Number of Levels Line Size
2 Kbytes
A31–A09
A08–A04
A03–A00
32
4
16 bytes
4 Kbytes
A31–A10
A09–A04
A03–A00
64
4
16 bytes
8 Kbytes
A31–A11
A10–A04
A03–A00
128
4
16 bytes
16 Kbytes
A31–A12
A11–A04
A03–A00
256
4
16 bytes
32 Kbytes
A31–A13
A12–A04
A03–A00
512
4
16 bytes
Address bits A[12:4] (as needed for the selected cache size) provide an index to select a set.
Levels are selected according to the rules of set association.
Each line consists of an address tag (the upper 19–23 bits of the addresses needed for the
selected cache size), two status bits, and four longwords of data. The two status bits consist
of a valid bit (V) and a dirty bit (D) for the line. The dirty bit indicates the line was been
written or modified by an operand reference. Address bits A3 and A2 select the longword
within the line.
8-18
ColdFire CF4e Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
Local Memory Connection Specification
Level 0
Level 1
Level 2
Level 3
•
•
•
•
•
•
•
•
•
•
•
•
Set 0
Set 1
Line
Set 510
Set 511
Cache Line Format
Freescale Semiconductor, Inc...
TAG
V D
LW0
LW1
LW2
LW3
Where:
TAG—19-bit address tag
V—Valid bit for line
D—Dirty bit for line
LWn—Longword n (32-bit) data entry
Figure 8-8. Cache Organization and Line Format (32 Kbyte Cache Size shown)
The controller uses ocsize[2:0] to determine the connected array sizes as shown in
Table 8-16.
Table 8-16. Data Cache Size
ocsize[3:0] Total Size Data Array Configuration Data Array Total Depth Tag Array Configuration Tag Array
0000
0 bytes
Data cache disabled
0 rows
Data cache disabled
0001
0 bytes
Data cache disabled
0 rows
Data cache disabled
0010
0 bytes
Data cache disabled
0 rows
Data cache disabled
0011
2 Kbytes
4 x 128 X 4 bytes
32 rows
32 X 24 bits
0100
4 Kbytes
4 x 256 X 4 bytes
64 rows
64 X 24 bits
0101
8 Kbytes
4 x 512 X 4 bytes
128 rows
128 X 24 bits
0110
16 Kbytes
4 x 1024 X 4 bytes
256 rows
256 X 24 bits
0111
32 Kbytes
4 x 2048 X 4 bytes
512 rows
2512 X 24 bits
1000–1111
RFU
RFU
RFU
RFU
The signals in Table 8-17 connect the data cache controller to its SRAM array:
Table 8-17. Data Cache Memory Array Connections
Direction/Size Signal Name Bus Width
Definition
Output
nsoentb
Next-state O-Cache tag enable
Output
nsowrttb
Next-state O-Cache tag write
Output
nsowlvt
[3:0]
Next-state O-Cache tag write level
Output
nsorowst
[9:0]
Next-state O-Cache tag address
Output
nsoaddrt
[31:9]
Next-state O-Cache tag data
Output
nsosw
Next-state O-Cache tag written bit
Chapter 8. Local Memory
For More Information On This Product,
Go to: www.freescale.com
8-19
Freescale Semiconductor, Inc.
Local Memory Connection Specification
Table 8-17. Data Cache Memory Array Connections (Continued)
Freescale Semiconductor, Inc...
Direction/Size Signal Name Bus Width
Definition
Output
nsosv
Next-state O-Cache tag valid bit
Output
nsoendb
Next-state O-Cache data enable
Output
nsowrtdb
[3:0]
Next-state O-Cache data write level
Output
nsowtbyted
[3:0]
Next-state O-Cache data byte write
Output
nsorowsd
[11:0]
Next-state O-Cache data address
Output
nsocwrdata
[31:0]
Next-state O-Cache write data
Input
octag3do
[31:9]
O-Cache level 3 tag data output
Input
ocw3do
O-Cache level 3 written bit output
Input
ocv3do
O-Cache level 3 valid bit output
Input
octag2do
Input
ocw2do
O-Cache level 2 written bit output
Input
ocv2do
O-Cache level 2 valid bit output
Input
octag1do
Input
ocw1do
O-Cache level 1 written bit output
Input
ocv1do
O-Cache level 1 valid bit output
Input
octag0do
Input
ocw0do
O-Cache level 0 written bit output
Input
ocv0do
O-Cache level 0 valid bit output
Input
oclvl3do
[31:0]
O-Cache level 3 data output
Input
oclvl2do
[31:0]
O-Cache level 2 data output
Input
oclvl1do
[31:0]
O-Cache level 1 data output
Input
oclvl0do
[31:0]
O-Cache level 0 data output
[31:9]
[31:9]
[31:9]
O-Cache level 2 tag data output
O-Cache level 1 tag data output
O-Cache level 0 tag data output
All data cache data arrays are 32 bits wide. The nsocwrdata[31:0] signals are the data array
input data and oclvl0do[31:0], oclvl1do[31:0], oclvl2do[31:0], and oclvl3do[31:0] are the
data array output data for all sizes and levels of data cache.
All data cache tag arrays are 24 bits wide. The nsoaddrt[31:9] signals are the tag array input
data and octag3do[31:9], octag2do[31:9], octag1do[31:9], and octag0do[31:9] are the tag
array output data for all sizes of data cache.
The nsorowst[9:0] signals are the array address for the tag arrays. The nsorowsd[11:0]
signals are the array address for the data arrays. The data cache controller provides enough
address bits for the largest supported cache sizes.
The array address is connected as shown in Table 8-18 for all supported data cache data
array sizes.
8-20
ColdFire CF4e Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
Local Memory Connection Specification
Table 8-18. Data Cache Data Array Address Connection
Freescale Semiconductor, Inc...
ocsize[3:0] Total Size
Configuration
Array Address Unused Address
0000
0 bytes
Data cache disabled
—
nsorowsd[11:0]
0001
0 bytes
Data cache disabled
—
nsorowsd[11:0]
0010
0 bytes
Data cache disabled
—
nsorowsd[11:0]
0011
2 Kbytes
4 x 128 X 4 bytes
nsorowsd[6:0]
nsorowsd[11:7]
0100
4 Kbytes
4 x 256 X 4 bytes
nsorowsd[7:0]
nsorowsd[11:8]
0101
8 Kbytes
4 x 512 X 4 bytes
nsorowsd[8:0]
nsorowsd[11:9]
0110
16 Kbytes
4 x 512 X 4 bytes
nsorowsd[9:0]
nsorowsd[11:10]
0111
32 Kbytes
4 x 1024 X 4 bytes
nsorowsd[10:0]
nsorowsd[11]
1000–1111
RFU
RFU
RFU
RFU
The tag array has one entry for every four data array entries; therefore, the tag array does
not use ochadd[3:2]. The array address is connected as shown in Table 8-19 for all
supported data cache tag array sizes.
Table 8-19. Data Cache Tag Array Address Connection
ocsize[3:0]
Total Size
Configuration
Array Address Unused Address
0000
0 rows
Data cache disabled
—
nsorowst[9:0]
0001
0 rows
Data cache disabled
—
nsorowst[9:0]
0010
0 rows
Data cache disabled
—
nsorowst[9:0]
0011
32 rows
32 X 24 bits
nsorowst[4:0]
nsorowst[9:5]
0100
256 rows
256 X 24 bits
nsorowst[5:0]
nsorowst[9:6]
0101
512 rows
512 X 24 bits
nsorowst[6:0]
nsorowst[9:7]
0110
1024 rows
1024 X 24 bits
nsorowst[7:0]
nsorowst[9:8]
0111
2048 rows
2048 X 24 bits
nsorowst[8:0]
nsorowst[9]
1000–1111
RFU
RFU
RFU
RFU
The data cache controller provides enough tag array write data bits for the smallest
supported cache size. Each larger cache size needs one fewer (low order) tag bit.
Unnecessary tag bits may be implemented in the tag array (the cache controller always
writes and reads them as 0) or they may be tied to 0 at the core boundary. The tag array write
data is connected as shown in Table 8-20 for supported data cache tag array sizes.
Table 8-20. Data Cache Tag Array Write Data Connection
ocsize[3:0]
Total Size
Configuration
Array Write Data
Unused Write Data (Must Be Tied to 0)
0000
0 rows
Data cache disabled
—
nsoaddrt[31:9]:
nsosw:nsosv
0001
0 rows
Data cache disabled
—
nsoaddrt[31:9]
nsosw:nsosv
Chapter 8. Local Memory
For More Information On This Product,
Go to: www.freescale.com
8-21
Freescale Semiconductor, Inc.
SRAM Overview
Freescale Semiconductor, Inc...
Table 8-20. Data Cache Tag Array Write Data Connection
ocsize[3:0]
Total Size
Configuration
Array Write Data
Unused Write Data (Must Be Tied to 0)
0010
0 rows
Data cache disabled
—
nsoaddrt[31:9]
nsosw:nsosv
0011
32 rows
32 X 25 bits
nsoaddrt[31:9]:
nsosw:nsosv
----
0100
256 rows
256 X 24 bits
nsoaddrt[31:10]:
nsosw:nsosv
nsoaddrt[9]
0101
512 rows
512 X 23 bits
nsoaddrt[31:11]:
nsosw:nsosv
nsoaddrt[10:9]
0110
1024 rows
1024 X 22 bits
nsoaddrt[31:12]:
nsosw:nsosv
nsoaddrt[11:9]
0111
2048 rows
2048 X 21 bits
nsorowst[31:13]:
nsosw:nsosv
nsorowst[12:9]
1000–1111
RFU
RFU
RFU
RFU
The nsoendb signal is the chip select for the data array of the data cache. If this signal is
negated, all other data cache data array signals are don’t cares. If multiple array instances
are used to implement the data cache data array configuration, this signal must be used as
the chip enable for all instances.
The nsoentb signal is the chip select for the instruction cache tag array. If this signal is
negated, all other data cache tag array signals are don’t cares. If multiple array instances are
used to implement the data cache tag array configuration, this signal must be used as the
chip enable for all instances.
The nsowtbyted[3:0] signals are write enables for the data array; nsowrttb is the write
enable for the tag array.
8.5 SRAM Overview
On-chip SRAM modules connect to the instruction and data buses, as shown in Figure 8-1
and Figure 8-2 internal bus, and they provide pipelined, single-cycle access to devices
memory-mapped to them. Memory can be independently mapped to any properly aligned
location in the 4-Gbyte address space and configured to respond to either instruction or data
accesses.
Time-critical functions can be mapped into instruction memory. The system stack or other
heavily referenced data can be mapped into data memory.
The following summarizes features of the CF4e SRAM implementation:
•
•
8-22
Single-cycle throughput; when the pipeline is full, one access can occur per clock
cycle.
Physical location on the processor’s high-speed local buses with a user-programmed
connection to the internal instruction or data bus
ColdFire CF4e Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
SRAM Overview
•
•
•
Memory location programmable on any aligned boundary; typically boundaries are
0-modulo-size aligned.
Byte, word, longword, and line-sized access capabilities
The RAM base address registers (RAMBAR0 and RAMBAR1) define the logical
base address, attributes, and access types for the two SRAM modules.
8.5.1 SRAM Operation
Freescale Semiconductor, Inc...
An SRAM module provides a general-purpose memory block that the ColdFire processor
can access with single-cycle throughput.
The memory block’s location can be specified to any word-aligned address in the 4-Gbyte
address space by RAMBARn[BA], described in Section 8.5.2.1, “SRAM Base Address
Registers (RAMBAR0/RAMBAR1).” Such memory is ideal for storing critical code or data
structures or for use as the system stack. When mapped as an instruction memory, the
SRAM can service instruction fetches generated by the processor core. When mapped as a
data memory, the SRAM can service operand accesses from the processor and
memory-referencing debug module commands.
The Version 4 ColdFire processor core implements a Harvard memory architecture. Each
SRAM module may be logically connected to either the processor’s internal instruction or
data bus. This logical connection is controlled by a configuration bit in the RAM base
address registers (RAMBAR0 and RAMBAR1).
If an instruction fetch is mapped into the region defined by the SRAM, the SRAM sources
the data to the processor and any data fetched from the ROM or cache is discarded. If it
misses in the SRAM and hits in the ROM, the ROM data is used and the data fetched from
the ROM or cache is discarded. If it misses in the SRAM and hits in the ROM, the ROM
data is used and the data fetched from the cache is discarded.”
Likewise, if a data access is mapped into the region defined by the SRAM, the SRAM
services the access and the cache is not affected. Accesses from SRAM and ROM modules
are never cached, and debug-initiated references are treated as data accesses.
Note also that on-chip DMAs cannot access SRAMs. The on-chip system configuration
allows concurrent core and DMA execution, where the core can reference code or data from
the internal SRAMs or caches while performing a DMA transfer.
8.5.2 SRAM Programming Model
The SRAM programming model consists of RAMBAR0 and RAMBAR1.
8.5.2.1 SRAM Base Address Registers (RAMBAR0/RAMBAR1)
The SRAM modules are configured through the RAMBARs, shown in Figure 8-9.
•
Each RAMBAR holds the base address of the SRAM.
Chapter 8. Local Memory
For More Information On This Product,
Go to: www.freescale.com
8-23
Freescale Semiconductor, Inc.
SRAM Overview
•
•
•
•
The MOVEC instruction provides write-only access to this register from the
processor.
Each RAMBAR can be read or written from the debug module in a similar manner.
All undefined RAMBAR bits are reserved. These bits are ignored during writes to
the RAMBAR and return zeros when read from the debug module.
The valid bits, RAMBARn[V], are cleared at reset, disabling the SRAM modules.
All other bits are unaffected.
31
9
Field
BA
Freescale Semiconductor, Inc...
Reset
8
WP
7
6
5
4
3
2
1
D/I — C/I SC SD UC UD
—
R/W
0
V
0
W for CPU; R/W for debug
Address
CPU space + 0xC04 (RAMBAR0), CPU space + 0xC05 (RAMBAR1)
Figure 8-9. SRAM Base Address Registers (RAMBARn)
RAMBARn fields are described in detail in Table 8-21.
Table 8-21. RAMBARn Field Description
Bits
Name
Description
31–9
BA
Base address. Defines the SRAM module’s 0-modulo-size base address corresponding to the size
of space that each module occupies. SRAM alignment is implementation specific. See Table 8-22.
8
WP
Write protect. Controls read/write properties of the SRAM.
0 Allows read and write accesses to the SRAM module
1 Allows only read accesses to the SRAM module. Any attempted write reference generates an
access error exception to the ColdFire processor core.
7
D/I
Data/instruction bus. Indicates whether SRAM is connected to the internal data or instruction bus.
0 Data bus
1 Instruction bus
6
—
Reserved, should be cleared.
5–1
C/I,
SC,
SD,
UC,
UD
0
V
8-24
Address space masks (ASn). These fields allow certain types of accesses to be masked, or
inhibited from accessing the SRAM module. These bits are useful for power management as
described in Section 8.5.5, “Programming RAMBARs for Power Management.” In particular, C/I is
typically set.
The address space mask bits are follows:
C/I = CPU space/interrupt acknowledge cycle mask. Note that C/I must be set if BA = 0.
SC = Supervisor code address space mask
SD = Supervisor data address space mask
UC = User code address space mask
UD = User data address space mask
For each ASn bit:
0 An access to the SRAM module can occur for this address space
1 Disable this address space from the SRAM module. If a reference using this address space is
made, it is inhibited from accessing the SRAM module and is processed like any other
non-SRAM reference.
Valid. Enables/disables the SRAM module. V is cleared at reset.
0 RAMBAR contents are not valid.
1 RAMBAR contents are valid.
ColdFire CF4e Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
SRAM Overview
The KRAM controller uses the base address bits it needs and ignores lower-order bits to
support the configured KRAM size, as shown in Table 8-22.
Freescale Semiconductor, Inc...
Table 8-22. KRAM Size Configuration
KRAM Size Input Vector1
KRAM Size
Active Base Address Bits
Ignored Base Address Bits
0000
0 Bytes
None
31:9
0001
512 Bytes
31:9
None
0010
1 Kbytes
31:10
9
0011
2 Kbytes
31:11
10:9
0100
4 Kbytes
31:12
11:9
0101
8 Kbytes
31:13
12:9
0110
16 Kbytes
31:14
13:9
0111
32 Kbytes
31:15
14:9
1000
64 Kbytes
31:16
15:9
1001–1111
1
Reserved
The KRAM size input vector is determined by the core input signals kram0size[3:0] and kram1size[3:0].
Therefore, the KRAMs sizes are independently selected.
The mapping of a given access into the RAM uses the following algorithm to determine if
the access hits in the memory:
if (RAMBAR[0] = 1)
if (((access = instructionFetch) & (RAMBAR[7] = 1)) |
((access = dataReference)
& (RAMBAR[7] = 0)))
if (requested address[31:n] = RAMBAR[31:n]
if (ASn of the requested type = 0)
Access is mapped to the RAM module
if (access = read)
Read the RAM and return the data
if (access = write)
if (RAMBAR[8] = 0)
Write the data into the RAM
else Signal a write-protect access error
ASn refers to the five address space mask bits: C/I, SC, SD, UC, and UD.
8.5.3 SRAM Initialization
After a hardware reset, the contents of each SRAM module are undefined. The valid bits,
RAMBARn[V], are cleared, disabling the SRAM modules. If the SRAM requires
initialization with instructions or data, the following steps should be performed:
1. Load RAMBARn with bit 7 = 0, mapping the SRAM module to the desired location.
Clearing RAMBARn[7] logically connects the SRAM module to the processor’s
data bus.
Chapter 8. Local Memory
For More Information On This Product,
Go to: www.freescale.com
8-25
Freescale Semiconductor, Inc.
SRAM Overview
Freescale Semiconductor, Inc...
2. Read the source data and write it to the SRAM. Various instructions support this
function, including memory-to-memory move instructions and the move multiple
instruction (MOVEM). MOVEM is optimized to generate line-sized burst fetches on
line-aligned addresses, so it generally provides maximum performance.
3. After the data is loaded into the SRAM, it may be appropriate to revise the
RAMBAR attribute bits, including the write-protect and address space mask fields.
If the SRAM contains instructions, RAMBAR[D/I] must be set to logically connect
the memory to the processor’s internal instruction bus.
Remember that the SRAM cannot be accessed by on-chip DMAs. The on-chip system
configuration allows concurrent core and DMA execution where the core can execute code
out of internal SRAM or cache during DMA access, where the core can access instructions
and operands from the internal RAM, ROM, or cache while the DMA is operational.
The ColdFire processor or an external emulator using the debug module can perform these
initialization functions.
8.5.4 SRAM Initialization Code
The code segment below initializes the SRAM using RAMBAR0. The code sets the base
address of the SRAM at 0x2000_0000 and then initializes the 2-Kbyte block to zeros.
RAMBASE
RAMVALID
move.l
movec.l
EQU
0x20000000
EQU
0x00000035
#RAMBASE+RAMVALID,D0
D0, RAMBAR0
;set this variable to 0x20000000
;load RAMBASE + valid bit into D0
;load RAMBAR0 and enable SRAM
The following loop initializes the entire SRAM to zero:
lea.l
move.l
RAMBASE,A0
#512,D0
;load pointer to SRAM
;load loop counter into D0
(A0)+
#1,D0
SRAM_INIT_LOOP
;clear 4 bytes of SRAM
;decrement loop counter
;exit if done; else continue looping
SRAM_INIT_LOOP:
clr.l
subq.l
bne.b
The following function copies the number of bytesToMove from the source (*src) to the
processor’s local RAM at an offset relative to the SRAM base address defined by
destinationOffset. The bytesToMove must be a multiple of 16. For best performance, source
and destination SRAM addresses should be line-aligned (0-modulo-16).
; copyToCpuRam (*src, destinationOffset, bytesToMove)
RAMBASE
RAMVALID
lea.l
movem.l
8-26
EQU
EQU
0x20000000
0x00000035
-12(a7),a7
#0x1c,(a7)
;SRAM base address
;RAMBAR valid + mask bits
;allocate temporary space
;store D2/D3/D4 registers
ColdFire CF4e Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
SRAM Overview
Freescale Semiconductor, Inc...
;
;
;
;
;
;
;
;
stack
+0
+4
+8
+12
+16
+20
+24
loop:
arguments and locations
saved d2
saved d3
saved d4
returnPc
pointer to source operand
destinationOffset
bytesToMove
move.l
movec.l
RAMBASE+RAMVALID,a0
a0,rambar0
;define RAMBAR0 contents
;load it
move.l
16(a7),a0
;load argument defining *src
lea.l
add.l
RAMBASE,a1
20(a7),a1
;memory pointer to RAM base
;include destination nOffset
move.l
asr.l
24(a7),d4
#4,d4
;load byte count
;divide by 16 to convert to loop count
.align
movem.l
movem.l
lea.l
lea.l
subq.l
bne.b
4
(a0),#0xf
#0xf,(a1)
16(a0),a0
16(a1),a1
#1,d4
loop
;force loop on 0-mod-4 address
;read 16 bytes from source
;store into RAM destination
;increment source pointer
;increment destination pointer
;decrement loop counter
;if done, then exit, else continue
movem.l
lea.l
rts
(a7),#0x1c
12(a7),a7
;restore d2/d3/d4 registers
;deallocate temporary space
8.5.5 Programming RAMBARs for Power Management
Because processor memory references may be simultaneously sent to an SRAM module
and cache, power can be minimized by configuring RAMBAR address space masks as
precisely as possible. For example, if an SRAM is mapped to the internal instruction bus
and contains supervisor instruction data, setting the ASn mask bits associated with all
operand references and user-mode instruction fetches can decrease power dissipation.
Similarly, if the SRAM contains only supervisor data, setting the ASn bits associated with
instruction fetches and user-mode data accesses minimizes power.
Table 8-23 shows typical RAMBAR configurations.
.
Table 8-23. Examples of Typical RAMBAR Settings
RAMBAR[7–0] Data Contained in SRAM
0x21
Both code and data
0x2B
Code only
0x35
Data only
0x35
Supervisor and user data
0x37
Supervisor-only data
0x3C
User-only data
0xAB
Supervisor and user code
Chapter 8. Local Memory
For More Information On This Product,
Go to: www.freescale.com
8-27
Freescale Semiconductor, Inc.
ROM Overview
Table 8-23. Examples of Typical RAMBAR Settings (Continued)
RAMBAR[7–0] Data Contained in SRAM
0xAF
Supervisor-only code
0xBB
User-only code
Freescale Semiconductor, Inc...
8.6 ROM Overview
On-chip ROM modules connect to the instruction and data buses, as shown in Figure 8-1
and Figure 8-2internal bus, and they provide pipelined, single-cycle access to devices
memory-mapped to them. Memory can be independently mapped to any properly aligned
location in the 4-Gbyte address space and configured to respond to either instruction or data
accesses.
Time-critical functions can be mapped into instruction memory. The system stack or other
heavily referenced data can be mapped into data memory.
The following summarizes features of the CF4e ROM implementation:
•
•
•
•
•
Single-cycle throughput. When the pipeline is full, one access can occur per clock
cycle.
Physical location on the processor’s high-speed local buses with a user-programmed
connection to the internal instruction or data bus
Memory location programmable on any aligned boundary; typically boundaries are
0-modulo-size aligned.
Byte, word, longword, and line-sized access capabilities
The ROM base address registers (ROMBAR0 and ROMBAR1) define the logical
base address, attributes, and access types for the two ROM modules.
8.6.1 ROM Operation
A ROM module provides a general-purpose block of read-only memory that the ColdFire
processor can access with single-cycle throughput.
The memory block’s location can be specified to any word-aligned address in the 4-Gbyte
address space by ROMBARn[BA], described in Section 8.5.2.1, “SRAM Base Address
Registers (RAMBAR0/RAMBAR1).” Such memory is ideal for storing critical code or data
structures or for use as the system stack. When mapped as an instruction memory, the ROM
can service instruction fetches generated by the processor core. When mapped as a data
memory, the ROM can service operand accesses from the processor and
memory-referencing debug module commands.
The Version 4 ColdFire processor core implements a Harvard memory architecture. Each
ROM module may be logically connected to either the processor’s internal instruction or
data bus. This logical connection is controlled by a configuration bit in the ROM base
8-28
ColdFire CF4e Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
ROM Overview
address registers (ROMBAR0 and ROMBAR1).
Note also that on-chip DMAs cannot access ROMs. The on-chip system configuration
allows concurrent core and DMA execution, where the core can reference code or data from
the internal SRAMs or caches while performing a DMA transfer.
8.6.2 ROM Programming Model
The ROM programming model consists of ROMBAR0 and ROMBAR1.
8.6.2.1 ROM Base Address Registers (ROMBAR0/ROMBAR1)
Freescale Semiconductor, Inc...
The ROM modules are configured through the ROMBARs, shown in Figure 8-10.
•
•
•
•
•
Each ROMBAR holds the base address of the ROM.
The MOVEC instruction provides write-only access to this register from the
processor.
Each ROMBAR can be read or written from the debug module in a similar manner.
All undefined ROMBAR bits are reserved. These bits are ignored during writes to
the ROMBAR and return zeros when read from the debug module.
The initial state of the valid bit (V) is controlled by the value of a core input pin. If
the kromvldrst input is asserted at reset, the contents of the ROMBAR is forced to
0x0000_0121. This defines a valid ROM memory, based at address 0,
write-protected with the CPU space/interrupt acknowledge accesses masked. If
kromvldrst is negated, ROMBARn[V] is cleared by reset, disabling the ROM
module.
31
Field
Reset
R/W
Address
9
BA
8
6
WP D/I
—
—
00
5
4
3
2
1
0
C/I SC SD UC UD
—
—
—
—
—
V
x1
W for CPU; R/W for debug
CPU space + 0xC00 (ROMBAR0), 0xC01 (ROMBAR1
1 If kromvldrst is asserted at reset, the contents of the ROMBAR is forced to 0x0000_0121. This defines a valid ROM
memory, based at address 0, write-protected with the CPU space/interrupt acknowledge accesses masked. If
kromvldrst is negated, the valid bit is cleared by reset, disabling the ROM module.
Figure 8-10. ROM Base Address Registers (ROMBAR0/ROMBAR1)
ROMBARn fields are described in detail in Table 8-24.
Chapter 8. Local Memory
For More Information On This Product,
Go to: www.freescale.com
8-29
Freescale Semiconductor, Inc.
ROM Overview
Freescale Semiconductor, Inc...
Table 8-24. ROMBAR Field Descriptions
Bits
Name
Description
31–9
BA
Base address. Defines the ROM module base address. ROM alignment is implementation
specific. See Table 8-25.
8
WP
Write protect. Controls read/write properties of the ROM.
0 Allows read and write accesses to the SROM module
1 Allows only read accesses to the SROM module. Any attempted write reference generates an
access error exception to the ColdFire processor core.
7
D/I
Data/instruction bus. Indicates whether ROM is connected to the internal data or instruction bus.
0 Data bus
1 Instruction bus
6
—
Reserved, should be cleared.
5–1
C/I,
SC,
SD,
UC,
UD
0
V
Address space masks (ASn). Allows specific address spaces to be enabled or disabled, placing
internal modules in a specific address space. If an address space is disabled, an access to the
register in that address space becomes an external bus access, and the module resource is not
accessed. These bits are useful for power management as described in Section 8.6.4,
“Programming ROMBARs for Power Management.” In particular, C/I is typically set.
The address space mask bits are follows:
C/I = CPU space/interrupt acknowledge cycle mask.
Note: C/I must be set if BA = 0.
SC = Supervisor code address space mask
SD = Supervisor data address space mask
UC = User code address space mask
UD = User data address space mask
For each ASn bit:
0 An access to the ROM module can occur for this address space
1 Disable this address space from the ROM module. References to this address space cannot
access the ROM module and are processed like other non-ROM references.
Valid. Indicates whether ROMBAR contents are valid. The BA value is not used and the ROM
module is not accessible until V is set.
0 Contents of ROMBAR are not valid. The ROM module is disabled.
1 Contents of ROMBAR are valid. The ROM module is enabled.
If kromvldrst. is asserted at reset, the contents of the ROMBAR are forced to 0x0000_0121. This
defines a valid ROM memory, based at address 0, write-protected with the CPU space/interrupt
acknowledge accesses masked. If kromvldrst is negated, the valid bit is cleared by reset, disabling
the ROM module.
The KROM controller uses the base address bits it needs and ignores lower-order bits to
support the configured KROM size, as shown in Table 8-25.
Table 8-25. KROM Size Configuration
KROM Size Input Vector1
KROM Size
Active Base Address Bits
Ignored Base Address Bits
0000
0 Bytes
None
31:9
0001
512 Bytes
31:9
None
0010
1 Kbytes
31:10
9
0011
2 Kbytes
31:11
10:9
0100
4 Kbytes
31:12
11:9
0101
8 Kbytes
31:13
12:9
0110
16 Kbytes
31:14
13:9
8-30
ColdFire CF4e Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
ROM Overview
Table 8-25. KROM Size Configuration (Continued)
KROM Size Input Vector1
KROM Size
Active Base Address Bits
Ignored Base Address Bits
0111
32 Kbytes
31:15
14:9
1000
64 Kbytes
31:16
15:9
1001–1111
1
Reserved
The KROM size input vector is determined by the core input signals kram0size[3:0] and kram1size[3:0].
Therefore, the KROMs sizes are independently selected.
The mapping of a given access into the ROM uses the following algorithm to determine if
the access hits in the memory:
Freescale Semiconductor, Inc...
if (ROMBAR[0] = 1)
if (((access = instructionFetch) & (ROMBAR[7] = 1)) |
((access = dataReference)
& (ROMBAR[7] = 0)))
if (requested address[31:n] = ROMBAR[31:n]
if (ASn of the requested type = 0)
Access is mapped to the ROM module
if (access = read)
Read the ROM and return the data
if (access = write)
if (ROMBAR[8] = 0)
Write the data into the ROM
else Signal a write-protect access error
ASn refers to the five address space mask bits: C/I, SC, SD, UC, and UD.
8.6.3 ROM Initialization
After a hardware reset, the contents of each ROM module are undefined. If the ROM
requires initialization with instructions or data, the following steps should be performed:
1. Load ROMBARn with bit 7 = 0, mapping the ROM module to the desired location.
Clearing ROMBARn[7] logically connects the ROM module to the processor’s data
bus.
2. Read the source data and write it to the ROM. Various instructions support this
function, including memory-to-memory move instructions and the move multiple
instruction (MOVEM). MOVEM is optimized to generate line-sized burst fetches on
line-aligned addresses, so it generally provides maximum performance.
3. After the data is loaded into the ROM, it may be appropriate to revise the ROMBAR
attribute bits, including the write-protect and address space mask fields. If the ROM
contains instructions, ROMBAR[D/I] must be set to logically connect the memory
to the processor’s internal instruction bus.
Remember that the ROM cannot be accessed by on-chip DMAs. The on-chip system
configuration allows concurrent core and DMA execution where the core can execute code
out of internal ROM or cache during DMA access.
Chapter 8. Local Memory
For More Information On This Product,
Go to: www.freescale.com
8-31
Freescale Semiconductor, Inc.
Cache Overview
The ColdFire processor or an external emulator using the debug module can perform these
initialization functions.
8.6.4 Programming ROMBARs for Power Management
Depending on the ROMBAR configuration, memory accesses can be sent to a ROM
module and cache simultaneously. If an access hits both, the ROM module sources read
data and the instruction cache access is discarded. Because the ROM contains only data,
setting ROMBAR[SC,UC] lowers power dissipation by disabling the ROM during
instruction fetches.
Table 8-26 shows typical ROMBAR settings:
Freescale Semiconductor, Inc...
.
Table 8-26. Examples of Typical ROMBAR Settings
ROMBAR[7–0]
Data Contained in ROM
0x21
Both code and data
0x2B
Code only
0x35
Data only
0x35
Supervisor and user data
0x37
Supervisor-only data
0x3C
User-only data
0xAB
Supervisor and user code
0xAF
Supervisor-only code
0xBB
User-only code
RAMBARs are configured similarly, as described in Section 8.5.5, “Programming
RAMBARs for Power Management.”
8.7 Cache Overview
This section describes the Harvard cache implementation, including organization,
configuration, and coherency. It describes cache operations and how the instruction and
data caches interact with other memory structures.
The CF4e implements a special branch instruction cache for accelerating branches, enabled
by a bit in the cache access control register (CACR[BEC]).
Caches improve system performance by providing single-cycle access to the instruction
and data pipelines. This decouples processor performance from system memory
performance, increasing bus availability for on-chip DMA or external devices. Figure 8-1
and Figure 8-2 show the integration of the instruction and data caches in the local memory
model.
8-32
ColdFire CF4e Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
Cache Overview
Both the instruction and data cache implement line-fill buffers to optimize line-sized burst
accesses. The data cache supports operation of copyback, write-through, or cache-inhibited
modes. A four-entry, 32-bit buffer supports cache line-push operations, and can be
configured to defer write buffering in write-through or cache-inhibited modes. The cache
lock feature can be used to guarantee deterministic response for critical code or data areas.
Freescale Semiconductor, Inc...
A nonblocking cache services read hits or write hits from the processor while a fill (caused
by a cache allocation) is in progress.
The data and instruction caches are physically addressed. A cache hit occurs when an
address matches a cache entry. For a read, the cache supplies data to the processor. For a
write, which is permitted only to the data cache, the processor updates the cache. If an
access does not match a cache entry (misses the cache) or if a write access must be written
through to memory, the core system bus controller performs the necessary system bus
transactions.
Cache modules do not implement bus snooping; cache coherency with other possible bus
masters must be maintained in software.
8.7.1 Optimizing Cache Recommendation
The following is recommended for optimal cache performance.
•
•
Cache data and instruction space to improve system performance.
Do not cache the following:
— SIM space
— Memory-mapped I/O space
— DMA space
8.7.2 Cache Organization
A four-way set associative cache is organized as four ways (levels). Each line contains 16
bytes (4 longwords). Entire cache lines are loaded from memory by burst-mode accesses
that cache 4 longwords of data or instructions. All 4 longwords must be loaded for the cache
line to be valid.
Figure 8-11 shows the data cache organization, which differs from the instruction cache by
the inclusion of the modified bit, which indicates that the data cache block has been written
to without changing the corresponding location in system memory.
Chapter 8. Local Memory
For More Information On This Product,
Go to: www.freescale.com
8-33
Freescale Semiconductor, Inc.
Cache Overview
Way 0
Way 1
Way 2
Way 3
•
•
•
•
•
•
Line
•
•
•
•
•
•
Set 0
Set 1
Set n-1
Set n
Cache Line Format
Freescale Semiconductor, Inc...
TAG
V
M
Longword 0
Longword 1
Longword 2
Longword 3
Where:
TAG—21-bit address tag
V—Valid bit for line
M—Modified bit for line (data cache only)
Figure 8-11. Data Cache Organization and Line Format
A set is a group of four lines (one from each level, or way), corresponding to the same index
into the cache array.
8.7.2.1 Cache Line States: Invalid, Valid-Unmodified, and
Valid-Modified
As shown in Table 8-27, a data cache line can be invalid, valid-unmodified (often called
exclusive), or valid-modified. An instruction cache line can be valid or invalid.
Table 8-27. Valid and Modified Bit Settings
V
M
Description
0
x
Invalid. Invalid lines are ignored during lookups.
1
0
Valid, unmodified. Cache line has valid data that matches system memory.
1
1
Valid, modified. Cache line contains most recent data, data at system memory location is stale.
A valid line can be explicitly invalidated by executing a CPUSHL instruction.
8.7.2.2 Cache at Start-Up
As Figure 8-12 (A) shows, after power-up, cache contents are undefined; V and M may be
set on some lines even though the cache may not contain the appropriate data for start-up.
Because reset and power-up do not invalidate cache lines automatically, the cache must be
cleared explicitly by setting CACR[DCINVA,ICINVA] before the cache is enabled (B).
After the entire cache is automatically flushed, cacheable entries are loaded first in way 0.
If way 0 is occupied, the cacheable entry is loaded into the same set in way 1, as shown in
Figure 8-12 (D). This process is described in detail in Section 8.7.3, “Cache Operation.”
8-34
ColdFire CF4e Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
Cache Overview
A:Cache population at
start-up
B:Cache after invalidation, C:Cache after loads in
before it is enabled
Way 0
D:First load in Way 1
Way 0 Way 1 Way 2 Way 3
Way 0 Way 1 Way 2 Way 3
Way 0 Way 1 Way 2 Way 3
Way 0 Way 1 Way 2 Way 3
At reset, cache contents
are indeterminate; V and
M may be set. The cache
should be cleared
explicitly by setting
CACR[DCINVA] before
the cache is enabled.
Setting CACR[DCINVA]
invalidates the entire
cache.
Initial cacheable
accesses to memory-fill
positions in way 0.
A line is loaded in
way 1 only if that set is
full in way 0.
Freescale Semiconductor, Inc...
Set 0
Set 127
Invalid (V = 0)
Valid, not modified (V = 1, M = 0)
Valid, modified (V = 1, M = 1)
Figure 8-12. Data Cache—A: at Reset, B: after Invalidation, C and D: Loading
Pattern
8.7.3 Cache Operation
Figure 8-13 shows the general flow of a caching operation using the 8-Kbyte data cache as
an example. The discussion in this chapter assumes a data cache. Instruction cache
Chapter 8. Local Memory
For More Information On This Product,
Go to: www.freescale.com
8-35
Freescale Semiconductor, Inc.
Cache Overview
operations are similar except that there is no support for writing to the cache directly;
therefore such notions as modified cache lines and write allocation do not apply.
Address
31
11 10
Freescale Semiconductor, Inc...
Tag Data/Tag Reference
Set
Select
A[10:4]
4 3 0
Way 3
Way 2
Way 1
Way 0
Index
Set 0
TAG
Set 1
•
•
•
•
•
•
Set 127
TAG
STATUS LW0 LW1 LW2 LW3
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
STATUS LW0 LW1 LW2 LW3
Data
Address
A[31:11]
MUX
3
2
1
Comparator
0
Line Select
Hit 3
Hit 2
Hit 1
Hit 0
Logical OR
Hit
Figure 8-13. Data Caching Operation
The following steps determine if a data cache line is allocated for a given address:
1. The cache set index, A[10:4], selects one cache set.
2. A[31:11] and the cache set index are used as a tag reference or are used to update
the cache line tag field.
3. Note that A[31:11] can specify 221 possible addresses that can be mapped to one of
four ways.
4. The four tags from the selected cache set are compared with the tag reference. A
cache hit occurs if a tag matches the tag reference and the V bit is set, indicating that
the cache line contains valid data. If a cacheable write access hits in a valid cache
line, the write can occur to the cache line without having to load it from memory.
If the memory space is copyback, the updated cache line is marked modified
(M = 1), because the new data has made the data in memory out of date. If the
memory location is write-through, the write is passed on to system memory and the
M bit is never used. Note that the tag does not have TT or TM bits.
To allocate a cache entry, the cache set index selects one of the cache’s 128 sets. The cache
control logic looks for an invalid cache line to use for the new entry. If none is available,
8-36
ColdFire CF4e Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
Cache Overview
the cache controller uses a pseudo-round-robin replacement algorithm to choose the line to
be deallocated and replaced. First the cache controller looks for an invalid line, with way 0
the highest priority. If all lines have valid data, a 2-bit replacement counter is used to choose
the way. After a line is allocated, the pointer increments to point to the next way.
Cache lines from ways 0 and 1 can be protected from deallocation by enabling half-cache
locking. If CACR[DHLCK,IHLCK] = 1, the replacement pointer is restricted to way 2 or 3.
Freescale Semiconductor, Inc...
As part of deallocation, a valid, unmodified cache line is invalidated. It is consistent with
system memory, so memory does not need to be updated. To deallocate a modified cache
line, data is placed in a push buffer (for an external cache line push) before being
invalidated. After invalidation, the new entry can replace it. The old cache line may be
written after the new line is read.
When a cache line is selected to host a new cache entry, the following three things happen:
1. The new address tag bits A[31:11] are written to the tag.
2. The cache line is updated with the new memory data.
3. The cache line status changes to a valid state (V = 1).
Read cycles that miss in the cache allocate normally as previously described.
Write cycles that miss in the cache do not allocate on a cacheable write-through region, but
do allocate for addresses in a cacheable copyback region.
A copyback byte, word, longword, or line write miss causes the following:
1.
2.
3.
4.
The cache initiates a line fill or flush.
Space is allocated for a new line.
V and M are both set to indicate valid and modified.
Data is written in the allocated space. No write to memory occurs.
Note the following:
•
•
•
•
Read hits cannot change the status bits and no deallocation or replacement occurs;
the data or instructions are read from the cache.
If the cache hits on a write access, data is written to the appropriate portion of the
accessed cache line. Write hits in cacheable, write-through regions generate an
external write cycle and the cache line is marked valid, but is never marked modified.
Write hits in cacheable copyback regions do not perform an external write cycle; the
cache line is marked valid and modified (V = 1 and M = 1).
Misaligned accesses are broken into at least two cache accesses.
Validity is provided only on a line basis. Unless a whole line is loaded on a cache
miss, the cache controller does not validate data in the cache line.
Write accesses designated as cache-inhibited by the CACR or ACR bypass the cache and
perform a corresponding external write.
Chapter 8. Local Memory
For More Information On This Product,
Go to: www.freescale.com
8-37
Freescale Semiconductor, Inc.
Cache Overview
Normally, cache-inhibited reads bypass the cache and are performed on the external bus.
The exception to this normal operation occurs when all of the following conditions are true
during a cache-inhibited read:
•
•
•
The cache-inhibited fill buffer bit, CACR[DNFB], is set.
The access is an instruction read.
The access is normal (that is, transfer type (TT) equals 0).
Freescale Semiconductor, Inc...
In this case, an entire line is fetched and stored in the fill buffer, where it remains valid. The
cache can service additional read accesses from this buffer until either another fill or a
cache-invalidate-all operation occurs.
Valid cache entries that match during cache-inhibited address accesses are neither pushed
nor invalidated. Such a scenario suggests that the associated cache mode for this address
space was changed. To avoid this, it is generally recommended to use the CPUSHL
instruction to push or invalidate the cache entry or set CACR[DCINVA] to invalidate the
data cache before switching cache modes.
8.7.4 Caching Modes
For every memory reference generated by the processor or debug module, a set of effective
attributes is determined based on the address and the ACRs. Caching modes determine how
the cache handles an access. A data access can be cacheable in either write-through or
copyback mode; it can be cache-inhibited in precise or imprecise modes. For normal
accesses, the ACRn[CM] bit corresponding to the address of the access specifies the
caching modes. If an address does not match an ACR, the default caching mode is defined
by CACR[DDCM,IDCM]. The specific algorithm for the data cache prioritization (which
uses ACR0 and ACR1) is as follows:
if (address == ACR0-address including mask)
effective attributes = ACR0 attributes
else if (address == ACR1-address including mask)
effective attributes = ACR1 attributes
else effective attributes = CACR default attributes
Addresses matching an ACR can also be write-protected using ACR[W]. Addresses that do
not match either ACR can be write-protected using CACR[DW].
Reset disables the cache and clears all CACR bits. As shown in Figure 8-12, reset does not
automatically invalidate cache entries; they must be invalidated through software.
The ACRs allow the defaults selected in the CACR to be overridden. In addition, some
instructions (for example, CPUSHL) and processor core operations perform accesses that
have an implicit caching mode associated with them. The following sections discuss the
different caching accesses and their associated cache modes.
8-38
ColdFire CF4e Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
Cache Overview
8.7.4.1 Cacheable Accesses
If ACRn[CM] or the default field of the CACR indicates write-through or copyback, the
access is cacheable. A read access to a write-through or copyback region is read from the
cache if matching data is found. Otherwise, the data is read from memory and the cache is
updated. When a line is being read from memory for either a write-through or copyback
read miss, the longword within the line that contains the core-requested data is loaded first
and the requested data is given immediately to the processor, without waiting for the three
remaining longwords to reach the cache.
Freescale Semiconductor, Inc...
The following sections describe write-through and copyback modes in detail. Note that
some of this information applies to data caches only.
8.7.4.2 Write-Through Mode (Data Cache Only)
Write accesses to regions specified as write-through are always passed on to the external
bus, although the cycle can be buffered, depending on the state of CACR[DESB]. Writes in
write-through mode are handled with a no-write-allocate policy—that is, writes that miss
in the cache are written to the external bus but do not cause the corresponding line in
memory to be loaded into the cache. Write accesses that hit always write through to
memory and update matching cache lines. The cache supplies data to data-read accesses
that hit in the cache; read misses cause a new cache line to be loaded into the cache.
8.7.4.3 Copyback Mode (Data Cache Only)
Copyback regions are typically used for local data structures or stacks to minimize external
bus use and reduce write-access latency. Write accesses to regions specified as copyback
that hit in the cache update the cache line and set the corresponding M bit without an
external bus access.
Be sure to flush the cache using the CPUSHL instruction before invalidating the cache in
copyback mode. Modified cache data is written to memory only if the line is replaced
because of a miss or a CPUSHL instruction pushes the line. If a byte, word, longword, or
line write access misses in the cache, the required cache line is read from memory, thereby
updating the cache. When a miss selects a modified cache line for replacement, the
modified cache data moves to the push buffer. The replacement line is read into the cache
and the push buffer contents are then written to memory.
8.7.5 Cache-Inhibited Accesses
Memory regions can be designated as cache-inhibited, which is useful for memory
containing targets such as I/O devices and shared data structures in multiprocessing
systems. It is also important to not cache memory-mapped registers. If the corresponding
ACRn[CM] or CACR[DDCM] indicates cache-inhibited, precise or imprecise, the access
is cache-inhibited. The caching operation is identical for both cache-inhibited modes,
which differ only regarding recovery from an external bus error.
Chapter 8. Local Memory
For More Information On This Product,
Go to: www.freescale.com
8-39
Freescale Semiconductor, Inc.
Cache Overview
In determining whether a memory location is cacheable or cache-inhibited, the CPU checks
memory-control registers in the following order:
Freescale Semiconductor, Inc...
1.
2.
3.
4.
5.
RAMBARs
ROMBARs
ACR0 and ACR2
ACR1 and ACR3
If an access does not hit in the RAMBARs, ROMBARs, or the ACRs, the default is
provided for all accesses in CACR.
Cache-inhibited write accesses bypass the cache and a corresponding external write is
performed. Cache-inhibited reads bypass the cache and are performed on the external bus,
except when all of the following conditions are true:
•
•
•
The cache-inhibited fill-buffer bit, CACR[DNFB], is set.
The access is an instruction read.
The access is normal (that is, TT = 0).
In this case, a fetched line is stored in the fill buffer and remains valid there; the cache can
service additional read accesses from this buffer until another fill occurs or a
cache-invalidate-all operation occurs.
If ACRn[CM] indicates cache-inhibited mode, precise or imprecise, the controller bypasses
the cache and performs an external transfer. If a line in the cache matches the address and
the mode is cache-inhibited, the cache does not automatically push the line if it is modified,
nor does it invalidate the line if it is valid. Before switching cache mode, execute a
CPUSHL instruction or set CACR[DCINVA,ICINVA] to invalidate the entire cache.
If ACRn[CM] indicates precise mode, the sequence of read and write accesses to the region
is guaranteed to match the instruction sequence. In imprecise mode, the processor core
allows read accesses that hit in the cache to occur before completion of a pending write
from a previous instruction. Writes are not deferred past data-read accesses that miss the
cache (that is, that must be read from the bus).
Precise operation forces data-read accesses for an instruction to occur only once by
preventing the instruction from being interrupted after data is fetched. Otherwise, if the
processor is not in precise mode, an exception aborts the instruction and the data may be
accessed again when the instruction is restarted. These guarantees apply only when
ACRn[CM] indicates precise mode and aligned accesses.
CPU space-register accesses, such as MOVEC, are treated as cache-inhibited and precise.
8.7.6 Cache Protocol
The following sections describe the cache protocol for processor accesses and assumes that
the data is cacheable (that is, write-through or copyback). Note that the discussion of write
operations applies to the data cache only.
8-40
ColdFire CF4e Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
Cache Overview
8.7.6.1 Read Miss
A processor read that misses in the cache requests the cache controller to generate a bus
transaction. This bus transaction reads the needed line from memory and supplies the
required data to the processor core. The line is placed in the cache in the valid state.
8.7.6.2 Write Miss (Data Cache Only)
The cache controller handles processor writes that miss in the data cache differently for
write-through and copyback regions. Write misses to copyback regions cause the cache line
to be read from system memory, as shown in Figure 8-14.
Freescale Semiconductor, Inc...
1. Writing character X to 0x0A generates a write miss. Data cannot be written to an invalid line.
Cache Line
0x0C 0x08
Core
0x04
0x00
V=0
M=0
X
2. The cache line (characters A–P) is updated from system memory, and the line is marked valid.
0x0C 0x08 0x04 0x00
V=1
ABCD EFGH IJKL MNOP M = 0
System
Memory
3. After the cache line is filled, the write that initiated the write miss (the character X) completes.
Core
0x0C 0x08 0x04 0x00
V=1
ABCD EXGH IJKL MNOP M = 1
Figure 8-14. Write-Miss in Copyback Mode
The new cache line is then updated with write data and the M bit is set for the line, leaving
it in modified state. Write misses to write-through regions write directly to memory without
loading the corresponding cache line into the cache.
8.7.6.3 Read Hit
On a read hit, the cache provides the data to the processor core and the cache line state
remains unchanged. If the cache mode changes for a specific region of address space, lines
in the cache corresponding to that region that contain modified data are not pushed out to
memory when a read hit occurs within that line. First execute a CPUSHL instruction or set
CACR[DCINVA,ICINVA] before switching the cache mode.
Chapter 8. Local Memory
For More Information On This Product,
Go to: www.freescale.com
8-41
Freescale Semiconductor, Inc.
Cache Overview
8.7.6.4 Write Hit (Data Cache Only)
The cache controller handles processor writes that hit in the data cache differently for
write-through and copyback regions. For write hits to a write-through region, portions of
cache lines corresponding to the size of the access are updated with the data. The data is
also written to external memory. The cache line state is unchanged. For copyback accesses,
the cache controller updates the cache line and sets the M bit for the line. An external write
is not performed and the cache line state changes to (or remains in) the modified state.
Freescale Semiconductor, Inc...
8.7.7 Cache Coherency (Data Cache Only)
The CF4e provides limited cache coherency support in multiple-master environments. Both
write-through and copyback memory update techniques are supported to maintain
coherency between the cache and memory.
The cache does not support snooping (that is, cache coherency is not supported while
external or DMA masters are using the bus). Therefore, any on-chip DMAs access local
memory and do not maintain coherency with the data cache. Therefore, cache coherency is
left to the user.
8.7.8 Memory Accesses for Cache Maintenance
The cache controller performs all maintenance activities that supply data from the cache to
the core, including requests to the SIM for reading new cache lines and writing modified
lines to memory. The following sections describe memory accesses resulting from cache
fill and push operations.
8.7.8.1 Cache Filling
When a new cache line is required, the core system bus controller performs a burst-read
transfer on the system bus.
SIM line accesses implicitly request burst-mode operations from memory.
The first cycle of a cache-line read loads the longword entry corresponding to the requested
address. Subsequent transfers load the remaining longword entries.
A burst operation is aborted by a write-protection fault, which is the only possible access
error. Exception processing proceeds immediately. Note that unlike Version 2 and Version 3
access errors, in this version, the program counter stored on the exception stack frame
points to the faulting instruction. See Chapter 7, “Exception Processing.”
8.7.8.2 Cache Pushes
Cache pushes occur for line replacement and as required for the execution of the CPUSHL
instruction. To reduce the requested data’s latency in the new line, the modified line being
replaced is temporarily placed in the push buffer while the new line is fetched from
8-42
ColdFire CF4e Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
Cache Overview
memory. After the bus transfer for the new line completes, the modified cache line is written
back to memory and the push buffer is invalidated.
8.7.8.2.1 Push and Store Buffers
The 16-byte push buffer reduces latency for requested new data on a cache miss by holding
a displaced modified data cache line while the new data is read from memory.
Freescale Semiconductor, Inc...
If a cache miss displaces a modified line, a miss read reference is immediately generated.
While waiting for the response, the current contents of the cache location load into the push
buffer. When the burst-read bus transaction completes, the cache controller can generate the
appropriate line-write bus transaction to write the push buffer contents into memory. Note
that this is not programmable and is always on if the cache is used.
In imprecise mode, the FIFO store buffer can defer pending writes to maximize
performance. The store buffer can support as many as four entries (16 bytes maximum) for
this purpose.
Data writes destined for the store buffer cannot stall the core. The store buffer effectively
provides a measure of decoupling between the pipeline’s ability to generate writes (one per
cycle maximum) and the external bus’s ability to retire those writes. In imprecise mode,
writes stall the core only if the store buffer is full and a write operation is on the internal
bus. The internal write cycle is held, stalling the data execution pipeline.
If the store buffer is not used (that is, store buffer disabled or cache-inhibited precise mode),
external bus cycles are generated directly for each pipeline write operation. The instruction
is held in the pipeline until external bus transfer termination is received.
The data store buffer enable bit, CACR[DESB], controls the enabling of the data store
buffer. This bit can be set and cleared by the MOVEC instruction. DESB is zero at reset and
all writes are performed in order (precise mode). ACRn[CM] or CACR[DDCM] generates
the mode used when DESB is set. Cacheable write-through and cache-inhibited imprecise
modes use the store buffer.
The store buffer can queue data as much as 4 bytes wide per entry. Each entry matches the
corresponding bus cycle it generates; therefore, a misaligned longword write to a
write-through region creates two entries if the address is to an odd-word boundary. It
creates three entries if it is to an odd-byte boundary—one per bus cycle.
8.7.8.2.2 Push and Store Buffer Bus Operation
As soon as the push or store buffer has valid data, the internal bus controller uses the next
available external bus cycle to generate the appropriate write cycles. In the event that
another cache fill is required (for example, cache miss to process) during the continued
instruction execution by the processor pipeline, the pipeline stalls until the push and store
buffers are empty, then generates the required external bus transaction.
Exception processing and the cpushl, intouch, halt, move-to-SR, movec, nop, rte, stop,
wdebug instructions synchronize the processor core and guarantee the push and store
Chapter 8. Local Memory
For More Information On This Product,
Go to: www.freescale.com
8-43
Freescale Semiconductor, Inc.
Cache Overview
buffers are empty before proceeding. Note that the NOP instruction should be used only to
synchronize the pipeline. The preferred no-operation function is the TPF instruction.
8.7.9 Cache Locking
Freescale Semiconductor, Inc...
Ways 0 and 1 of the data cache can be locked by setting CACR[DHLCK]; likewise, ways
0 and 1 of the instruction cache can be locked by setting CACR[IHLCK]. If a cache is
locked, cache lines in ways 0 and 1 are not subject to being deallocated by normal cache
operations.
As Figure 8-15 (B and C) shows, the algorithm for updating the cache and for identifying
cache lines to be deallocated is otherwise unchanged. If ways 2 and 3 are entirely invalid,
cacheable accesses are first allocated in way 2. Way 3 is not used until the location in way 2
is occupied.
Ways 0 and 1 are still updated on write hits (D in Figure 8-15) and may be pushed or cleared
explicitly by using specific cache push/invalidate instructions. However, new cache lines
cannot be allocated in ways 0 and 1.
Figure 8-15 also shows the benefits of using the INTOUCH instruction to systematically
fill ways 0 and 1.
8-44
ColdFire CF4e Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
Cache Overview
A:Ways 0 and 1 are filled.
Ways 2 and 3 are
invalid.
B:CACR[DHLCK] is set,
locking ways 0 and 1.
C:When a set in Way 2 is
D:Write hits to ways 0
occupied, the set in Way 3 and 1 update cache
is used for a cacheable
lines.
access.
Way 0 Way 1 Way 2 Way 3
Way 0 Way 1 Way 2 Way 3
Way 0 Way 1 Way 2 Way 3
Way 0 Way 1 Way 2 Way 3
After CACR[DHLCK] is
set, subsequent cache
accesses go to ways 2
and 3.
While the cache is
locked and after a
position in Way 2 is full,
the set in Way 3 is
updated.
While the cache is
locked, ways 0 and 1 can
be updated by write hits.
In this example, memory
is configured as
copyback, so updated
cache lines are marked
modified.
Freescale Semiconductor, Inc...
et 0
et 127
After reset, the cache is
invalidated, ways 0 and 1
are then written with data
that should not be
deallocated. Ways 0 and 1
can be filled
systematically by using
the INTOUCH instruction.
Invalid (V = 0)
Valid, not modified (V = 1, M = 0)
Valid, modified (V = 1, M = 1)
Figure 8-15. Data Cache Locking
8.7.10 Cache Registers
This section describes the CF4e cache registers.
Chapter 8. Local Memory
For More Information On This Product,
Go to: www.freescale.com
8-45
Freescale Semiconductor, Inc.
Cache Overview
8.7.10.1 Cache Control Register (CACR)
The CACR in Figure 8-16 contains bits for configuring the cache. It can be written by the
MOVEC register instruction and can be read or written from the debug facility. A hardware
reset clears CACR, which disables the cache; however, reset does not affect the tags, state
information, or data in the cache.
31
Field DEC
30
29
28
27
DW DESB DDPI DHLCK
26
25
DDCM
Reset
23
22
DCINVA DDSP
20
—
19
18
17
BEC BCINVA
16
—
All zeros
R/W
Freescale Semiconductor, Inc...
24
Write (R/W by debug module)
15
Field IEC
14
—
Reset
13
12
11
10
DNFB IDPI IHLCK IDCM
9
—
8
7
ICINVA IDSP
6
5
4
—
EUSP
DF
3
3
0
—
All zeros
R/W
Write (R/W by debug module)
Rc
0x002
Figure 8-16. Cache Control Register (CACR)
Table 8-28 describes CACR fields. Note that some implementations may include fields not
defined here; consult the part-specific documentation.
Table 8-28. CACR Field Descriptions
Bits
Name
31
DEC
Enable data cache.
0 Cache disabled. The data cache is not operational, but data and tags are preserved.
1 Cache enabled.
30
DW
Data default write-protect. For normal operations that do not hit in the RAMBARs or ACRs, this
field defines write-protection. See Section 8.7.4, “Caching Modes.”
0 Not write protected.
1 Write protected. Write operations cause an access error exception.
29
DESB
Enable data store buffer. Affects the precision of transfers.
0 Imprecise-mode, write-through or cache-inhibited writes bypass the store buffer and generate
bus cycles directly. Section 8.7.8.2.1, “Push and Store Buffers,” describes the associated
performance penalty.
1 The four-entry FIFO store buffer is enabled; to maximize performance, this buffer defers
pending imprecise-mode, write-through or cache-inhibited writes.
Precise-mode, cache-inhibited accesses always bypass the store buffer. Precise and imprecise
modes are described in Section 8.7.5, “Cache-Inhibited Accesses.”
28
DDPI
Disable CPUSHL invalidation.
0 Normal operation. A CPUSHL instruction causes the selected line to be pushed if modified,
then invalidated.
1 No clear operation. A CPUSHL instruction causes the selected line to be pushed if modified,
then left valid.
8-46
Description
ColdFire CF4e Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
Cache Overview
Freescale Semiconductor, Inc...
Table 8-28. CACR Field Descriptions (Continued)
Bits
Name
Description
27
DHLCK
Half-data cache lock mode
0 Normal operation. The cache allocates the lowest invalid way. If all ways are valid, the cache
allocates the way pointed at by the counter and then increments this counter.
1 Half-cache operation. The cache allocates to the lower invalid way of levels 2 and 3; if both are
valid, the cache allocates to Way 2 if the high-order bit of the round-robin counter is zero;
otherwise, it allocates Way 3 and increments the round-robin counter. This locks the contents
of ways 0 and 1. Ways 0 and 1 are still updated on write hits and may be pushed or cleared by
specific cache push/invalidate instructions.
26–25
DDCM
Default data cache mode. For normal operations that do not hit in the RAMBARs, ROMBARs, or
ACRs, this field defines the effective cache mode.
00 Cacheable write-through imprecise
01 Cacheable copyback
10 Cache-inhibited precise
11 Cache-inhibited imprecise
Precise and imprecise accesses are described in Section 8.7.5, “Cache-Inhibited Accesses.”
24
DCINVA
Data cache invalidate all. Writing a 1 to this bit initiates entire cache invalidation. Once
invalidation is complete, this bit automatically returns to 0; it is not necessary to clear it explicitly.
Note the caches are not cleared on power-up or normal reset, as shown in Figure 8-12.
0 No invalidation is performed.
1 Initiate invalidation of the entire data cache. The cache controller sequentially clears V and M
bits in all sets. Subsequent data accesses stall until the invalidation is finished, at which point,
this bit is automatically cleared. In copyback mode, the cache should be flushed using a
CPUSHL instruction before setting this bit.
23
DDSP
Data default supervisor-protect. For normal operations that do not hit in the RAMBAR, ROMBAR,
or ACRs, this field defines supervisor-protection
0 Not supervisor protected
1 Supervisor protected. User operations cause a fault
22–20
—
19
BEC
18
BCINVA
17–16
—
15
IEC
14
—
13
DNFB
Reserved, should be cleared.
Enable branch cache.
0 Branch cache disabled. This may be useful if code is unlikely to be reused.
1 Branch cache enabled.
Branch cache invalidate. Invalidation occurs when this bit is written as a 1. Note that branch
caches are not cleared on power-up or normal reset.
0 No invalidation is performed.
1 Initiate an invalidation of the entire branch cache.
Reserved, should be cleared.
Enable instruction cache
0 Instruction cache disabled. All instructions and tags in the cache are preserved.
1 Instruction cache enabled.
Reserved, should be cleared.
Default cache-inhibited fill buffer
0 Fill buffer does not store cache-inhibited instruction accesses (16 or 32 bits).
1 Fill buffer can store cache-inhibited accesses. The buffer is used only for normal (TT = 0)
instruction reads of a cache-inhibited region. Instructions are loaded into the buffer by a burst
access (line fill). They stay in the buffer until they are displaced; subsequent accesses may not
appear on the external bus.
Setting DNFB can cause a coherency problem for self-modifying code. If a cache-inhibited
access uses the buffer while DNFB = 1, instructions remain valid in the buffer until a
cache-invalidate-all instruction, another cache-inhibited burst, or a miss that initiates a fill. A write
to the line in the fill goes to the external bus without updating or invalidating the buffer.
Subsequent reads of that written data are serviced by the fill buffer and receive stale information.
Note: Motorola discourages the use of self-modifying code.
Chapter 8. Local Memory
For More Information On This Product,
Go to: www.freescale.com
8-47
Freescale Semiconductor, Inc.
Cache Overview
Freescale Semiconductor, Inc...
Table 8-28. CACR Field Descriptions (Continued)
Bits
Name
Description
12
IDPI
11
IHLCK
Instruction cache half-lock.
0 Normal operation. The cache allocates to the lowest invalid way; if all ways are valid, the cache
allocates to the way pointed at by the round-robin counter and then increments this counter.
1 Half cache operation. The cache allocates to the lowest invalid way of ways 2 and 3; if both of
these ways are valid, the cache allocates to way 2 if the high-order bit of the round-robin
counter is zero; otherwise, it allocates way 3 and then increments the round-robin counter. This
locks the contents of ways 0 and 1. Ways 0 and 1 are still updated on write hits and may be
pushed or cleared by specific cache push/invalidate instructions.
10
IDCM
Instruction default cache mode. For normal operations that do not hit in the RAMBARs or ACRs,
this field defines the effective cache mode.
0 Cacheable
1 Cache-inhibited
9
—
8
ICINVA
7
IDSP
6
—
5
EUSP
4
DF
Disable FPU. Determines whether the FPU is enabled. See Section 4.1, “FPU Overview.”
0 FPU enabled.
1 FPU disabled
3–0
—
Reserved, should be cleared.
Instruction CPUSHL invalidate disable.
0 Normal operation. A CPUSHL instruction causes the selected line to be invalidated.
1 No clear operation. A CPUSHL instruction causes the selected line to be left valid.
Reserved, should be cleared.
Instruction cache invalidate. Invalidation occurs when this bit is written as a 1. Note the caches
are not cleared on power-up or normal reset.
0 No invalidation is performed.
1 Initiate invalidation of instruction cache. The cache controller sequentially clears all V bits.
Subsequent local memory bus accesses stall until invalidation completes, at which point
ICINVA is cleared automatically without software intervention. For copyback mode, use
CPUSHL before setting ICINVA.
Default instruction supervisor protection bit. For normal operations that do not hit in the
RAMBAR, ROMBAR, or ACRs, this field defines supervisor-protection.
0 Not supervisor protected
1 Supervisor protected. User operations cause a fault
Reserved, should be cleared.
Enable USP. Enables the use of the user stack pointer.
0 USP disabled. Core uses a single stack pointer.
1 USP enabled. Core uses separate supervisor and user stack pointers.
8.7.10.2 Access Control Registers (ACR0–ACR3)
The ACRs, Figure 8-17, assign control attributes, such as cache mode and write protection,
to specified memory regions. ACR0 and ACR1 control data attributes; ACR2 and ACR3
control instruction attributes. Registers are accessed with the MOVEC instruction with the
Rc encodings in Figure 8-17.
For overlapping data regions, ACR0 takes priority; ACR2 takes priority for overlapping
instruction regions. Data transfers to and from these registers are longword transfers.
NOTE:
The SIM MBAR region should be mapped as cache-inhibited
through an ACR or the CACR.
8-48
ColdFire CF4e Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
Cache Overview
31
Field
24 23
Address Base
Reset
16 15 14 13 12 11
Address Mask
Uninitialized
R/W
E
S
—
10
9
AMM
0
7
—
6
5
4
3
CM SP
2
1
W1
0
—
Uninitialized
Write (R/W by debug module)
Rc
ACR0: 0x004; ACR1: 0x005; ACR2: 0x006; ACR3: 0x007
1 Reserved
in ACR2 and ACR3.
Figure 8-17. Access Control Register Format (ACRn)
Table 8-29 describes ACRn fields.
Freescale Semiconductor, Inc...
I
Table 8-29. ACRn Field Descriptions
Bits
Name
Description
31–24
Address
base
Address base. Compared with address bits A[31:24]. Eligible addresses that match are
assigned the access control attributes of this register.
23–16
Address
mask
Address mask. Setting a mask bit causes the corresponding address base bit to be ignored.
The low-order mask bits can be set to define contiguous regions larger than 16 Mbytes. The
mask can define multiple noncontiguous regions of memory.
15
E
Enable. Enables or disables the other ACRn bits.
0 Access control attributes disabled
1 Access control attributes enabled
14–13
S
Supervisor mode. Specifies whether only user or supervisor accesses are allowed in this
address range or if the type of access is a don’t care.
00 Match addresses only in user mode
01 Match addresses only in supervisor mode
1x Execute cache matching on all accesses
12–11
—
Reserved, should be cleared.
10
AMM
9–7
—
Reserved; should be cleared.
6–5
CM
Cache mode. Selects the cache mode and access precision. Precise and imprecise modes are
described in Section 8.7.5, “Cache-Inhibited Accesses.”
00 Cacheable, write-through
01 Cacheable, copyback
10 Cache-inhibited, precise
11 Cache-inhibited, imprecise
4
—
Reserved, should be cleared.
3
SP
Supervisor protect.
0 Indicates supervisor and user mode access allowed, reset value is 0
1 Indicates only supervisor access is allowed to this address space and attempted user mode
accesses generate an access error exception
Address mask mode.
0 The ACR hit function allows control of a 16 Mbytes or greater memory region.
1 The upper 8 bits of the address and ACR are compared without a mask function. Address bits
[23:20] of the address and ACR are compared using ACR[19:16] as a mask, allowing control
of a 1–16 Mbyte memory region.
Chapter 8. Local Memory
For More Information On This Product,
Go to: www.freescale.com
8-49
Freescale Semiconductor, Inc.
Cache Overview
Table 8-29. ACRn Field Descriptions (Continued)
Bits
Name
Description
2
W
ACR0/ACR1 only. Write protect. Selects the write privilege of the memory region. ACR2[W] and
ACR3[W] are reserved.
0 Read and write accesses permitted
1 Write accesses not permitted
1–0
—
Reserved, should be cleared.
Freescale Semiconductor, Inc...
8.7.11 Cache Management
The cache can be enabled and configured by using a MOVEC instruction to access CACR.
A hardware reset clears CACR, disabling the cache and removing all configuration
information; however, reset does not affect the tags, state information, or data in the cache.
Set CACR[DCINVA,ICINVA] to invalidate the caches before enabling them.
The privileged CPUSHL instruction supports cache management by selectively pushing
and invalidating cache lines. The address register used with CPUSHL directly addresses the
cache’s directory array. The CPUSHL instruction flushes a cache line.
The value of CACR[DDPI,IDPI] determines whether CPUSHL invalidates a cache line
after it is pushed. To push the entire cache, implement a software loop to index through all
sets and through each of the four ways in each set. The state of CACR[DEC,IEC] does not
affect the operation of CPUSHL or CACR[DCINVA,ICINVA]. Disabling a cache by
setting CACR[IEC] or CACR[DEC] makes the cache nonoperational without affecting
tags, state information, or contents.
The contents of An used with CPUSHL specify cache row and line indexes. Figure 8-18
shows the An format for the data cache.
31
13
12
0
4
3
Set Index
0
Way Index
Figure 8-18. An Format (Data Cache)
Figure 8-19 shows the An format for the instruction cache.
31
13
12
0
4
Set Index
3
0
Way Index
Figure 8-19. An Format (Instruction Cache)
The following code example flushes the entire data cache:
_cache_disable:
nop
move.w
jsr
clr.l
movec
8-50
#0x2700,SR
_cache_flush
d0
d0,ACR0
;mask off IRQs
;flush the cache completely
;ACR0 off
ColdFire CF4e Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
Cache Overview
Freescale Semiconductor, Inc...
movec
move.l
movec
rts
d0,ACR1
#0x01000000,d0
d0,CACR
;ACR1 off
;Invalidate and disable cache
_cache_flush:
nop
moveq.l
moveq.l
move.l
#0,d0
#0,d1
d0,a0
;synchronize—flush store buffer
;initialize way counter
;initialize set counter
;initialize cpushl pointer
setloop:
cpushl
add.l
addq.l
cmpi.l
bne
dc,(a0)
#0x0010,a0
#1,d1
#128,d1
setloop
;push cache line a0
;increment set index by 1
;increment set counter
;are sets for this way done?
#0,d1
#1,d0
d0,a0
#4,d0
setloop
;set counter to zero again
;increment to next way
;set = 0, way = d0
;flushed all the ways?
moveq.l
addq.l
move.l
cmpi.l
bne
rts
The following CACR loads assume the instruction cache has been invalidated, the default
instruction cache mode is cacheable, and the default data cache mode is copyback.
dataCacheLoadAndLock:
move.l
movec
#0xA3080800,d0
d0,cacr
;enable and invalidate data cache ...
;... in the CACR
The following code preloads half of an 8-Kbyte data cache. It assumes a contiguous block
of data is to be mapped into the cache, starting at a 0-modulo-8K address.
move.l
lea
dataCacheLoop:
tst.b
lea
subq.l
bne.b
#256,d0
data_,a0
;256 16-byte lines in 4K space
;load pointer defining data area
(a0)
16(a0),a0
#1,d0
dataCacheLoop
;touch location + load into data cache
;increment address to next line
;decrement loop counter
;if done, then exit, else continue
;A 4-Kbyte region was loaded into levels 0 and 1 of the 8-Kbyte cache. Lock it!
move.l
movec
rts
#0xAA088000,d0
d0,cacr
align
16
;set the data cache lock bit ...
;... in the CACR
The following CACR loads assume the data cache has been invalidated, the default
instruction cache mode is cacheable, and the default operand cache mode is copyback.
Note that this function must be mapped into a cache inhibited or SRAM space or these text
lines will be prefetched into the instruction cache, which may displace some of the
16-Kbyte space being explicitly fetched.
Chapter 8. Local Memory
For More Information On This Product,
Go to: www.freescale.com
8-51
Freescale Semiconductor, Inc.
Cache Overview
instructionCacheLoadAndLock:
move.l
movec
#0xa2088100,d0
d0,cacr
;enable and invalidate the instruction
;cache in the CACR
The following code segments preload half of a 16-Kbyte instruction cache. It assumes a
contiguous block of data is to be mapped, starting at a 0-modulo-8K address.
move.l #512,d0
lea
code_,a0
instCacheLoop:
;
intouch (a0)
;512 16-byte lines in 8K space
;load pointer defining code area
;touch location + load into instruction cache
Freescale Semiconductor, Inc...
;Note in the assembler we use, there is no INTOUCH opcode. The following
;is used to produce the required binary representation
cpushl
#nc,(a0)
;touch location + load into
;instruction cache
lea
16(a0),a0
;increment address to next line
subq.l #1,d0
;decrement loop counter
bne.b
instCacheLoop
;if done, then exit, else continue
;A 8K region was loaded into levels 0 and 1 of the 16-Kbyte instruction cache.
;lock it!
move.l
movec
rts
#0xa2088800,d0
d0,cacr
;set the instruction cache lock bit
;in the CACR
8.7.12 Cache Operation Summary
This section gives operational details for the cache and presents instruction and data
cache-line state diagrams.
8.7.13 Instruction Cache State Transitions
Because the instruction cache does not support writes, it supports fewer operations than the
data cache. As Figure 8-20 shows, an instruction cache line can be in one of two states, valid
or invalid. Modified state is not supported. Transitions are labeled with a capital letter
indicating the previous state and with a number indicating the specific case listed in
Table 8-30. These numbers correspond to the equivalent operations on data caches,
described in Section 8.7.13.1, “Data Cache State Transitions.”
II5—ICINVA
II6—CPUSHL & IDPI
II7—CPUSHL & IDPI
IV1—CPU read miss
IV2—CPU read hit
IV7—CPUSHL & IDPI
II1—CPU read miss
Invalid
V=0
Valid
V=1
IV5—ICINVA
IV6—CPUSHL & IDPI
Figure 8-20. Instruction Cache Line State Diagram
8-52
ColdFire CF4e Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
Cache Overview
Table 8-30 describes the instruction cache state transitions shown in Figure 8-20.
Table 8-30. Instruction Cache Line State Transitions
Previous State
Access
Invalid (V = 0)
Read miss II1
Freescale Semiconductor, Inc...
Read hit
Valid (V = 1)
Read line from memory and update cache; IV1 Read new line from memory and update cache;
supply data to processor;
supply data to processor; stay in valid state.
go to valid state.
II2
Not possible
IV2 Supply data to processor;
stay in valid state.
Write miss II3
Not possible
IV3 Not possible
Write hit
II4
Not possible
IV4 Not possible
Cache
invalidate
II5
No action;
stay in invalid state.
IV5 No action;
go to invalid state.
II6, No action;
II7 stay in invalid state.
IV6 No action;
go to invalid state.
Cache
push
IV7 No action;
stay in valid state.
8.7.13.1 Data Cache State Transitions
Using the V and M bits, the data cache supports a line-based protocol allowing individual
cache lines to be invalid, valid, or modified. To maintain memory coherency, the data cache
supports both write-through and copyback modes, specified by the corresponding
ACR[CM], or CACR[DDCM] if no ACR matches.
Read or write misses to copyback regions cause the cache controller to read a cache line
from memory into the cache. If available, tag and data from memory update an invalid line
in the selected set. The line state then changes from invalid to valid by setting the V bit. If
all lines in the row are already valid or modified, the pseudo-round-robin replacement
algorithm selects one of the four lines and replaces the tag and data. Before replacement,
modified lines are temporarily buffered and later copied back to memory after the new line
has been read from memory.
Figure 8-21 shows the three possible data cache line states and possible processor-initiated
transitions for memory configured as copyback. Transitions are labeled with a capital letter
indicating the previous state and a number indicating the specific case listed in Table 8-21.
Chapter 8. Local Memory
For More Information On This Product,
Go to: www.freescale.com
8-53
Freescale Semiconductor, Inc.
Cache Overview
CI5—DCINVA
CI6—CPUSHL & DDPI
CI7—CPUSHL & DDPI
CV1—CPU read miss
CV2—CPU read hit
CV7—CPUSHL & DDPI
CI1—CPU read miss
Freescale Semiconductor, Inc...
Invalid
V=0
Valid
V=1
M=0
CV5—DCINVA
CV6—CPUSHL & DDPI
CI3—CPU
write miss
CD1—CPU
read miss
CD7—CPUSHL
& DDPI
CD5—DCINVA
CV3—CPU write miss
CD6—CPUSHL & DDPI
CV4—CPU write hit
Modified
V=1
M=1
CD2—CPU read hit
CD3—CPU write miss
CD4—CPU write hit
Figure 8-21. Data Cache Line State Diagram—Copyback Mode
Figure 8-22 shows the two possible states for a cache line in write-through mode.
WV1—CPU read miss
WV2—CPU read hit
WV3—CPU write miss
WV4—CPU write hit
WV7—CPUSHL & DDPI
WI3—CPU write miss
WI5—DCINVA
WI6—CPUSHL & DDPI
WI7—CPUSHL & DDPI
WI1—CPU read miss
Invalid
V=0
Valid
V=1
WV5—DCINVA
WV6—CPUSHL & DDPI
Figure 8-22. Data Cache Line State Diagram—Write-Through Mode
Table 8-31 describes data cache line transitions and the accesses that cause them.
Table 8-31. Data Cache Line State Transitions
Previous State
Access
Invalid (V = 0)
Valid (V = 1, M = 0)
Modified (V = 1, M = 1)
Read
miss
CI1,
WI1
Read line from
memory and update
cache;
supply data to
processor;
go to valid state.
CV1,
WV1
Read new line from
memory and update
cache;
supply data to processor;
stay in valid state.
CD1 Push modified line to
buffer;
read new line from memory
and update cache;
supply data to processor;
write push buffer contents
to memory;
go to valid state.
Read hit
CI2,
WI2
Not possible.
CV2,
WV2
Supply data to processor;
stay in valid state.
CD2 Supply data to processor;
stay in modified state.
8-54
ColdFire CF4e Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
Cache Overview
Table 8-31. Data Cache Line State Transitions (Continued)
Previous State
Access
Freescale Semiconductor, Inc...
Invalid (V = 0)
Valid (V = 1, M = 0)
Modified (V = 1, M = 1)
Write
miss
(copyback)
CI3
Read line from
memory and update
cache;
write data to cache;
go to modified state.
CV3
Read new line from
memory and update
cache;
write data to cache;
go to modified state.
CD3 Push modified line to
buffer;
read new line from memory
and update cache;
write push buffer contents
to memory;
stay in modified state.
Write
miss
(writethrough)
WI3
Write data to
memory;
stay in invalid state.
WV3
Write data to memory;
stay in valid state.
WD3 Write data to memory;
stay in modified state.
Cache mode changed for
the region corresponding to
this line. To avoid this state,
execute a CPUSHL
instruction or set
CACR[DCINVA,ICINVA]
before switching modes.
Write hit
(copyback)
CI4
Not possible.
CV4
Write data to cache;
go to modified state.
CD4 Write data to cache;
stay in modified state.
Write hit
(writethrough)
WI4
Not possible.
WV4
Write data to memory and
to cache;
stay in valid state.
WD4 Write data to memory and
to cache;
go to valid state.
Cache mode changed for
the region corresponding to
this line. To avoid this state,
execute a CPUSHL
instruction or set
CACR[DCINVA,ICINVA]
before switching modes.
Cache
invalidate
CI5,
WI5
No action;
stay in invalid state.
CV5,
WV5
No action;
go to invalid state.
CD5 No action (modified data
lost);
go to invalid state.
Cache
push
CI6,
CI7,
WI6,
WI7
No action;
stay in invalid state.
CV6,
WV6
No action; (DDPI = 0))
go to invalid state.
CD6 Push modified line to
memory; (DDPI = 0)
go to invalid state.
CV7,
WV7
No action; (DDPI = 1)
stay in valid state.
CD7 Push modified line to
memory; (DDPI = 1)
go to valid state.
The following tables present the same information as Table 8-31, organized by the previous
state of the cache line. In Table 8-32 the previous state is invalid.
Chapter 8. Local Memory
For More Information On This Product,
Go to: www.freescale.com
8-55
Freescale Semiconductor, Inc.
Cache Overview
Table 8-32. Data Cache Line State Transitions (Previous State Invalid)
Freescale Semiconductor, Inc...
Access
Response
Read miss
CI1,
WI1
Read line from memory and update cache;
supply data to processor;
go to valid state.
Read hit
CI2,
WI2
Not possible
Write miss (copyback)
CI3
Read line from memory and update cache;
write data to cache;
go to modified state.
Write miss (write-through)
WI3
Write data to memory;
stay in invalid state.
Write hit (copyback)
CI4
Not possible
Write hit (write-through)
WI4
Not possible
Cache invalidate
CI5,
WI5
No action;
stay in invalid state.
Cache push
CI6,
WI6
No action;
stay in invalid state.
Cache push
CI7,
WI7
No action;
stay in invalid state.
Table 8-33 shows transitions when the previous state is valid.
Table 8-33. Data Cache Line State Transitions (Previous State Valid)
Access
Response
Read miss
CV1,
WV1
Read new line from memory and update cache;
supply data to processor; stay in valid state.
Read hit
CV2,
WV2
Supply data to processor;
stay in valid state.
Write miss (copyback)
CV3
Read new line from memory and update cache;
write data to cache;
go to modified state.
Write miss (write-through)
WV3
Write data to memory;
stay in valid state.
Write hit (copyback)
CV4
Write data to cache;
go to modified state.
Write hit (write-through)
WV4
Write data to memory and to cache;
stay in valid state.
Cache invalidate
CV5,
WV5
No action;
go to invalid state.
Cache push
CV6,
WV6
No action;
go to invalid state.
Cache push
CV7,
WV7
No action;
stay in valid state.
Table 8-34 shows transitions when the previous state is modified.
8-56
ColdFire CF4e Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
Cache Overview
Table 8-34. Data Cache Line State Transitions (Previous State Modified)
Access
Read miss
Read hit
Freescale Semiconductor, Inc...
Write miss
(copyback)
Write miss
(write-through)
Write hit
(copyback)
Write hit
(write-through)
Response
CD1 Push modified line to buffer;
read new line from memory and update cache;
supply data to processor;
write push buffer contents to memory;
go to valid state.
CD2 Supply data to processor;
stay in modified state.
CD3 Push modified line to buffer;
read new line from memory and update cache;
write push buffer contents to memory;
stay in modified state.
WD3 Write data to memory;
stay in modified state.
Cache mode changed for the region corresponding to this line. To avoid this state,
execute a CPUSHL instruction or set CACR[DCINVA,ICINVA] before switching modes.
CD4 Write data to cache;
stay in modified state.
WD4 Write data to memory and to cache;
go to valid state.
Cache mode changed for the region corresponding to this line. To avoid this state,
execute a CPUSHL instruction or set CACR[DCINVA,ICINVA] before switching modes.
Cache invalidate
CD5 No action (modified data lost);
go to invalid state.
Cache push
CD6 Push modified line to memory;
go to invalid state.
Cache push
CD7 Push modified line to memory;
go to valid state.
Chapter 8. Local Memory
For More Information On This Product,
Go to: www.freescale.com
8-57
Freescale Semiconductor, Inc.
Freescale Semiconductor, Inc...
Cache Overview
8-58
ColdFire CF4e Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
Chapter 9
Core Interface
9.1 Core Interface Signals
Figure 9-1 is a generic block diagram of a CF4e core interface.
E-Bus
S-Bus
System
Integration
Module
Slave
Module
Slave
Module
Master
Module
M-Bus
CF4e Core Kmem
CF4e Core
EMAC, FPU
Freescale Semiconductor, Inc...
This chapter describes the CF4e core interface and provides an overview of the functional
operation of the master bus (M-Bus).
Instruction K-Bus
Data K-Bus
CF4e
CPU
ICACHE
Ctrl
Debug
KRAM0
Ctrl
DCACHE
Ctrl
ICACHE
Tag/Data
Arrays
K2M
KROM0
Ctrl
KRAM1
Ctrl
KRAM0
Mem
Array
KRAM1
Ctrl
KROM0
Mem
Array
Data K-Bus
ICACHE
Tag/Data
Arrays
KRAM1
Mem
Array
KROM1
Mem
Array
Figure 9-1. Generic CF4e Block Diagram
The system buses have the following hierarchy:
•
•
•
•
K-Bus—Instruction and operand; processor core and dedicated on-chip memories
M-Bus—Internal multi-master with centralized arbitration
S-Bus—Slave module bus controlled by the SIM
E-Bus—External interface bus
Chapter 9. Core Interface
For More Information On This Product,
Go to: www.freescale.com
9-1
Freescale Semiconductor, Inc.
CF4e Pin Characteristics
9.2 CF4e Pin Characteristics
Table 9-1 provides CF4e core pin characteristics. Most M-Bus and debug input signals are
driven directly into input capture registers in the CF4e core. K-Bus memories are specified
as synchronous; CF4e core outputs are next-state values registered in the memory device.
NOTE:
The letter ‘b’ at the end of a name indicates an active-low
signal.
Table 9-1. CF4e Pin Characteristics
Freescale Semiconductor, Inc...
No.
Type
Name
Bus Width1
Description
M-Bus Outputs
1
Output
maddr
[31:0]
M-Bus address
2
Output
mtt
[1:0]
M-Bus transfer type
3
Output
mtm
[2:0]
M-Bus transfer modifier
4
Output
mrwb
—
5
Output
msiz
[1:0]
M-Bus transfer size
6
Output
mwdata
[31:0]
M-Bus write data
7
Output
mwdataoe
—
M-Bus output enable
8
Output
mapb
—
M-Bus address phase
9
Output
mdpb
—
M-Bus data phase
10
Output
mlockb
—
M-Bus locked access
M-Bus read/write
Debug Outputs
11
Output
bdmforceackb
—
BDM force M-Bus acknowledge
12
Output
cpustopb
—
Processor is stopped
13
Output
cpuhaltb
—
Processor is halted
14
Output
pstclk
—
PST/DDATA clock
Test Outputs
15
Output
so
[42:0]
Core parallel scan outputs
16
Output
tbso
[7:0]
Test boundary scan outputs
17
Output
bistdone
—
18
Output
bistdata
3:0]
19
Output
bistfail
—
BIST memory failure indicator
20
Output
bisthold
—
BIST holding indicator
BIST done indicator
BIST bitmap data
Outputs to K-Bus Memories
21
Output
nsientb
—
Next-state I-cache tag enable
22
Output
nsiwrttb
—
Next-state I-cache tag write
23
Output
nsiwlvt
[3:0]
9-2
Next-state I-cache tag write level
ColdFire CF4E Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
CF4e Pin Characteristics
Freescale Semiconductor, Inc...
Table 9-1. CF4e Pin Characteristics (Continued)
No.
Type
Name
Bus Width1
24
Output
nsirowst
[9:0]
Next-state I-cache tag address
25
Output
nsiaddrt
[31:9]
Next-state I-cache tag data
26
Output
nsisw
—
Next-state I-cache tag written bit
27
Output
nsisv
—
Next-state I-cache tag valid bit
28
Output
nsiendb
—
Next-state I-cache data enable
29
Output
nsiwrtdb
[3:0]
Next-state I-cache data write level
30
Output
nsiwtbyted
[3:0]
Next-state I-cache data byte write
31
Output
nsirowsd
[11:0]
Next-state I-cache data address
32
Output
nsicwrdata
[31:0]
Next-state I-cache write data
33
Output
nsoentb
—
Next-state D-cache tag enable
34
Output
nsowrttb
—
Next-state D-cache tag write
35
Output
nsowlvt
[3:0]
Next-state D-cache tag write level
36
Output
nsorowst
[9:0]
Next-state D-cache tag address
37
Output
nsoaddrt
[31:9]
Next-state D-cache tag data
38
Output
nsosw
—
Next-state D-cache tag written bit
39
Output
nsosv
—
Next-state D-cache tag valid bit
40
Output
nsoendb
—
Next-state D-cache data enable
41
Output
nsowrtdb
[3:0]
Next-state D-cache data write level
42
Output
nsowtbyted
[3:0]
Next-state D-cache data byte write
43
Output
nsorowsd
[11:0]
Next-state D-cache data address
44
Output
nsocwrdata
[31:0]
Next-state D-cache write data
45
Output
kram0addr
[15:2]
Next-state KRAM0 address
46
Output
kram0di
[31:0]
Next-state KRAM0 write data
47
Output
kram0web
[3:0]
Next-state KRAM0 write enable
48
Output
kram0csb
—
49
Output
kram1addr
[15:2]
Next-state KRAM1 address
50
Output
kram1di
[31:0]
Next-state KRAM1 write data
51
Output
kram1web
[3:0]
Next-state KRAM1 write enable
52
Output
kram1csb
—
Description
Next-state KRAM0 chip select
Next-state KRAM1 chip select
Clock Inputs
53
Input
mclken
—
Clock phase relationship definer
54
Input
clkfast
—
Processor core clock
More Debug Inputs
55
Output
krom0csb
—
56
Output
krom0addr
[15:2]
Next-state KROM 0 chip select
Next-state KROM 0 address
Chapter 9. Core Interface
For More Information On This Product,
Go to: www.freescale.com
9-3
Freescale Semiconductor, Inc.
CF4e Pin Characteristics
Table 9-1. CF4e Pin Characteristics (Continued)
No.
Type
Name
Bus Width1
57
Output
krom1csb
—
58
Output
krom1addr
[15:2]
Next-state KROM 1 address
59
Output
pstddata
[7:0]
Processor status and debug data
60
Output
dsdo
—
Development system data output
Description
Next-state KROM 1 chip select
M-Bus Inputs
Input
mrdata2
[31:0]
Input
mtab3
—
M-Bus transfer acknowledge
63
Input
mahb3
—
M-Bus address hold
64
Input
miplb4
[2:0]
61
Freescale Semiconductor, Inc...
62
M-Bus read data
M-Bus interrupt request priority level
Miscellaneous Control and Debug Inputs
Input
mrstib4
—
M-Bus reset
66
Input
dsclk5
—
Development system clock
67
Input
dsdi5
—
Development system data input
Input
mbkptb5
—
Development system breakpoint
65
68
Test and Cache Configuration Inputs
69
Input
mtmod
[2:0]
Test mode indicators
70
Input
bistrelease
—
71
Input
bistmemory
[2:0]
BIST memory select
72
Input
si
[31:0]
Core parallel scan inputs
73
Input
se
—
Core parallel scan enable
74
Input
tbsi
[3:0]
Test boundary scan inputs
75
Input
tbsei
—
Test boundary scan enable—inputs
76
Input
tbseo
—
Test boundary scan enable—outputs
77
Input
tbte
—
Test boundary pcell test enable
78
Input
icsize
[3:0]
Encoded I-cache size
79
Input
ocsize
[3:0]
Encoded D-cache size
BIST release data retention
Inputs from K-Bus Memories + Memory Configuration Definitions
80
Input
ictag3do
[31:9]
81
Input
icw3do
—
I-cache level 3 written bit output
82
Input
icv3do
—
I-cache level 3 valid bit output
83
Input
ictag2do
[31:9]
I-cache level 2 tag data output
84
Input
icw2do
—
I-cache level 2 written bit output
85
Input
icv2do
—
I-cache level 2 valid bit output
86
Input
ictag1do
[31:9]
I-cache level 1 tag data output
87
Input
icw1do
—
9-4
I-cache level 3 tag data output
I-cache level 1 written bit output
ColdFire CF4E Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
CF4e Pin Characteristics
Freescale Semiconductor, Inc...
Table 9-1. CF4e Pin Characteristics (Continued)
No.
Type
Name
Bus Width1
88
Input
icv1do
—
I-cache level 1 valid bit output
89
Input
ictag0do
[31:9]
I-cache level 0 tag data output
90
Input
icw0do
—
I-cache level 0 written bit output
91
Input
icv0do
—
I-cache level 0 valid bit output
92
Input
iclvl3do
[31:0]
I-cache level 3 data output
93
Input
iclvl2do
[31:0]
I-cache level 2 data output
94
Input
iclvl1do
[31:0]
I-cache level 1 data output
95
Input
iclvl0do
[31:0]
I-cache level 0 data output
96
Input
octag3do
[31:9]
D-cache level 3 tag data output
97
Input
ocw3do
—
D-cache level 3 written bit output
98
Input
ocv3do
—
D-cache level 3 valid bit output
99
Input
octag2do
[31:9]
D-cache level 2 tag data output
100
Input
ocw2do
—
D-cache level 2 written bit output
101
Input
ocv2do
—
D-cache level 2 valid bit output
102
Input
octag1do
[31:9]
D-cache level 1 tag data output
103
Input
ocw1do
—
D-cache level 1 written bit output
104
Input
ocv1do
—
D-cache level 1 valid bit output
105
Input
octag0do
[31:9]
D-cache level 0 tag data output
106
Input
ocw0do
—
D-cache level 0 written bit output
107
Input
ocv0do
—
D-cache level 0 valid bit output
108
Input
oclvl3do
[31:0]
D-cache level 3 data output
109
Input
oclvl2do
[31:0]
D-cache level 2 data output
110
Input
oclvl1do
[31:0]
D-cache level 1 data output
111
Input
oclvl0do
[31:0]
D-cache level 0 data output
112
Input
enspecialkram
—
113
Input
kram0size
[3:0]
Encoded KRAM0 size
114
Input
kram0do
[31:0]
KRAM0 data output
115
Input
kram1size
[3:0]
Encoded KRAM1 size
116
Input
kram1do
[31:0]
KRAM1 data output
117
Input
krom0size
[3:0]
Encoded KROM0 size
118
Input
krom0do
[31:0]
KROM0 data output
119
Input
krom0vldrst
—
KROM0 valid at reset
120
Input
krom1size
[3:0]
Encoded KROM1 size
Description
Enable special KRAM1 mapping
Chapter 9. Core Interface
For More Information On This Product,
Go to: www.freescale.com
9-5
Freescale Semiconductor, Inc.
ColdFire Master Bus
Table 9-1. CF4e Pin Characteristics (Continued)
No.
Type
Name
Bus Width1
121
Input
krom1do
[31:0]
122
Input
krom1vldrst
—
Description
KROM1 data output
KROM1 valid at reset
1
Bus widths are specified using vector notation. A dash (—) in this column indicates a scalar (1-bit) signal.
The MRDATA[31:0] input capture register is only loaded by the termination of an M-Bus data phase.
3 Because mahb and mtab are driven into combinational logic before being registered, they have a greater setup
timing requirement.
4 miplb[2:0] and mrstib are routed into free-running input capture registers.
5 dsclk, dsdi, and mbkptb are routed into two levels of free-running registers that serve as synchronizers.
Freescale Semiconductor, Inc...
2
9.3 ColdFire Master Bus
The ColdFire architecture implements a hierarchy of buses to provide interconnection and
necessary bandwidth among system components such as processors and peripherals. The
M-Bus is the system interconnect between multiple masters (including processors) and the
system integration module (SIM). The SIM provides additional connectivity to an optional
internal S-Bus that contains on-chip peripheral modules and to the external system through
the E-Bus. The M-, S-, and E-Buses use a Motorola-defined bus protocol. Providing this
bus protocol support allows integration of devices at any level in the system.
The ColdFire architecture supports multiple clock frequency domains. A ColdFire
processor can operate at any integer multiple of the M-Bus clock frequency. The M-Bus
interface is the boundary from the processor’s clock domain to the M-Bus clock
domain.The following sections describe specific M-Bus protocols needed to support the
multiple clock domains and give system clocking guidelines.
9.3.1 M-Bus Signals
Table 9-2 defines required M-Bus signals. These signals are described as viewed by the bus
master. Although signal timings are referenced to the system clock, the system clock is not
considered a bus signal. Clock routing is expected to meet application requirements.
In this chapter, bus cycle refers to a request to transfer data between the bus master and a
slave device.
Table 9-2. M-Bus Signals
Name
Direction
maddr[31:0]
Out
mahb
In
mapb
Out
9-6
Description
Address bus. Address of the first item of a bus transfer for a normal bus cycle.
Address hold. Asserted to indicate that the address and attributes should be held. mahb
indicates that the SIM is not ready to accept the address phase of the bus cycle. mahb is
also used in bus arbitration to halt the master when it is not granted the M-Bus.
Address phase. Indicates that the address and attributes are being driven and that the
address phase of the bus cycle is active.
ColdFire CF4E Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
ColdFire Master Bus
Freescale Semiconductor, Inc...
Table 9-2. M-Bus Signals (Continued)
Name
Direction
Description
mdpb
Out
Data phase. Indicates the data phase of the cycle is active. This means that the bus master
drives data during the cycle if the access is a write. During a read, data may be driven back
to the bus master. The bus cycle is always terminated during the data phase.
miplb[2:0]
In
mlockb
Out
mrdata[31:0]
In
Read data bus. Provides a read data path between the SIM and internal masters. The
32-bit read data bus can transfer 8, 16, or 32 bits of data per bus transfer. During a line
transfer, data lines are time-multiplexed across multiple cycles to carry 128 bits.
mrstib
In
M-Bus reset. Directs all M-Bus modules (including the core) to enter reset mode.
mrwb
Out
Read/write. Indicates the data transfer direction for the current bus cycle. A high level
indicates a read cycle and a low level indicates a write cycle.
msiz[1:0]
Out
Transfer size. Indicate the bus transfer data size.
00 Longword (4 bytes)
01 Byte (1 byte)
10 Word (2 bytes)
11 Line (16 bytes)
mtab
In
Interrupt priority level. Indicates the priority level of a pending interrupt request.
111 No interrupt pending
110 Level 1
101 Level 2
100 Level 3
011 Level 4
010 Level 5
001 Level 6
000 Level 7
Locked access. Indicates the current M-Bus cycle is part of a locked, or indivisible,
read-modify-write.
Transfer acknowledge. Asserted to indicate successful completion of a requested bus
transfer. The CF4e core also generates a debug output signal, bdmforceackb, to help break
a hung external bus condition. During debug, an incorrect reference to a memory address
may effectively hang the external bus because no slave device responds. In such cases, a
new serial BDM command can be sent into the debug module. After decoding this
command, the core asserts bdmforceackb for an entire M-Bus clock period. This output
may be factored into the external or M-Bus termination logic to unconditionally force a
transfer acknowledge so that debug can continue without a system reset.
Chapter 9. Core Interface
For More Information On This Product,
Go to: www.freescale.com
9-7
Freescale Semiconductor, Inc.
ColdFire Master Bus
Freescale Semiconductor, Inc...
Table 9-2. M-Bus Signals (Continued)
Name
Direction
Description
mtm[2:0]
Out
Transfer modifier. Give supplemental information for each transfer type. For interrupt
acknowledge transfers, mtm[2:0] carry the interrupt level being acknowledged. For CPU
space transfers, mtm[2:0] are low.
When mtt[1:0] = 0x—Normal transfers
000 Reserved
001 User data access
010 User code access
011–100 Reserved
101 Supervisor data access
110 Supervisor code access
111 Reserved
When mtt[1:0] = 10—Processor emulator mode access
000–100, 111 Reserved
101 Emulator mode data access
110 Emulator mode code access
When mtt[1:0] = 11—Acknowledge or CPU space access
000 CPU space
001 Interrupt level 1 acknowledge
010 Interrupt level 2 acknowledge
011 Interrupt level 3 acknowledge
100 Interrupt level 4 acknowledge
101 Interrupt level 5 acknowledge
110 Interrupt level 6 acknowledge
111 Interrupt level 7 acknowledge
mtt[1:0]
Out
Transfer type. Indicates the type of access of the current bus cycle. The alternate master
access is used to indicate a non-core master is requesting the transfer.
00 Processor access
01 Alternate master access
10 Processor emulator mode access
11 Acknowledge or CPU space access
mwdata[31:0]
Out
Write data bus. Provides the write data path between an internal master and the SIM. The
write data bus is 32 bits wide and can transfer 8, 16, or 32 bits per bus transfer. During a
line transfer, the data lines are time-multiplexed across multiple cycles to carry 128 bits.
mwdataoe
Out
Write data bus output enable. ColdFire cores implement unidirectional read and write data
buses. If the system designer chooses to implement a bidirectional data bus, mwdataoe
can be used to control the three-state enable during write operations.”
9.3.2 M-Bus Operation
The two-stage, synchronous pipelined M-Bus has an effective bandwidth rate of up to one
transfer per clock.
9.3.2.1 Basic Bus Cycles
The bus transaction is split into two phases. During the address phase, the address
(maddr[31:0]) and attribute signals (msiz[1:0], mrwb, mtt[1:0], and mtm[2:0]) are driven.
The address phase signal (mapb) is asserted to show that the bus is in the address phase.
During the data phase, the data phase (mdpb) signal is asserted until the bus cycle
terminates with a transfer acknowledge (mtab). On a write cycle, the write data bus
(mwdata) is driven for the entire data phase. On a read cycle, the bus master samples the
read data bus (mrdata[31:0]) concurrently with mtab at the rising clock edge. Figure 9-2
9-8
ColdFire CF4E Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
ColdFire Master Bus
shows basic read and write operations.
Read Cycle
Address
Phase
Data
Phase
Write Cycle
Address
Phase
Data
Phase
Clock
maddr[31:0]
and attributes
mapb
Freescale Semiconductor, Inc...
mahb
mdpb
mtab
mrwb
mrdata[31:0]
mwdata
Figure 9-2. Basic Read and Write Cycles
9.3.2.2 Pipelined Bus Cycles
Because the bus is pipelined, the address phase of the next bus cycle can become valid while
the data phase of the current bus cycle is still valid. Address and data phases cannot be
concurrently valid for the same bus cycle. Figure 9-3 shows two basic pipelined bus cycles.
For illustration purposes, a read and write cycle are used in Figure 9-3. There are no
restrictions on cycles being either reads or writes for them to be pipelined.
Chapter 9. Core Interface
For More Information On This Product,
Go to: www.freescale.com
9-9
Freescale Semiconductor, Inc.
ColdFire Master Bus
Read Cycle
Write Cycle
Address
Phase
Data
Phase
Address
Phase
Data
Phase
Clock
maddr[31:0]
and attributes
mapb
Freescale Semiconductor, Inc...
mahb
mdpb
mtab
mrwb
mrdata[31:0]
mwdata
Figure 9-3. Pipelined Read and Write
9.3.2.3 Address and Data Phase Interactions
Bus timing, performance, and arbitration are controlled by handling the address and data
phases of the bus cycle. The general rules for controlling the phases include the following:
1. The address phase is allowed to begin when there is no active address phase.
2. The address phase is allowed to end and the data phase to begin when the address
hold (mahb) signal is not asserted and there is either no active data phase or the
active data phase is being terminated.
3. The data phase is allowed to end when the cycle is terminated with mtab.
4. This rule, which is a restriction on the rule 2, applies only to ColdFire processor
masters running at the same frequency as the M-Bus, that is, processor and M-Bus
clock domains have the same frequency—1X clock mode.
In 1X clock mode, the processor’s address phase is allowed to end and the data phase
is allowed to begin when the following conditions are met:
— Address hold (mahb) is not asserted
— There is either no active data phase or the active data phase is not from this
processor and is being terminated.
That is, for a ColdFire processor operating in 1X clock mode, there must be one
M-Bus cycle where that processor’s data phase is inactive before its active address
phase can progress to a data phase.
9-10
ColdFire CF4E Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
ColdFire Master Bus
The implications of the general bus rules are as follows:
•
•
•
The bus master is held off (usually for bus arbitration) by asserting mahb to ensure
that the address and attributes remain valid and that the data phase is not entered.
Pipelining is accomplished by allowing the next address phase to begin during the
data phase as soon as the next address is available.
Wait states are introduced by withholding the termination signal mtab.
The implications of the special 1X clock mode rule are as follows:
Freescale Semiconductor, Inc...
•
•
If a ColdFire processor operating in 1X clock mode has both an active address phase
and an active data phase, the M-Bus control module must assert mahb on the last
M-Bus transfer acknowledge. This forces the ColdFire processor to hold its address
phase until its data phase is idle for at least one cycle.
A simple implementation of this 1X clock mode rule is to connect mtab from the
SIM to both the mtab and mahb inputs ports of the CF4e core design.
Figure 9-4 shows mapb and mahb asserted during the same clock. The address phase is held
until mahb is negated, when mdpb is asserted to show the start of data phase 1. Because the
address for the next bus cycle is available, mapb stays asserted indicating the start of the
address phase 2. A wait state is inserted by delaying mtab until the next clock. In this case,
mapb is negated after termination because no other address is available from the bus master.
mdpb is not negated because at termination data phase 2 begins. Because the termination
signal remains asserted, data phase 2 is only one clock long.
Data
Phase 1
Address Phase 1
Data
Phase 2
Address Phase 2
Clock
maddr[31:0]
and attributes
mapb
mahb
mdpb
mtab
Figure 9-4. Address Hold Followed By 1- and 0-Wait State Cycles
Figure 9-5 shows that mapb can be generated in the center of the data phase. It also shows
that mahb may be generated while a data phase is active. In this case, the current data phase
is completed, but the next cycle is not allowed to transition to the data phase.
Chapter 9. Core Interface
For More Information On This Product,
Go to: www.freescale.com
9-11
Freescale Semiconductor, Inc.
ColdFire Master Bus
Data
Phase 2
Data Phase 1
Address
Phase 1
Address Phase 2
Clock
maddr[31:0]
and attributes
mapb
mahb
Freescale Semiconductor, Inc...
mdpb
mtab
Figure 9-5. mapb and mahb Generated Mid-Data Phase
Figure 9-6 shows the special rule for 1X clock mode: mahb holds a 1X clock-mode
processor in its address phase (address phase 2) on the last mtab of data phase 1.
Data
Phase 2
Data Phase 1
Address
Phase 1
Address Phase 2
Clock
maddr[31:0]
and attributes
mapb
mahb
mdpb
mtab
Figure 9-6. mahb Generation for 1X Clock Mode
9.3.2.4 Data Size Operations
Table 9-3 shows operand designations for transfers on a byte-boundary system.
Table 9-3. Processor Operand Representation
9-12
Format
Bits[31:24]
Bits[23:16]
Bits[15:8]
Bits[7:0]
Longword Operand
OP0
OP1
OP2
OP3
Word Operand
—
—
OP2
OP3
Byte Operand
—
—
—
OP3
ColdFire CF4E Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
ColdFire Master Bus
A bus cycle is a request to transfer data between the bus master and a slave device. To
ensure that master and slave devices can handle misaligned operands, the bus architecture
must guarantee that each data byte is aligned to the proper lane. For line transfers, data
alignment is treated as 4 longword transfers. The next section discusses protocols to handle
these transfers. M-Bus transfers assume 32-bit M-Bus devices. The SIM generally handles
dynamic sizing to byte or word ports; to support this, M-Bus masters must perform some
data replication functions during write cycles. For all transfers, maddr[31:2] is the
longword base address of the first byte of the reference item. maddr[1:0] indicates the byte
offset from this address. msiz[1:0] and the 2 low-order address bits determine data bus
usage. Table 9-4 shows mrdata requirements for read transfers.
Freescale Semiconductor, Inc...
Table 9-4. mrdata Requirements for Read Transfers
Size
Byte
msiz[1:0] maddr[1:0] mrdata[31:24] mrdata[23:16] mrdata[15:8] mrdata[7:0]
01
00
OP3
Ignored
Ignored
Ignored
01
01
Ignored
OP3
Ignored
Ignored
01
10
Ignored
Ignored
OP3
Ignored
01
11
Ignored
Ignored
Ignored
OP3
10
00
OP2
OP3
Ignored
Ignored
10
10
Ignored
Ignored
OP2
OP3
Long
00
00
OP0
OP1
OP2
OP3
Line
11
00
OP0
OP1
OP2
OP3
Word
Table 9-5 shows mwdata requirements for write transfers.
Table 9-5. mwdata Bus Requirements for Write Transfers
Size
Byte
msiz[1:0] maddr[1:0] mwdata[31:24] mwdata[23:16] mwdata[15:8] mwdata[7:0]
01
00
OP3
Ignored
Ignored
Ignored
01
01
OP3
OP3
Ignored
Ignored
01
10
OP3
Ignored
OP3
Ignored
01
11
OP3
Ignored
Ignored
OP3
10
00
OP2
OP3
Ignored
Ignored
10
10
OP2
OP3
OP2
OP3
Long
00
00
OP0
OP1
OP2
OP3
Line
11
00
OP0
OP1
OP2
OP3
Word
Table 9-4 and Table 9-5 define all allowable msiz[1:0] and maddr[1:0] combinations.
9.3.2.5 Line Transfers
A line is defined as being 16 bytes wide, aligned in memory on 0-modulo-16 address
boundary. On the M-Bus, this is seen as an address phase followed by a data phase during
which 4 longwords of data are transferred a longword per transfer. Although the line is
Chapter 9. Core Interface
For More Information On This Product,
Go to: www.freescale.com
9-13
Freescale Semiconductor, Inc.
ColdFire Master Bus
aligned on 16-byte boundaries, a line access does not necessarily begin on a 0-modulo-16
address. It can begin at any aligned longword address with maddr[1:0] = 00. Therefore, a
slave system (combination of the SIM, modules, and external devices) must be able to cycle
through the longword addresses. Table 9-6 shows allowable patterns during line accesses.
Freescale Semiconductor, Inc...
Table 9-6. Allowable Line Access Patterns
maddr[3:2]
Longword Accesses
00
0x0–0x4–0x8–0xC
01
0x4–0x8–0xC–0x0
10
0x8–0xC–0x0–0x4
11
0xC–0x0–0x4–0x8
Figure 9-7 and Figure 9-8 show line access reads. Note that an address phase for the next
bus cycle can be initiated any time during the data phase. Also note that address hold can
be asserted during this time without affecting the data phase of a line access. Line accesses
complete and the address is held before the next data phase is allowed.
Clock
maddr[31:0]
and attributes
mapb
mahb
mdpb
mtab
mrwb
mrdata
Figure 9-7. LIne Access Read with Zero Wait States
Figure 9-8 shows a line access read with one wait state.
9-14
ColdFire CF4E Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
ColdFire Master Bus
Clock
maddr[31:0]
and attributes
mapb
mahb
mdpb
Freescale Semiconductor, Inc...
mtab
mrwb
mrdata
Figure 9-8. Line Access Read with One Wait State
Figure 9-9 and Figure 9-10 show line write accesses. Note that the next longword of data
is available on the clock immediately after termination. There may be cases where data may
be pipelined to the external bus by terminating the access and registering the data in the SIM
during the first clock of the data phase. This allows the next longword of data to be available
at the next rising clock edge.
Clock
maddr[31:0]
and attributes
mapb
mahb
mdpb
mtab
mrwbb
mwdata
Figure 9-9. Line Access Write with Zero Wait States
Chapter 9. Core Interface
For More Information On This Product,
Go to: www.freescale.com
9-15
Freescale Semiconductor, Inc.
ColdFire Master Bus
Clock
maddr[31:0]
and attributes
mapb
mahb
mdpb
Freescale Semiconductor, Inc...
mtab
mrwb
mwdata
Figure 9-10. Line Access Write with One Wait State
9.3.2.6 Bus Arbitration
The arbitration block provides a multiplexed bus scheme to handle multiple M-Bus
masters. Multiple masters cannot be on the same physical bus. Figure 9-11 shows the top
level architecture of a two-master, multiplexed M-Bus system. The address, attributes,
write data, mapb, and mdpb are multiplexed to the SIM. The current bus master’s signals
are muxed onto the common bus. The termination and address hold signals are
demultiplexed and routed to the appropriate bus master. Reset signals and read data need
not be multiplexed. Arbitration logic generates address hold to stall a device that is not the
current bus master.
The multiplexing scheme was adopted to accommodate a standard cell methodology. There
are no three-state or bidirectional signals on the bus, so adding bus masters complicates
multiplexing and may affect timing. Designs should limit the number of M-Bus masters.
For instance, a three-channel DMA is preferable to three DMA modules on the M-Bus.
Bus Master
#1
M-Bus #1
Bus Arbitration
And
Multiplexing
M-Bus #2
Bus Master
#2
Common
M-Bus
S-Bus
System
Bus
Controller
External
Bus
Figure 9-11. Multiplexed M-Bus Structure
Figure 9-11 shows waveforms with two bus masters multiplexed onto a common M-Bus.
The exact arbitration scheme and relative priority of bus masters is determined by the
9-16
ColdFire CF4E Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
ColdFire Master Bus
arbitration logic implemented.
Clock
Bus Master #1
maddr1[31:0]
and attributes
map1b
mah1b
mdp1b
mwdata1
Bus Master #2
maddr2[31:0]
and attributes
map2b
mah2b
mdp2b
mta2b
mwdata2
Common M Bus
maddr[31:0]
and attributes
mapb
mdpb
mtab
mrdata[[31:0]
mwdata
Mux Control
Freescale Semiconductor, Inc...
mta1b
sel_a_1
sel_d_1
Figure 9-12. Multiplexed M-Bus Operation
Here, bus master #1 is the default master, such as a core processor. Its mahb is normally
high giving it bus access as needed. Bus master #2, a DMA controller for example, requests
Chapter 9. Core Interface
For More Information On This Product,
Go to: www.freescale.com
9-17
Freescale Semiconductor, Inc.
ColdFire Master Bus
the bus by asserting its mapb. The arbiter responds by asserting mah1b to hold off bus
master #1. It also transitions sel_a_1 (the mux control signal for address, attributes) and
mapb. Because an active data phase is on the bus, the data portion of the bus cannot be
muxed until that cycle terminates. When sel_d_1, the mux control for mwdata, mdpb, and
mtab are toggled. Bus master #2 runs its cycle on the common bus, then control returns to
bus master #1.
Freescale Semiconductor, Inc...
NOTE:
There is no need to multiplex mrdata. Because the bus master
samples data when the data phase is terminated, control of the
termination signal is sufficient.
9.3.2.7 Interrupt Support
Interrupts are supported on the M-Bus by miplb[2:0] and interrupt acknowledge cycles.
When an interrupt is pending, the SIM is responsible for driving miplb[2:0] to the processor
to request interrupt processing. The interrupted processor runs an acknowledge cycle to
request the interrupt vector to begin exception processing. The interrupt acknowledge cycle
looks like a standard byte read cycle. For this cycle, the mtt[1:0] signals indicate an
acknowledge cycle (mtt[1:0] = 11) and the interrupt level of the interrupt being processed
is specified in the mtm[2:0] signals. Additionally, the address lines maddr[31:5] are all
driven high, the interrupt level is reflected on maddr[4:2], and the lower two address bits,
maddr[1:0], are zero. The 8-bit interrupt vector is returned on mrdata[31:24].
9.3.2.8 Reset Operation
When a master is reset (mrstib is driven low), its M-Bus control signals are driven inactive.
This means that mapb, mdpb, mrwb, and mtab are all driven high. However, whether mahb
is driven high or low depends on the implementation.
9-18
ColdFire CF4E Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
Freescale Semiconductor, Inc...
Chapter 10
Memory Management Unit (MMU)
This chapter describes the ColdFire virtual memory management unit (MMU), which
provides virtual-to-physical address translation and memory access control. The MMU
consists of memory-mapped control, status, and fault registers that provide access to
translation-lookaside buffers (TLBs). Software can control address translation and access
attributes of a virtual address by configuring MMU control registers and loading TLBs.
With software support, the MMU provides demand-paged, virtual addressing.
10.1 Features
The MMU has the following features:
•
•
•
•
MMU memory-mapped control, status, and fault registers
— Support a flexible, software-defined virtual environment
— Provide control and maintenance of TLBs
— Provide fault status and recovery information functions
Separate, 32-entry, fully associative instruction and data TLBs (Harvard TLBs)
— Resides in the K-Bus controller
— Operates in parallel with the K-Bus memories
— Suffers no performance penalty on TLB hits
— Supports 1-, 4-, and 8-Kbyte and 1-Mbyte page sizes concurrently
— Contains register-based TLB entries
Core extensions:
— User stack pointer
— All K-Bus access error exceptions are precise and recoverable
Harvard TLB provides 97% of baseline performance on large embedded
applications using equivalent V4 without MMU support as a baseline.
10.2 Virtual Memory Management Architecture
The ColdFire memory management architecture provides a demand-paged, virtual-address
environment with hardware address translation acceleration. It supports supervisor/user,
Chapter 10. Memory Management Unit (MMU)
For More Information On This Product,
Go to: www.freescale.com
10-1
Freescale Semiconductor, Inc.
Virtual Memory Management Architecture
read, write, and execute permission checking on a per-memory request basis. The optional
MMU is placed with a software-controlled TLB and associated logic in the core at the
K-Bus level of the ColdFire memory/bus hierarchy. Other changes better isolate supervisor
and user modes and make core access error exceptions precise.
Freescale Semiconductor, Inc...
The architecture defines the MMU TLB, associated control logic, TLB hit/miss logic,
address translation based on the TLB contents, and access faults due to TLB misses and
access violations. It intentionally leaves some virtual environment details undefined to
maximize the software-defined flexibility. These include the exact structure of the
memory-resident pointer descriptor/page descriptor tables, the base registers for these
tables, the exact information stored in the tables, the methodology (if any) for maintenance
of access, and written information on a per-page basis.
10.2.1 MMU Architecture Features
To add optional virtual addressing support, demand-page support, permission checking,
and hardware address translation acceleration to the ColdFire architecture, the MMU
architecture features the following:
•
•
•
•
•
Addresses from the core to the K-Bus are treated as physical or virtual addresses.
The address access control logic, address attribute logic, K-Bus memories, and
K-Bus to M-Bus controller function as in previous ColdFire versions with the
addition of the MMU. The MMU, its TLB, and associated control reside in the
K-Bus logic.
The MMU appears as a memory-mapped device in the K-Bus space. Information for
access error fault processing is stored in the MMU.
A precise K-Bus fault (transfer error acknowledge) signals the core on translation
(TLB miss) and access faults. The core supports an instruction restart model for this
fault class. Note that this structure uses the existing ColdFire access error fault
vector and needs no new ColdFire exception stack frames.
The following additions are made to the K-Bus memory access control to better
support the fault processing and memory maintenance necessary for this virtual
addressing environment. These additions improve K-Bus memory performance and
functionality for physical and virtual address environments:
— New supervisor-protect bits to the access control registers (ACRs) and the cache
control register (CACR)
— Improved addressing of the ACRs
10.2.2 MMU Architectural Location
Figure 10-1 shows the placement of the MMU/TLB hardware. It follows a traditional
model in which it is closely coupled to the processor local-memory controllers.
10-2
ColdFire CF4E Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
Virtual Memory Management Architecture
IFP
J
IAG
Branch
Cache
KC1
IC1
Instruction
Memory
KC2
IC2
Branch
Accel.
Physical
KC1
IED
Freescale Semiconductor, Inc...
IB
Memory
Management
Unit
(MMU)
OEP
DS
Physical
KC1
DS
J
OAG
Data
Memory
KC1
OC1
KC2
OC2
EX
M Bus
K2M
Mis
EMAC
FPU
DA
BDM
DSCLK DSI
DSDO
DDATA
PSTDDATA PSTCLK
Figure 10-1. CF4e Processor Core Block with MMU
10.2.3 MMU Architecture Implementation
This section describes ColdFire design additions and changes for the MMU architecture. It
includes precise faults, MMU access, virtual mode, virtual memory references, instruction
and data cache addresses, supervisor/user stack pointers, access error stack frame additions,
expanded control register space, ACR address improvements, supervisor protection, and
debugging in a virtual environment.
Chapter 10. Memory Management Unit (MMU)
For More Information On This Product,
Go to: www.freescale.com
10-3
Freescale Semiconductor, Inc.
Virtual Memory Management Architecture
10.2.3.1 Precise Faults
The MMU architecture performs virtual-to-physical address translation and permission
checking in the core on the K-Bus interface. To support demand-paging, the core design
provides a precise, recoverable fault for all K-Bus references.
10.2.3.2 MMU Access
Freescale Semiconductor, Inc...
The MMU TLB control registers are memory-mapped. The TLB entries are read and
written indirectly through the MMU control registers. The memory space for these
resources is defined by a new supervisor program model register, the MMU base address
register (MMUBAR). This register defines a supervisor-mode, data-only space. It has the
highest priority for the data K-Bus address mode determination.
10.2.3.3 Virtual Mode
Every K-Bus instruction and data reference is either a virtual or physical address mode
access. All addresses for special mode (that is, interrupt acknowledges, emulator mode
operations, and so on) accesses are physical. All addresses are physical if the optional
MMU is not present or not enabled. If the MMU is present and enabled, the address mode
for normal accesses is determined by the MMUBAR, RAMBARs, ROMBARs, and ACRs
in normal priority order. Addresses that hit in the MMUBAR, RAMBARs, ROMBARs, and
ACRs are treated as physical references. These addresses are not translated and their
address attributes are sourced from the highest priority mapping register they hit. If an
address hits none of these mapping registers, it is a virtual address and is sent to the MMU.
If the MMU enabled, the default CACR information is not used.
10.2.3.4 Virtual Memory References
The ColdFire MMU architecture references the MMU for all virtual mode accesses to the
K-Bus. MMU, KRAM, KROM, and ACR memory spaces are treated as physical address
spaces and all permissions that apply to these spaces are contained in the respective
mapping register. The virtual mode access either hits or misses in the TLB of the MMU. A
TLB miss generates an access fault in the processor, allowing software to either load the
appropriate translation into the TLB and restart the faulting instruction or abort the process.
Each TLB hit checks permissions based on the access control information in the referenced
TLB entry.
10.2.3.5 Instruction and Data Cache Addresses
For a given page size, virtual address bits that reference within a page are called the in-page
address. All bits above this are the virtual page number. Likewise, the physical address has
a physical page number and in-page address bits. Virtual and physical in-page address bits
are the same; the MMU translates the virtual page number to the physical page number.
Instruction and data caches are accessed with the untranslated K-Bus address. The
translated address is used for cache allocation. That is, caches are virtual-address accessed
10-4
ColdFire CF4E Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
Virtual Memory Management Architecture
and physical-address tagged. If instruction and data cache addresses are not larger than the
in-page address for the smallest active MMU page, the cache is considered physically
accessed, but if they are larger, the cache can have aliasing problems between virtual and
cache addresses. Software handles these problems by forcing the virtual address to be equal
to the physical address for those bits addressing the cache but above the in-page address of
the smallest active page size. The number of these bits depends on cache and page sizes.
Freescale Semiconductor, Inc...
Caches are addressed with the virtual address because the cache uses synchronous memory
elements, and an access starts at the rising-clock edge of the first K-Bus pipeline stage. The
MMU provides a physical address midway through this cycle.
If the cache set address has fewer bits than the in-page address, the cache is considered
physically addressed because these bits are the same in the virtual and physical addresses.
If the cache set address has more bits than the in-page address, one or more of the low-order
virtual page number bits are used to address the cache. The MMU translates these bits; the
resulting low-order physical page number bits are used to determine cache hits.
Address aliasing problems occur when two virtual addresses access one physical page. This
is generally allowed and, if the page is cacheable, one coherent copy of the page image is
mapped in the cache at any time.
If multiple virtual addresses pointing to the same physical address differ only in the
low-order virtual page number bits, conflicting copies can be allocated. For an 8-Kbyte,
4-way set-associative cache with a 16-byte line size, the cache set address uses address bits
10–4. If virtual addresses 0x0_1000 and 0x0_1400 are mapped to physical address
0x0_1000, using virtual address 0x0_1000 loads cache set 0x00, while using virtual
address 0x0_1400 loads cache set 0x40. This puts two copies of the same physical address
in the cache making this memory space not coherent. To avoid this problem, software must
force low-order virtual page number bits to be equal to low-order physical address bits for
all bits used to address the cache set.
10.2.3.6 Supervisor/User Stack Pointers
To isolate supervisor and user modes, CF4e implements two A7 register stack pointers, one
for supervisor mode and one for user mode. Two former M680x0 privileged instructions to
load and store the user stack pointer are restored in the instruction set architecture.
10.2.3.7 Access Error Stack Frame
K-Bus accesses that fault (that is, terminate with a K-Bus transfer error acknowledge)
generate an access error exception. MMU TLB misses and access violations use the same
fault. To quickly determine if a fault was due to a TLB miss or another type of access error,
new fault status field (FS) encodings signal TLB misses on the following:
•
•
•
Instruction fetch
Instruction extension fetch
Data read
Chapter 10. Memory Management Unit (MMU)
For More Information On This Product,
Go to: www.freescale.com
10-5
Freescale Semiconductor, Inc.
Virtual Memory Management Architecture
•
Data write
See Section 10.4.3, “Access Error Stack Frame Additions,” for more information.
10.2.3.8 Expanded Control Register Space
MMUBAR is added for ColdFire virtual mode. Like other control registers, it can be
accessed from the debug module or written using the privileged MOVEC instruction. See
Chapter 2, “Registers.”
Freescale Semiconductor, Inc...
10.2.3.9 Changes to ACRs and CACR
New ACR and CACR bits, Table 10-1, improve address granularity and supervisor mode
protection. These improvements are not necessary to implement the ColdFire MMU but
they improve K-Bus memory functionality for physical and virtual address environments.
Table 10-1. New ACR and CACR Bits
Bits
Name
Description
ACRn[10]
AMM
Address mask mode. Determines access to the associated address space.
0 The ACR hit function is the same as previous versions, allowing control of a 16-Mbyte or
greater memory region.
1 The upper 8 bits of the address and ACR are compared without a mask function; bits 23–20
of the address and ACR are compared masked by ACR[19–16], allowing control of a 1- to
16-Mbyte region.
Reset value is 0.
ACRn[3]
SP
Supervisor protect. Determines access to the associated address space.
0 Supervisor and user access allowed.
1 Only supervisor access allowed. Attempted user access causes an access error exception.
Reset value is 0.
CACR[23] DDSP Default data supervisor protect. Determines access to the associated data space.
0 Supervisor and user access allowed.
1 Only supervisor access allowed. Attempted user access causes an access error exception.
Reset value is 0.
CACR[7]
DISP
Default instruction supervisor protect. Determines access to the associated instruction space.
0 Supervisor and user access allowed.
1 Only supervisor access allowed. Attempted user access causes access error exception
Reset value is 0.
10.2.3.10 ACR Address Improvements
ACRs provide a 16-Mbyte address window. For a given request address, if the ACR is valid
and the request mode matches the mode specified in the supervisor mode field, ACRn[S],
hit determination is specified as follows:
ACRx_Hit = 0;
if ((address[31:24] & ~ACRn[23:16]) == (ACRn[31:24] & ~ACRn[23:16]))
ACRx_Hit = 1;
With this hit function, ACRs can assign address attributes for user or supervisor requests to
memory spaces of at least 16 MBytes (through the address mask). With the MMU
definition, the ACR hit function is improved by the address mask mode bit (ACRn[AMM]),
which supports finer address granularity. See Table 10-1.
10-6
ColdFire CF4E Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
Debugging in a Virtual Environment
The revised hit determination becomes the following:
ACRx_Hit = 0;
if (ACRn[10] == 1)
if ((address[31–24] == ACRn[31–24])) &&
((address[23–20] & ~ACRn[19–16]) == (ACRn[23–20] & ~ACRn[19–16])))
ACRx_Hit = 1;
elseif (address[31–24] & ~ACRn[23–16]) == (ACRn[31–24] & ~ACRn[23–16]))
ACRx_Hit = 1;
10.2.3.11 Supervisor Protection
Freescale Semiconductor, Inc...
Each K-Bus instruction or data reference is either a supervisor or user access. The CPU’s
status register supervisor bit (SR[S]) determines the operating mode. New ACR and CACR
bits protect supervisor space. See Table 10-1.
10.3 Debugging in a Virtual Environment
To support debugging in a virtual environment, numerous enhancements are implemented
in the ColdFire debug architecture. These enhancements are collectively called Debug
revision D and primarily relate to the addition of an 8-bit address space identifier (ASID)
to yield a 40-bit virtual address. This expansion affects two major debug functions:
•
•
The ASID is optionally included in the hardware breakpoint registers specification.
For example, the four PC breakpoint registers are expanded by 8 bits each, so that a
specific ASID value can be part of the breakpoint instruction address. Likewise, data
address/data breakpoint registers are expanded to include an ASID value. The new
control registers define whether and how the ASID is included in the breakpoint
comparison trigger logic.
The debug module implements the concept of ownership trace in which an ASID
value can be optionally displayed as part of real-time trace. When enabled, real-time
trace displays instruction addresses on any change-of-flow instruction that is not
absolute or PC-relative. For Debug revision D architecture, the address display is
expanded to optionally include ASID contents, thus providing the complete
instruction virtual address on these instructions. Additionally, when a Sync_PC
serial BDM command is loaded from the external development system, the
processor displays the complete virtual instruction address, including the 8-bit ASID
value.
The MMU control registers are accessible through serial BDM commands. See Chapter 11,
“Debug Support.”
10.4 Virtual Memory Architecture Processor Support
To support the MMU, enhancements have been made to the exception model, the stack
pointers, and the access error stack frame.
Chapter 10. Memory Management Unit (MMU)
For More Information On This Product,
Go to: www.freescale.com
10-7
Freescale Semiconductor, Inc.
Virtual Memory Architecture Processor Support
10.4.1 Precise Faults
To support demand-paging, all memory references require precise, recoverable faults. The
ColdFire instruction restart mechanism ensures that a faulted instruction restarts from the
beginning of execution; that is, no internal state information is saved when an exception
occurs and none is restored when the handler ends. Given the PC address defined in the
exception stack frame, the processor reestablishes program execution by transferring
control to the given location as part of the RTE (return from exception) instruction.
For a detailed description, see Section 7.5, “Precise Faults.”
Freescale Semiconductor, Inc...
10.4.2 Supervisor/User Stack Pointers
To provide the required isolation between these operating modes as dictated by a virtual
memory management scheme, a user stack pointer (A7-USP) is added. The appropriate
stack pointer register (SSP, USP) is accessed as a function of the processor’s operating
mode. In addition, the following two privileged MC680x0 instructions to load/store the
USP are added to the ColdFire instruction set architecture:
mov.l
mov.l
Ay,USP
USP,Ax
# move to
USP: opcode = 0x4E6{0-7}
# move from USP: opcode = 0x4E6{8–F}
The address register number is encoded in the low-order three bits of the opcode.
These instructions are described in detail in Section 10.7, “MMU Instructions.”
10.4.3 Access Error Stack Frame Additions
ColdFire exceptions generate a standard 2-longword stack frame, signaling the contents of
the SR and PC at the time of the exception, the exception type, and a 4-bit fault status field.
FS. The first longword contains the 16-bit format/vector word (F/V) and the 16-bit status
register. The second contains the 32-bit PC address of the faulted instruction.
31
A7 →
28
Format
27
26
FS[3–2]
+ 0x04
25
18
Vector[7–0]
17
16
15
FS[1–0]
0
Status Register
Program Counter [31–0]
Figure 10-2. Exception Stack Frame
The FS field is used for access and address errors. To optimize TLB miss exception
handling, new FS encodings (Table 10-2) allow quick error classification.
Table 10-2. Fault Status Encodings
FS
0000
0001, 001x
10-8
Definition
Not an access or address error
Reserved
ColdFire CF4E Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
MMU Definition
Table 10-2. Fault Status Encodings (Continued)
Freescale Semiconductor, Inc...
FS
Definition
0100
Error (for example, protection fault) on instruction fetch
0101
TLB miss on opword of instruction fetch (New in CF4e)
0110
TLB miss on extension word of instruction fetch (New in CF4e)
0111
IFP access error while executing in emulator mode (New in CF4e)
1000
Error on data write
1001
Attempted write of protected space
1010
TLB miss on data write (New in CF4e)
1011
Reserved
1100
Error on data read
1101
Attempted read, read-modify-write of protected space (New in CF4e)
1110
TLB miss on data read, or read-modify-write (New in CF4e)
1111
OEP access error while executing in emulator mode (New in CF4e)
10.5 MMU Definition
The ColdFire MMU provides a virtual address, demand-paged memory architecture. The
MMU supports hardware address translation acceleration using software-managed TLBs.
It enforces permission checking on a per-memory request basis, and has control, status, and
fault registers for MMU operation.
10.5.1 Effective Address Attribute Determination
The ColdFire core generates an effective memory address for all instruction fetches and
data read and write memory accesses. The previous ColdFire memory access control model
is based strictly on physical addresses. Every memory request address is a physical address
that is analyzed by this logic and assigned address attributes, which include the following:
•
•
•
•
•
Cache mode
K-Bus RAM (KRAM) enable information
K-Bus ROM (KROM) enable information
Write protect information
Write mode information
These attributes control processing of the memory request. The address itself is not affected
by memory access control logic.
Instruction and data references base effective address attributes and access mode on the
instruction type and the effective address. K-Bus accesses are of the following two types:
Chapter 10. Memory Management Unit (MMU)
For More Information On This Product,
Go to: www.freescale.com
10-9
Freescale Semiconductor, Inc.
MMU Definition
Freescale Semiconductor, Inc...
•
•
Special mode accesses, including interrupt acknowledges, reads/writes to
program-visible control registers (such as CACR, ROMBARs, RAMBARs, and
ACRs), cache control commands (CPUSHL and INTOUCH), and emulator mode
operations. These accesses have specific attributes such as the following:
— Non-cacheable
— Precise
— No write protection
Unless the CPU space/IACK mask bit is set, interrupt acknowledge cycles and
emulator mode operations are allowed to hit in RAMBARs and ROMBARs. All
other operations are normal mode accesses.
Normal mode accesses. For these accesses, an effective cache mode, precision and
write-protection are calculated for each request.
For the data K-Bus, a normal mode access address is compared with the following priority,
from highest to lowest: RAMBAR0, RAMBAR1, ROMBAR0, ROMBAR1, ACR0, and
ACR1. If no match is found, default attributes in the CACR are used. The priority for
instruction K-Bus accesses is RAMBAR0, RAMBAR1, ROMBAR0, ROMBAR1, ACR2,
and ACR3. Again, if no match is found, default CACR attributes are used.
Only the test-and-set (TAS) instruction can generate a normal mode access with implied
cache mode and precision. TAS is a special, byte-sized, read-modify-write instruction used
in synchronization routines. A TAS data access that does not hit in the RAMBARs is
non-cacheable and precise. TAS uses the normal effective write protection.
The ColdFire MMU is an optional enhancement to the memory access control. If the MMU
is present and enabled, it adds two factors for calculating effective address attributes:
•
•
MMUBAR defines a memory-mapped, privileged data-only space with the highest
priority in effective address attribute calculation for the data K-Bus (that is, the
MMUBAR has priority over RAMBAR0).
If virtual mode is enabled, any normal mode access that does not hit in the
MMUBAR, RAMBARs, ROMBARs, or ACRs is considered a normal mode virtual
address request and generates its access attributes from the MMU. For this case, the
default CACR address attributes are not used.
The MMU also uses TLB contents to perform virtual-to-physical address translation.
10.5.2 MMU Functionality
The MMU provides virtual-to-physical address translation and memory access control. The
MMU consists of memory-mapped, control, status, and fault registers and a TLB that can
be accessed through MMU registers. Supervisor software can access these resources
through MMUBAR. Software can control address translation and access attributes of a
virtual address by configuring MMU control registers and loading the MMU’s TLB, which
functions as a cache, associating virtual addresses to corresponding physical addresses and
10-10
ColdFire CF4E Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
MMU Definition
providing access attributes. Each TLB entry maps a virtual page. Several page sizes are
supported. Features such as clear all and probe for hit help maintain TLBs.
Fault-free, virtual address accesses that hit in the TLB incur no pipeline delay. Accesses that
miss the TLB or hit the TLB but violate an access attribute generate an access error
exception. On an access error, software can reference address and information registers in
the MMU to retrieve data. Depending on the fault source, software can obtain and load a
new TLB entry, modify the attributes of an existing entry, or abort the faulting process.
Freescale Semiconductor, Inc...
10.5.3 MMU Organization
Access to the MMU memory-mapped region is controlled by MMUBAR, a 32-bit
supervisor control register at 0x008 that is accessed using MOVEC or the serial BDM
debug port. The PRM describes the MOVEC instruction.
10.5.3.1 MMU Base Address Register (MMUBAR)
Figure 10-3 shows MMUBAR. The default reset state is an invalid MMUBAR, so that the
MMU is disabled and the memory-mapped space is not visible.
31
16 15
Field
BA
Reset
1
—
V
—
R/W
0
0
R/W
Rc
0x008
Figure 10-3. MMU Base Address Register
Table 10-3 describes MMU base address register fields.
Table 10-3. MMU Base Address Register Field Descriptions
Bits
Name
Description
31–16
BA
Base address. Defines the base address for the 64-Kbyte address space mapped to the MMU.
15–1
—
Reserved, should be cleared. Writes are ignored and reads return zeros.
0
V
Valid. Indicates when MMUMBAR contents are valid. BA is not used unless V is set.
0 MMUBAR contents are not valid.
1 MMUBAR contents are valid.
10.5.3.2 MMU Memory Map
MMUBAR holds the base address for the 64-Kbyte MMU memory map, shown in
Table 10-4. The MMU memory map area is not visible unless the MMUBAR is valid and
must be referenced aligned. A large portion of the map is reserved for future use.
Chapter 10. Memory Management Unit (MMU)
For More Information On This Product,
Go to: www.freescale.com
10-11
Freescale Semiconductor, Inc.
MMU Definition
Table 10-4. MMU Memory Map
Offset from MMUBAR
+ 0x0000
MMU control register (MMUCR)
+ 0x0004
MMU operation register (MMUOR)
+ 0x0008
MMU status register (MMUSR)
+ 0x000C
Reserved
+ 0x0010
MMU fault, test, or TLB address register (MMUAR)
+ 0x0014
MMU read/write TLB tag register (MMUTR)
+ 0x0018
MMU read/write TLB data register (MMUDR)
Freescale Semiconductor, Inc...
+ 0x001C–0xFFFC
1May
Name
Reserved1
be used for implementation-specific information/control registers.
The address space ID (ASID) is located in a CPU space control register. The 8-bit ASID
value located in the low order byte of a 32-bit supervisor control register, mapped into CPU
space at address 0x003 and accessed using a MOVEC instruction. The ColdFire Family
Programmer’s Reference Manual describes MOVEC.
This 8-bit field is the current user ASID. The ASID is an extension to the virtual address.
Address space 0x00 may be reserved for supervisor mode. See address space mode
functionality in Section 10.5.3.3, “MMU Control Register (MMUCR).” The other 255
address spaces are used to tag user processes. The TLB entry ASID values are compared to
this value for user mode unless the TLB entry is marked shared (MMUTR[SG] is set). The
TLB entry ASID value may be compared to 0x00 for supervisor accesses.
10.5.3.3 MMU Control Register (MMUCR)
MMUCR, Figure 10-4, has the address space mode and virtual mode enable bits. The user
must force pipeline synchronization after writing to this register. Therefore, all writes to
this register must be immediately followed by a NOP instruction.
31
2
Field
—
Reset
R/W
Rc
0x000
Figure 10-4. MMU Control Register (MMUCR)
Table 10-5 describes MMUCR fields.
10-12
0
ASM EN
—
R/W
1
ColdFire CF4E Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
0
Freescale Semiconductor, Inc.
MMU Definition
Freescale Semiconductor, Inc...
Table 10-5. MMUCR Field Descriptions
Bits
Name
31–2
—
1
ASM
0
EN
Description
Reserved, should be cleared. Writes are ignored and reads return zeros.
Address space mode. Controls how the address space ID is used for TLB hits.
0 TLB entry ASID values are compared to the address space ID register value for user or supervisor
mode unless the TLB entry is marked shared (MMUTR[SG] = 1). The address space ID register
value is the effective address space for all requests, supervisor and user.
1 Address space 0x00 is reserved for supervisor mode and the effective address space is forced to
0x00 for all supervisor accesses. The other 255 address spaces are used to tag user processes.
The TLB entry ASID values are compared to the address space ID register for user mode unless
the TLB entry is marked shared (SG = 1). The TLB entry ASID value is always compared to 0x00 for
supervisor accesses. This allows two levels of sharing. All users but not the supervisor share an
entry if SG = 1and ASID ≠ 0. All users and the supervisor share an entry if SG = 1 and ASID = 0
Virtual mode enabled. Indicates when virtual mode is enabled.
0 Virtual mode is disabled.
1 Virtual mode is enabled.
10.5.3.4 MMU Operation Register (MMUOR)
Figure 10-5 shows the MMU operation register.
31
16 15
Field
AA
9
—
Reset
8
7
6
5
4
3
2
1
0
STLB CA CNL CAS ITLB ADR R/W ACC UAA
—
R/W
Read Only
Rc
R/W
0x0004
Figure 10-5. MMU Operation Register (MMUOR)
Table 10-6 describes MMUOR fields.
Table 10-6. MMUOR Field Descriptions
Bits
Name
Description
31–16
AA
TLB allocation address. This read-only field is maintained by MMU hardware. Its range and format
depend on the TLB implementation (specific TLB size in entries, associativity, and organization).
The access TLB function can use AA to read or write the addressed TLB entry. The MMU loads AA
on the following three events:
• On DTLB access errors, it loads the address of the TLB entry that caused the error.
• If UAA is set, it loads the address of the TLB entry chosen by the MMU for replacement.
• If STLB is set, it uses the data in MMUAR to search the TLB and if the TLB hits, loads the
address of the TLB entry that hits, or if the TLB misses, loads the TLB entry chosen by the
MMU for replacement.
The MMU never picks a locked entry for replacement, and TLB hits of locked entries do not update
hardware replacement algorithm information. This is so access error handlers mapped with locked
TLB entries do not influence the replacement algorithm. Further, TLB search operations do not
update the hardware replacement algorithm information while TLB writes (loads) do update the
hardware replacement algorithm information. The algorithm used to choose the allocation address
depends on the TLB implementation (such as LRU, round-robin, pseudo-random).
15–9
—
Reserved, should be cleared. Writes are ignored and reads return zeros.
Chapter 10. Memory Management Unit (MMU)
For More Information On This Product,
Go to: www.freescale.com
10-13
Freescale Semiconductor, Inc.
MMU Definition
Freescale Semiconductor, Inc...
Table 10-6. MMUOR Field Descriptions (Continued)
Bits
Name
Description
8
STLB
Search TLB. STLB always reads as zero.
0 No operation
1 The MMU searches the TLB using data in MMUAR. This operation updates the probe TLB hit bit
in the status register plus loads the AA field as described above.
7
CA
6
CNL
Clear all non-locked TLB entries. Setting CNL clears all TLB entries that do not have their locked bit
set. CNL always reads as zero.
0 No operation
1 Clear all non-locked TLB entries.
5
CAS
Clear all non-locked TLB entries that match ASID. CAS is always reads as a zero.
0 No operation
1 Clear all non-locked TLB entries that match ASID register.
4
ITLB
ITLB operation. Used by TLB search and access operations that use the TLB allocation address.
0 The MMU uses the DTLB to search or update the allocation address.
1 The MMU uses the ITLB for searches and updates of the allocation address.
3
ADR
TLB address select. Indicates which address to use when accessing the TLB.
0 Use the TLB allocation address for the TLB address.
1 Use MMUAR for the TLB address.
2
R/W
TLB access read/write select. Indicates whether to do a read or a write when accessing the TLB.
0 Write
1 Read
1
ACC
MMU TLB access. This bit always reads as a zero. STLB is used for search operations.
0 No operation. ACC should be a zero to search the TLB.
1 The MMU reads or writes the TLB depending on R/W. For TLB reads, TLB tag and data results
are loaded into MMUTR and MMUDR. For TLB writes, the contents of these registers are written
to the TLB. The TLB is accessed using the TLB allocation address if ADR is zero or using
MMUAR if ADR is set.
0
UAA
Update allocation address. UAA always reads as a zero.
0 No operation
1 MMU updates the allocation address field with the MMU’s choice for the allocation address in the
ITLB or DTLB depending on the ITLB instruction operation bit.
Clear all TLB entries. CA always reads as zero.
0 No operation
1 Clear all TLB entries and all hardware TLB replacement algorithm information.
10.5.3.5 MMU Status Register (MMUSR)
MMUSR, Figure 10-6, is updated on all data access faults and search TLB operations.
31
Field
6
—
Reset
5
—
R/W
R/W
Rc
0x0008
Figure 10-6. MMU Status Register (MMUSR)
Table 10-7 describes MMUSR fields.
10-14
4
3
SPF RF WF
ColdFire CF4E Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
2
1
0
—
HIT
—
Freescale Semiconductor, Inc.
MMU Definition
Freescale Semiconductor, Inc...
Table 10-7. MMUSR Field Descriptions
Bits
Name
Description
31–6
—
5
SPF
Supervisor protect fault. Indicates if the last data fault was a user mode access that hit in a TLB entry
that had its supervisor protect bit set.
0 Last data access fault did not have a supervisor protect fault.
1 Last data access fault had a supervisor protect fault.
4
RF
Read access fault. Indicates if the last data fault was an data read access that hit in a TLB entry that
did not have its read bit set.
0 Last data access fault did not have a read protect fault.
1 Last data access fault had a read protect fault.
3
WF
Write access fault. Indicates if the last data fault was an data write access that hit in a TLB entry that
did not have its write bit set.
0 Last data access fault did not have a write protect fault.
1 Last data access fault had a write protect fault.
2
—
Reserved, should be cleared. Writes are ignored and reads return zeros.
1
HIT
0
—
Reserved, should be cleared. Writes are ignored and reads return zeros.
Search TLB hit. Indicates if the last data fault or the last search TLB operation hit in the TLB.
0 Last data access fault or search TLB operation did not hit in the TLB.
1 Last data access fault or search TLB operation hit in the TLB.
Reserved, should be cleared. Writes are ignored and reads return zeros.
10.5.3.6 MMU Fault, Test, or TLB Address Register (MMUAR)
The MMUAR format, Figure 10-7, depends on how the register is used.
31
0
Field
FA
Reset
—
R/W
R/W
Rc
0x0010
Figure 10-7. MMU Fault, Test, or TLB Register (MMUAR)
Table 10-8 describes MMUAR fields.
Table 10-8. MMUAR Field Descriptions
Bits
Name
Description
31–0
FA
Form address. Written by the MMU with the virtual address on DTLB misses and access faults. For
this case, all 32 bits are address bits. This register may be written with a virtual address and address
attribute information for searching the TLB (MMUCR[STLB]). For this case, FA[31–1] are the virtual
page number and FA[0] is the supervisor bit. The current ASID is used for the TLB search. MMUAR
can also be written with a TLB address for use with the access TLB function (using MMUCR[ACC]).
10.5.3.7 MMU Read/Write Tag and Data Entry Registers
(MMUTR and MMUDR)
Each TLB entry consists of a 32-bit TLB tag entry and a 32-bit TLB data entry. TLB entries
are referenced through MMUTR and MMUDR. For read TLB accesses, the contents of the
TLB tag and data entries referenced by the allocation address or MMUAR are loaded in
Chapter 10. Memory Management Unit (MMU)
For More Information On This Product,
Go to: www.freescale.com
10-15
Freescale Semiconductor, Inc.
MMU Definition
MMUTR and MMUDR. TLB write accesses place MMUTR and MMUDR contents into
the TLB tag and data entries defined by the allocation address or MMUAR.
MMUTR, Figure 10-8, contains the virtual address tag, the address space ID (ASID), a
shared page indicator, and the valid bit.
31
10
Field
9
2
VA
Reset
ID
1
0
SG V
—
R/W
R/W
Rc
0x0014
Freescale Semiconductor, Inc...
Figure 10-8. MMU Read/Write TLB Tag Register (MMUTR)
Table 10-9 describes MMUTR fields.
Table 10-9. MMUTR Field Descriptions
Bits
Name
Description
31–10
VA
Virtual address. Defines the virtual address mapped by this entry. The number of bits used in the TLB
hit determination depends on the page size field in the corresponding TLB data entry.
9–2
ID
Address space ID (ASID). This extension to the virtual address marks this entry as part of 1 of 256
possible address spaces. Address space 0x00 can be reserved for supervisor mode. The other 255
address spaces are used to tag user processes. TLB entry ASID values are compared to the ASID
register value for user mode unless the TLB entry is marked shared (SG = 1). The TLB entry ASID
value may be compared to 0x00 for supervisor accesses or to the ASID. The description of
MMUCR[ASM] in Table 10-5 gives details on supervisor mode and ASID compares.
1
SG
Shared global. Indicates when the entry is shared among user address spaces. If an entry is shared,
its ASID is not part of the TLB hit determination for user accesses.
0 This entry is not shared globally.
1 This entry is shared globally.
Note that the ASID can be used to determine supervisor mode hits to allow two sharing levels. If SG
and MMUCR[ASM] are set and the ASID is not zero, all users (but not the supervisor) share an entry.
If SG and MMUCR[ASM] are set and the ASID is zero, all users and the supervisor share an entry.
The description of ASM in Table 10-5 details supervisor mode and ASID compares.
0
V
Valid. Indicates when the entry is valid. Only valid entries generate a TLB hit.
0 Entry is not valid.
1 Entry is valid.
MMUDR, Figure 10-9, contains the physical address, cache mode field, page size,
supervisor-protect bit, read, write, execute permission bits, and lock-entry bit.
31
Field
10
PA
Reset
9
8
SZ
7
—
R/W
R/W
Rc
0x0018
Figure 10-9. MMU Read/Write TLB Data Register
Table 10-9 describes MMUDR fields.
10-16
ColdFire CF4E Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
6
CM
5
4
3
2
1
0
SP R W X LK —
Freescale Semiconductor, Inc.
MMU Definition
Table 10-10. MMUDR Field Descriptions
Bits
Name
Descriptions
31–10
PA
Physical address. Defines the physical address which is mapped by this entry. The number of bits used
to build the effective physical address if this TLB entry hits depends on the page size field.
9–8
SZ
Page size. Page size for this entry.
Freescale Semiconductor, Inc...
SZ Page Size
Function
00
1 Mbyte
VA[31–20] used for TLB hit
01
4 Kbyte
VA[31–12] used for TLB hit
10
8 Kbyte
VA[31–13] used for TLB hit
11
1 Kbyte
VA[31–10] used for TLB hit
7–6
CM
Cache mode. If a Harvard TLB implementation is used, CM0 is a don’t care for the ITLB. CM is ignored
on writes and always reads as zero for the ITLB.
Instruction cache modes:
1x Page is non-cacheable.
0x Page is cacheable.
Data cache modes
00 Page is cacheable writethrough.
01 Page is cacheable copyback.
10 Page is non-cacheable precise.
11 Page is non-cacheable imprecise.
5
SP
Supervisor protect. Controls user mode access to the page mapped by this entry.
0 Entry is not supervisor protected.
1 Entry is supervisor protected. An attempted user mode access that matches this entry generates an
access error exception.
4
R
Read access enable. Indicates if data read accesses to this entry are allowed. If a Harvard TLB
implementation is used, this bit is a don’t care for the ITLB. This bit is ignored on writes and always
reads as zero for the ITLB.
0 Do not allow data read accesses. Attempted data read accesses that match this entry generate an
access error exception.
1 Allow data read accesses.
3
W
Write access enable. Indicates if data write accesses are allowed to this entry. If separate ITLB and
DTLBs) are used, W is a don’t care for the ITLB. W is ignored on writes and reads as zero for the ITLB.
0 Do not allow data write accesses. Attempted data write accesses that match this entry generate an
access error exception.
1 Allow data write accesses.
2
X
Execute access enable. Indicates if instruction fetches to this entry are allowed. If separate ITLB and
DTLBs are is used, X is a don’t care for the DTLB. X is ignored on writes and reads as zero for the
DTLB.
0 Do not allow instruction fetches. Attempted instruction fetches that match this entry cause an access
error exception.
1 Allow instruction fetch accesses.
1
LK
Lock entry bit. Indicates if this entry is included in the replacement algorithm. TLB hits of locked entries
do not update replacement algorithm information.
0 Include this entry when determining the best entry for a TLB allocation.
1 Do not allow this entry to be selected by the replacement algorithm.
0
—
Reserved, should be cleared. Writes are ignored and reads return zeros.
10.5.4 MMU TLB
Each TLB entry consists of two 32-bit fields. The first is the TLB tag entry; the second is
Chapter 10. Memory Management Unit (MMU)
For More Information On This Product,
Go to: www.freescale.com
10-17
Freescale Semiconductor, Inc.
MMU Definition
the TLB data entry. TLB size and organization are implementation dependent. TLB entries
can be read and written through MMU registers. TLB contents are unaffected by reset.
10.5.5 MMU Operation
Freescale Semiconductor, Inc...
The processor sends instruction fetch requests and data read/write requests to the K-Bus in
the instruction and operand address generation cycles (IAG and OAG). The K-Bus
controller and memories occupy the next two pipeline stages, instruction fetch cycles 1 and
2 (IC1 and IC2) and operand fetch cycles 1 and 2 (OC1 and OC2). For late writes, optional
data pipeline stages are added to the K-Bus controller as well as any writable memories.
Table 10-11 shows the association between K-Bus memory pipeline stages and the
processor’s pipeline structures, shown in Figure 10-1.
.
Table 10-11. Version 4 K-Bus Memory Pipelines
K-Bus Memory Pipeline Stage
Instruction Fetch Pipeline Operand Execution Pipeline
J stage
IAG
OAG
KC1 stage
IC1
OC1
KC2 stage
IC2
OC2
Operand execute stage
n/a
EX
Late-write stage
n/a
DA
Version 4 K-Buses use the same 2-cycle read pipeline developed for Version 3. Each K-Bus
has 32-bit address and 32-bit read data paths. Version 4 uses synchronous memory elements
for all memory control units. To support this, certain control information and all address
bits are sent on the K-Buses at the end of the cycle before the initial bus access cycle (J
cycle). The data K-Bus has an additional 32-bit write data path. For processor store
operations, Version 4 ColdFire uses a late-write strategy, which can require 2 additional
data K-Bus cycles. This yields the K-Bus pipeline behavior described in Table 10-12.
Table 10-12. K-Bus Pipeline Cycles
Cycle
J
Description
Control and partial address broadcast (to start synchronous memories)
KC1
Complete address and control broadcast plus MMU information. It is during this cycle that all memory element
read operations are performed; that is, memory arrays are accessed.
KC2
Select appropriate memory as source, return data to processor, handle cache misses or hold K-Bus pipeline
as needed.
EX
Optional write stage, pipeline address and control for store operations.
DA
Data available for stores from processor; memory element update occurs in the next cycle.
The K2M module contains two independent memory unit access controllers and two
independent K-Bus controllers (I-Kcl and O-Kcl). Each instruction and data K-Bus request
is analyzed to see which, if any, K-Bus memory controller is referenced. This information,
along with cache mode, store precision, and fault information, is sourced during KC1.
10-18
ColdFire CF4E Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
MMU Implementation
The optional MMU is referenced concurrently with the memory unit access controllers. It
has two independent control sections to simultaneously process an instruction and data
K-Bus request. Figure 10-1 shows how the MMU and memory unit access controllers fit in
the K-Bus pipeline. As the diagram shows, core address and attributes are used to access
the mapping registers and the MMU. By the middle of the KC1 cycle, the K-Bus physical
memory address (KADDR_KC1) is available along with its corresponding access control.
Freescale Semiconductor, Inc...
Figure 10-10 shows more details of the MMU structure. The TLB is accessed at the
beginning of the KC1 pipeline stage so the resulting physical address can be sourced to the
cache controllers to factor into the cache hit/miss determination. This is required because
caches are virtually indexed but physically mapped.
JADDR, J Control
To K-Bus memory controllers
J
TLB data
entries
TLB tag
entries
Memory unit access control
(MMUBAR, RAMBARs, ROMBARs,
ACRs, CACR priority hit logic)
Comp
To K-Bus control for
TLB miss logic
TLB hit
entry
data
KC1
Translated address
MMU’s access control
TLB Hit
Untranslated address
mapping register’s
access control
To K-Bus control for
TLB miss logic
Mapping register hit
or special mode access
To K-Bus memory controllers
plus K-to-M bus interface
KADDR_KC1
KC1 cycle access control
Figure 10-10. K-Bus Address and Attributes Generation
10.6 MMU Implementation
The MMU implements a 64-entry full-associative Harvard TLB architecture with 32-entry
ITLB and DTLB. This section provides more details of this specific TLB implementation.
This section details the operation and looks at the size, frequency, miss rate, and miss
recovery time of this specific TLB implementation.
Chapter 10. Memory Management Unit (MMU)
For More Information On This Product,
Go to: www.freescale.com
10-19
Freescale Semiconductor, Inc.
MMU Implementation
10.6.1 TLB Address Fields
Because the TLB has a total of 64 entries (32 each for the ITLB and DTLB), a 6-bit address
field is necessary. TLB addresses 0–31 reference the ITLB and TLB addresses 32–63
reference the DTLB.
In the MMUOR, bits 0 through 5 of the TLB allocation address (AA[5–0]), have this
address format for CF4e. The remaining TLB allocation address bits (AA[15–6]) are
ignored on updates and always read as zero.
Freescale Semiconductor, Inc...
When MMUAR is used for a TLB address, bits FA[5–0] also have this address format for
CF4e. The remaining form address bits (FA[31–6]) are don’t cares when this register is
being used for a TLB address.
10.6.2 TLB Replacement Algorithm
The instruction and data TLBs provide low-latency access to recently used instruction and
operand translation information. CF4e ITLBs and DTLBs are 32-entry fully associative
caches. The 32 ITLB entries are searched on each instruction K-Bus reference; the 32
DTLB entries are searched on each operand K-Bus reference.
CF4e TLBs are software controlled. The TLB clear-all function clears valid bits on every
TLB entry and resets the replacement logic. A new valid entry is loaded in the TLBs may
be designated as locked and unavailable for allocation. TLB hits to locked entries do not
update replacement algorithm information.
When a new TLB entry needs to be allocated, the user can specify the exact TLB entry to
be updated (through MMUOR[ADR] and MMUAR) or let TLB hardware pick the entry to
update based on the replacement algorithm. A pseudo-least-recently used (PLRU)
algorithm picks the entry to be replaced on a TLB miss. The algorithm works as follows:
•
•
If any element is empty (non-valid), use the lowest empty element as the allocate
entry (that is, entry 0 before 1, 2, 3, and so on).
If all entries are valid, use the entry indicated by the PLRU as the allocate entry.
The PLRU algorithm uses 31 most-recently used state bits per TLB to track the TLB hit
history. Table 10-13 lists these state bits.
Table 10-13. PLRU State Bits
State Bits
10-20
Meaning
rdRecent31To16
A one indicates 31To16 is more recent than 15To00
rdRecent31To24
A one indicates 31To24 is more recent than 23To16
rdRecent15To08
A one indicates 15To08 is more recent than 07To00
rdRecent31To28
A one indicates 31To28 is more recent than 27To24
rdRecent23To20
A one indicates 23To20 is more recent than 19To16
rdRecent15To12
A one indicates 15To12 is more recent than 11To08
ColdFire CF4E Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
MMU Implementation
Table 10-13. PLRU State Bits (Continued)
Freescale Semiconductor, Inc...
State Bits
Meaning
rdRecent07To04
A one indicates 07To04 is more recent than 03To00
rdRecent31To30
A one indicates 31To30 is more recent than 29To28
rdRecent27To26
A one indicates 27To26 is more recent than 25To24
rdRecent23To22
A one indicates 23To22 is more recent than 21To20
rdRecent19To18
A one indicates 19To18 is more recent than 17To16
rdRecent15To14
A one indicates 15To14 is more recent than 13To12
rdRecent11To10
A one indicates 11To10 is more recent than 09To08
rdRecent07To06
A one indicates 07To06 is more recent than 05To04
rdRecent03To02
A one indicates 03To02 is more recent than 01To00
rdRecent31
A one indicates 31 is more recent than 30
rdRecent29
A one indicates 29 is more recent than 28
rdRecent27
A one indicates 27 is more recent than 26
rdRecent25
A one indicates 25 is more recent than 24
rdRecent23
A one indicates 23 is more recent than 22
rdRecent21
A one indicates 21 is more recent than 20
rdRecent19
A one indicates 19 is more recent than 18
rdRecent17
A one indicates 17 is more recent than 16
rdRecent15
A one indicates 15 is more recent than 14
rdRecent13
A one indicates 13 is more recent than 12
rdRecent11
A one indicates 11 is more recent than 10
rdRecent09
A one indicates 09 is more recent than 08
rdRecent07
A one indicates 07 is more recent than 06
rdRecent05
A one indicates 05 is more recent than 04
rdRecent03
A one indicates 03 is more recent than 02
rdRecent01
A one indicates 01 is more recent than 00
Binary state bits are updated on all TLB write (load) operations as well as normal ITLB and
DTLB hits of non-locked entries. Also, if all entries in a binary state are locked, than that
state is always set. That is, if entries 15, 14, 13, and 12 were locked, LRU state bit
rdRecent15To14 is forced to one.
For a completely valid TLB, binary state information determines the LRU entry. The CF4e
replacement algorithm is deterministic and, for the case of a full TLB, with no locked
entries, and always touching new pages, the replacement entry repeats every 32 TLB loads.
10.6.3 TLB Locked Entries
Figure 10-11 is a ColdFire MMU Harvard TLB block diagram.
Chapter 10. Memory Management Unit (MMU)
For More Information On This Product,
Go to: www.freescale.com
10-21
Freescale Semiconductor, Inc.
MMU Instructions
For TLB miss faults, the instruction restart model completely reexecutes an instruction
on returning from the exception handler. An instruction can touch two instruction pages (a
32- or 48-bit instruction can straddle two pages) or four data pages (a memory-to-memory
word or longword move where misaligned source and destination operands straddle two
pages). Therefore, one instruction may take two ITLB misses and allocate two ITLB pages
before completion. Likewise, one instruction may require four DTLB misses and allocate
four DTLB pages. Because of this, a pool of unlocked TLB entries must be available if
virtual memory is used.
Freescale Semiconductor, Inc...
The above examples show the fewest entries needed to guarantee an instruction can
complete execution. For good MMU performance, more unlocked TLB entries should
be available.
Current address space ID (ASID)
J
Instruction or data K-Bus address and attributes
TLB Tag
Entry 31
TLB Tag
Entry 0
TLB Tag
Entry 31
TLB Tag
Entry 0
KC1
Compare
Compare
Instruction or dataK-Bus hit
select
To K-Bus control for instruction or
DTLB miss logic
IC1 or OC1 translated address
IC1 or OC1 access control
Figure 10-11. Version 4 ColdFire MMU Harvard TLB
10.7 MMU Instructions
The MOVE to USP and MOVE from USP instructions have been added for accessing the
USP. Refer to the PRM for more information.
10-22
ColdFire CF4E Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
Freescale Semiconductor, Inc...
Chapter 11
Debug Support
This chapter describes the Revision D enhanced hardware debug support in the ColdFire
Version 4. This revision of the ColdFire debug architecture encompasses earlier revisions.
An expanded set of debug functionality is defined as Revision B (or Rev. B). The further
enhanced debug architecture implemented in the Version 4 ColdFire is known as
Revision C (or Rev. C).
11.1 Overview
The debug module interface is shown in Figure 11-1.
High-speed
local bus
ColdFire CPU Core
Debug Module
Control
BKPT
Trace Port
PSTDDATA[7:0]
PSTCLK
Communication Port
DSCLK, DSI, DSO
Figure 11-1. Processor/Debug Module Interface
Debug support is divided into three areas:
•
•
Real-time trace support: The ability to determine the dynamic execution path
through an application is fundamental for debugging. The ColdFire solution
implements an 8-bit parallel output bus that reports processor execution status and
data to an external BDM emulator system. See Section 11.3, “Real-Time Trace
Support.”
Background debug mode (BDM): Provides low-level debugging in the ColdFire
processor complex. In BDM, the processor complex is halted and a variety of
commands can be sent to the processor to access memory and registers. The
external BDM emulator uses a three-pin, serial, full-duplex channel. See
Section 11.5, “Background Debug Mode (BDM),” and Section 11.4,
“Programming Model.”
Chapter 11. Debug Support
For More Information On This Product,
Go to: www.freescale.com
11-1
Freescale Semiconductor, Inc.
Overview
Freescale Semiconductor, Inc...
•
Real-time debug support: BDM requires the processor to be halted, which many
real-time embedded applications cannot do. Debug interrupts let real-time systems
execute a unique service routine that can quickly save key register and variable
contents and return the system to normal operation without halting. External
development systems can access saved data because the hardware supports
concurrent operation of the processor and BDM-initiated commands. In addition,
the option is provided to allow interrupts to occur. See Section 11.6, “Real-Time
Debug Support.”
The Version 2 ColdFire core implemented the original debug architecture, now called
Revision A. Based on feedback from customers and third-party developers, enhancements
have been added to succeeding generations of ColdFire cores. For Revision A, CSR[HRL]
is 0. See Section 11.4.5, “Configuration/Status Register (CSR).”
The Version 3 core implements Revision B of the debug architecture, offering more
flexibility for configuring the hardware breakpoint trigger registers and removing the
restrictions involving concurrent BDM processing while hardware breakpoint registers are
active. For Revision B, CSR[HRL] is 1.
Revision C of the debug architecture more than doubles the on-chip breakpoint registers
and provides an ability to interrupt debug service routines. For Revision C, CSR[HRL] is
2.
Differences between Revision B and C are summarized as follows:
•
•
Debug Revision B has separate PST[3:0] and DDATA[3:0] signals.
Debug Revision C adds breakpoint registers and supports normal interrupt request
service during debug. It combines debug signals into PSTDDATA[7:0]
The addition of the memory management unit (MMU) to the baseline architecture requires
corresponding enhancements to the ColdFire debug functionality, resulting in Revision D.
For Revision D, the revision level bit, CSR[HRL], is 3.
With software support, the MMU can provide a demand-paged, virtual address
environment. To support debugging in this virtual environment, the debug enhancements
are primarily related to the expansion of the virtual address to include the 8-bit address
space identifier (ASID). Conceptually, the virtual address is expanded to a 40-bit value:
the 8-bit ASID plus the 32-bit address.
The expansion of the virtual address affects two major debug functions:
•
11-2
The ASID is optionally included in the specification of the hardware breakpoint
registers. As an example, the four PC breakpoint registers are each expanded by 8
bits, so that a specific ASID value may be programmed as part of the breakpoint
instruction address. Likewise, each operand address/data breakpoint register is
expanded to include an ASID value. Finally, new control registers define if and how
the ASID is to be included in the breakpoint comparison trigger logic.
ColdFire CF4e Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
Signal Descriptions
Freescale Semiconductor, Inc...
•
The debug module implements the concept of ownership trace in which the ASID
value may be optionally displayed as part of the real-time trace functionality. When
enabled, real-time trace displays instruction addresses on every change-of-flow
instruction that is not absolute or PC-relative. For Rev. D, this instruction address
display optionally includes the contents of the ASID, thus providing the complete
instruction virtual address on these instructions.
Additionally when a Sync_PC serial BDM command is loaded from the external
development system, the processor optionally displays the complete virtual
instruction address, including the 8-bit ASID value.
In addition to these ASID-related changes, the new MMU control registers are accessible
by using serial BDM commands. The same BDM access capabilities are also provided for
the EMAC and FPU programming models.
Finally, a new serial BDM command is implemented to assist debugging when a software
error generates an incorrect memory address that hangs the external bus. The new BDM
command attempts to break this condition by forcing a bus termination.
11.2 Signal Descriptions
Table 11-1 describes debug module signals. All ColdFire debug signals are unidirectional
and related to a rising edge of the processor core’s clock signal. The standard 26-pin debug
connector is shown in Section 11.9, “Motorola-Recommended BDM Pinout.”
Table 11-1. Debug Module Signals
Signal
Description
Development Serial
Clock (DSCLK)
Internally synchronized input. (The logic level on DSCLK is validated if it has the same value on
two consecutive rising bus clock edges.) Clocks the serial communication port to the debug
module during packet transfers. Maximum frequency is PSTCLK/5. At the synchronized rising
edge of DSCLK, the data input on DSI is sampled and DSO changes state.
BDM Force Transfer
Acknowledge
(BDMFORCEACKB)
Helps break a hung external bus condition. An incorrect reference to a memory address that
effectively hangs the external bus because no slave device responds. For such situations, the
new serial BDM command can be sent into the debug module of the CF4e core. After decoding
this command, the CF4e core asserts BDMFORCEACKB for an entire M-Bus clock period. This
output can be factored into the external or M-Bus termination logic to unconditionally force a
transfer acknowledge so that debug can continue without requiring a system reset.
See Section 11.5.3.3.10, “Force Transfer Acknowledge (force_ta).”
Development Serial
Input (DSI)
Internally synchronized input that provides data input for the serial communication port to the
debug module, once the DSCLK has been seen as high (logic 1).
Development Serial
Output (DSO)
Provides serial output communication for debug module responses. DSO is registered
internally. The output is delayed from the validation of DSCLK high.
Breakpoint (BKPT)
Input used to request a manual breakpoint. Assertion of BKPT puts the processor into a halted
state after the current instruction completes. Halt status is reflected on processor status/debug
data signals (PSTDDATA[7:0]) as the value 0xF. If CSR[BKD] is set (disabling normal BKPT
functionality), asserting BKPT generates a debug interrupt exception in the processor.
Chapter 11. Debug Support
For More Information On This Product,
Go to: www.freescale.com
11-3
Freescale Semiconductor, Inc.
Signal Descriptions
Table 11-1. Debug Module Signals (Continued)
Freescale Semiconductor, Inc...
Signal
Description
Processor Status
Clock (PSTCLK)
Half-speed version of the processor clock. Its rising edge appears in the center of the
two-processor-cycle window of valid PSTDDATA output. See Figure 11-2. PSTCLK indicates
when the development system should sample PSTDDATA values.
If real-time trace is not used, setting CSR[PCD] keeps PSTCLK and PSTDDATA outputs from
toggling without disabling triggers. Non-quiescent operation can be reenabled by clearing
CSR[PCD], although the external development systems must resynchronize with the
PSTDDATA output.
PSTCLK starts clocking only when the first non-zero PST value (0xC, 0xD, or 0xF) occurs
during system reset exception processing. Table 11-4 describes PST values.
Processor
Status/Debug Data
(PSTDDATA[7:0])
These outputs, which change on the negative edge of PSTCLK, indicate both processor status
and captured address and data values and are discussed more thoroughly in Section 11.2.1,
“Processor Status/Debug Data (PSTDDATA[7:0]).”
Figure 11-2 shows PSTCLK timing with respect to PSTDDATA.
PSTCLK
PSTDDATA
Figure 11-2. PSTCLK Timing
11.2.1 Processor Status/Debug Data (PSTDDATA[7:0])
Processor status data outputs are used to indicate both processor status and captured
address and data values. They operate at half the processor’s frequency. Given that
real-time trace information appears as a sequence of 4-bit data values, there are no
alignment restrictions; that is, the processor status (PST) values and operands may appear
on either nibble of PSTDDATA[7:0]. The upper nibble (PSTDDATA[7:4]) is the more
significant and yields values first.
CSR controls capturing of data values to be presented on PSTDDATA. Executing the
WDDATA instruction captures data that is displayed on PSTDDATA too. These signals
are updated each processor cycle and display two values at a time for two processor clock
cycles. Table 11-2 shows the PSTDDATA output for the processor’s sequential execution
of single-cycle instructions (A, B, C, D...). Cycle counts are shown relative to processor
frequency. These outputs indicate the current processor pipeline status and are not related
to the current bus transfer.
Table 11-2. PSTDDATA: Sequential Execution of Single-Cycle Instructions
Cycles
11-4
PSTDDATA[7:0]
T+0, T+1
{PST for A, PST for B}
T+2, T+3
{PST for C, PST for D}
T+4, T+5
{PST for E, PST for F}
ColdFire CF4e Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
Real-Time Trace Support
The signal timing for the example in Table 11-2 is shown in Figure 11-3.
T+0
T+1
T+2
T+3
T+4
T+5
T+6
Processor Clock
PSTCLK
PSTDDATA
{A, B}
{C, D}
{E, F}
Freescale Semiconductor, Inc...
Figure 11-3. PSTDDATA: Single-Cycle Instruction Timing
Table 11-3 shows the case where a PSTDDATA module captures a memory operand on a
simple load instruction: mov.l <mem>,Rx.
Table 11-3. PSTDDATA: Data Operand Captured
Cycle
PSTDDATA[7:0]
T
{PST for mov.l, PST marker for captured operand) = {0x1, 0xB}
T+1
{0x1, 0xB}
T+2
{Operand[3:0], Operand[7:4]}
T+3
{Operand[3:0], Operand[7:4]}
T+4
{Operand[11:8], Operand[15:12]}
T+5
{Operand[11:8], Operand[15:12]}
T+6
{Operand[19:16], Operand[23:20]}
T+7
{Operand[19:16], Operand[23:20]}
T+8
{Operand[27:24], Operand[31:28]}
T+9
{Operand[27:24], Operand[31:28]}
T+10
(PST for next instruction)
T+11
(PST for next instruction,...)
NOTE:
A PST marker and its data display are sent contiguously.
Except for this transmission, the IDLE status (0x0) can appear
anytime. Again, given that real-time trace information appears
as a sequence of 4-bit values, there are no alignment
restrictions. That is, PST values and operands may appear on
either nibble of PSTDDATA.
11.3 Real-Time Trace Support
Real-time trace, which defines the dynamic execution path, is a fundamental debug
function. The ColdFire solution is to include a parallel output port providing encoded
processor status and data to an external development system. This 8-bit port is partitioned
Chapter 11. Debug Support
For More Information On This Product,
Go to: www.freescale.com
11-5
Freescale Semiconductor, Inc.
Real-Time Trace Support
Freescale Semiconductor, Inc...
into two consecutive 4-bit nibbles. Each nibble can either transmit information concerning
the processor’s execution status (PST) or debug data (DDATA). The processor status may
not be related to the current bus transfer, due to the decoupling FIFOs.
External development systems can use PSTDDATA outputs with an external image of the
program to completely track the dynamic execution path. This tracking is complicated by
any change in flow, especially when branch target address calculation is based on the
contents of a program-visible register (variant addressing). PSTDDATA outputs can be
configured to display the target address of such instructions in sequential nibble
increments across multiple processor clock cycles, as described in Section 11.3.1, “Begin
Execution of Taken Branch (PST = 0x5).” Four 32-bit storage elements form a FIFO
buffer connecting the processor’s high-speed local bus to the external development system
through PSTDDATA[7:0]. The buffer captures branch target addresses and certain data
values for eventual display on the PSTDDATA port, two nibbles at a time starting with the
least significant bit (lsb).
Execution speed is affected only when three storage elements contain valid data to be
dumped to the PSTDDATA port. This occurs only when two values are captured
simultaneously in a read-modify-write operation. The core stalls until two FIFO entries
are available.
11-6
ColdFire CF4e Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
Real-Time Trace Support
Table 11-4 shows the encoding of these signals.
Table 11-4. Processor Status Encoding
PST[3:0]
Freescale Semiconductor, Inc...
Definition
Hex
Binary
0x0
0000
Continue execution. Many instructions execute in one processor cycle. If an instruction requires more
clock cycles, subsequent clock cycles are indicated by driving PSTDDATA outputs with this encoding.
0x1
0001
Begin execution of one instruction. For most instructions, this encoding signals the first clock cycle of
an instruction’s execution. Certain change-of-flow opcodes, plus the PULSE and WDDATA instructions,
generate different encodings.
0x2
0010
Begin execution of two instructions. For superscalar instruction dispatches, this encoding signals the
first clock cycle of the simultaneous instructions’ execution.
0x3
0011
Entry into user-mode. Signaled after execution of the instruction that caused the ColdFire processor to
enter user mode. If the display of the ASID is enabled (CSR[3] = 1), the following occurs:
• The 8-bit ASID follows the instruction address; that is, the PSTDDATA sequence is {0x3, 0x5,
marker, instruction address, 0x8, ASID}, where 0x8 is the ASID data marker.
• Whenever the current ASID is loaded by the privileged MOVEC instruction, the ASID is displayed
on PSTDDATA. The resulting PSTDDATA sequence for the MOVEC instruction is then {0x1, 0x8,
ASID}, where the 0x8 is the data marker for the ASID.
0x4
0100
Begin execution of PULSE and WDDATA instructions. PULSE defines logic analyzer triggers for debug
or performance analysis. WDDATA lets the core write any operand (byte, word, or longword) directly to
the PSTDDATA port, independent of debug module configuration. When WDDATA is executed, a value
of 0x4 is signaled, followed by the appropriate marker, and then the data transfer on the PSTDDATA
port. Transfer length depends on the WDDATA operand size.
0x5
0101
Begin execution of taken branch or SYNC_PC command. For some opcodes, a branch target address
may be displayed on PSTDDATA depending on the CSR settings. CSR also controls the number of
address bytes displayed, indicated by the PST marker value preceding the DDATA nibble that begins
the data output. See Section 11.3.1, “Begin Execution of Taken Branch (PST = 0x5).” Also indicates
that the SYNC_PC command has been issued.
0x6
0110
Begin execution of instruction plus a taken branch. The processor completes execution of a taken
conditional branch instruction and simultaneously starts executing the target instruction. This is
achieved through branch folding.
0x7
0111
Begin execution of return from exception (RTE) instruction.
0x8–
0xB
1000–
1011
Indicates the number of bytes to be displayed on the DDATA port on subsequent clock cycles. The
value is driven onto the PSTDDATA port one cycle before the data is displayed.
0x8 Begin 1-byte transfer on PSTDDATA.
0x9 Begin 2-byte transfer on PSTDDATA.
0xA Begin 3-byte transfer on PSTDDATA.
0xB Begin 4-byte transfer on PSTDDATA.
0xC
1100
Normal exception processing. Exceptions that enter emulation mode (debug interrupt or optionally
trace) generate a different encoding, as described below. Because the 0xC encoding defines a
multiple-cycle mode, PSTDDATA outputs are driven with 0xC until exception processing completes.
0xD
1101
Emulator mode exception processing. Displayed during emulation mode (debug interrupt or optionally
trace). Because this encoding defines a multiple-cycle mode, PSTDDATA outputs are driven with 0xD
until exception processing completes.
0xE
1110
A breakpoint state change causes this encoding to assert for one cycle only followed by the trigger
status value. If the processor stops waiting for an interrupt, the encoding is asserted for multiple cycles.
See Section 11.3.2, “Processor Stopped or Breakpoint State Change (PST = 0xE).”
0xF
1111
Processor is halted. Because this encoding defines a multiple-cycle mode, the PSTDDATA outputs
display 0xF until the processor is restarted or reset. (see Section 11.5.1, “CPU Halt”)
Chapter 11. Debug Support
For More Information On This Product,
Go to: www.freescale.com
11-7
Freescale Semiconductor, Inc.
Real-Time Trace Support
11.3.1 Begin Execution of Taken Branch (PST = 0x5)
PST is 0x5 when a taken branch is executed. For some opcodes, a branch target address
may be displayed on PSTDDATA depending on the CSR settings. CSR also controls the
number of address bytes displayed, which is indicated by the PST marker value
immediately preceding the PSTDDATA nibble that begins the data output.
Freescale Semiconductor, Inc...
Multiple byte DDATA values are displayed in least-to-most-significant order. The
processor captures only those target addresses associated with taken branches which use a
variant addressing mode, that is, RTE and RTS instructions, JMP and JSR instructions
using address register indirect or indexed addressing modes, and all exception vectors.
The simplest example of a branch instruction using a variant address is the compiled code
for a C language case statement. Typically, the evaluation of this statement uses the
variable of an expression as an index into a table of offsets, where each offset points to a
unique case within the structure. For such change-of-flow operations, the V4
microarchitecture uses the debug pins to output the following sequence of information on
two successive processor clock cycles:
1. Use PSTDDATA (0x5) to identify that a taken branch is executed.
2. Optionally signal the target address to be displayed sequentially on the PSTDDATA
pins. Encodings 0x9–0xB identify the number of bytes displayed.
3. The new target address is optionally available on subsequent cycles using the
PSTDDATA port. The number of bytes of the target address displayed on this port
is configurable (2, 3, or 4 bytes, where the encoding is 0x9, 0xA, and OxB,
respectively).
Another example of a variant branch instruction would be a JMP (A0) instruction.
Figure 11-4 shows when the PSTDDATA outputs that indicate when a JMP (A0) executed,
assuming the CSR was programmed to display the lower 2 bytes of an address.
Processor Clock
PSTCLK
PSTDDATA
0x59
A0[3–0,7–4]
A0[11–8,15–12]
Figure 11-4. Example JMP Instruction Output on PSTDDATA
PSTDDATA is driven two nibbles at a time with a 0x59; 0x5 indicates a taken branch and
the marker value 0x9 indicates a 2-byte address. Thus, the subsequent 4 nibbles display
the lower 2 bytes of address register A0 in least-to-most-significant nibble order. The
PSTDDATA output after the JMP instruction continues with the next instruction.
11-8
ColdFire CF4e Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
Real-Time Trace Support
11.3.2 Processor Stopped or Breakpoint State Change
(PST = 0xE)
The 0xE encoding is generated either as a one- or multiple-cycle issue as follows:
•
Freescale Semiconductor, Inc...
•
When the core is stopped by a STOP instruction, this encoding appears in
multiple-cycle format. The ColdFire processor remains stopped until an interrupt
occurs; thus, PSTDDATA outputs display 0xE until stopped mode is exited.
When a breakpoint status change is to be output on PSTDDATA, 0xE is displayed
for one cycle, followed immediately with the 4-bit value of the current trigger
status, where the trigger status is left justified rather than in the CSR[BSTAT]
description. Section 11.4.5, “Configuration/Status Register (CSR),” shows that
status is right justified. That is, the displayed trigger status on PSTDDATA after a
single 0xE is as follows:
— 0x0 = no breakpoints enabled
— 0x2 = waiting for level-1 breakpoint
— 0x4 = level-1 breakpoint triggered
— 0xA = waiting for level-2 breakpoint
— 0xC = level-2 breakpoint triggered
Thus, 0xE can indicate multiple events, based on the next value, as Table 11-5 shows.
Table 11-5. 0xE Status Posting
PSTDDATA Stream Includes
Result
{0xE, 0x2}
Breakpoint state changed to waiting for level-1 trigger
{0xE, 0x4}
Breakpoint state changed to level-1 breakpoint triggered
{0xE, 0xA}
Breakpoint state changed to waiting for level-2 trigger
{0xE, 0xC}
Breakpoint state changed to level-2 breakpoint triggered
{0xE, 0xE}
Stopped mode.
11.3.3 Processor Halted (PST = 0xF)
PST is 0xF when the processor is halted (see Section 11.5.1, “CPU Halt”). Because this
encoding defines a multiple-cycle mode, the PSTDDATA outputs display 0xF until the
processor is restarted or reset. Therefore, PSTDDATA[7:0] continuously are 0xFF.
NOTE:
HALT can be distinguished from a data output 0xFF by
counting 0xFF occurrences on PSTDDATA. Because data
always follows a marker (0x8, 0x9, 0xA, or 0xB), the longest
occurrence in PSTDDATA of 0xFF in a data output is four.
Two scenarios exist for data 0xFFFF_FFFF:
Chapter 11. Debug Support
For More Information On This Product,
Go to: www.freescale.com
11-9
Freescale Semiconductor, Inc.
Programming Model
•
Freescale Semiconductor, Inc...
•
The B marker occurs on the most-significant nibble of PSTDDATA with the data of
0xFF following:
PSTDDATA[7:0]
0xBF
0xFF
0xFF
0xFF
0xFX (X indicates that the next PST value is guaranteed to not be 0xF.)
The B marker occurs on the least-significant nibble of PSTDDATA with the data of
0xFF following:
PSTDDATA[7:0]
0xYB
0xFF
0xFF
0xFF
0xFF
0xXY (X indicates the PST value is guaranteed not to be 0xF, and Y signifies a
PSTDDATA value that doesn’t affect the 0xFF count.)
NOTE:
As the result of the above, a count of at least nine or more
sequential single 0xF values or five or more sequential 0xFF
values indicates the HALT condition.
11.4 Programming Model
In addition to the existing BDM commands that provide access to the processor’s registers
and the memory subsystem, the debug module contains 19 registers to support the
required functionality. These registers are also accessible from the processor’s supervisor
programming model by executing the WDEBUG instruction (write only). Thus, the
breakpoint hardware in the debug module can be read or written by the external
development system using the debug serial interface or written by the operating system
running on the processor core. Software is responsible for guaranteeing that accesses to
these resources are serialized and logically consistent. Hardware provides a locking
mechanism in the CSR to allow the external development system to disable any attempted
writes by the processor to the breakpoint registers (setting CSR[IPW]). BDM commands
must not be issued if the WDEBUG instruction is used to access debug module registers or
the resulting behavior is undefined.
These registers, shown in Figure 11-5, are treated as 32-bit quantities, regardless of the
number of implemented bits.
11-10
ColdFire CF4e Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
Programming Model
31
15
31
15
31
15
31
7
0
AATR
Address attribute trigger register
ABLR
ABHR
Address low breakpoint register
Address high breakpoint register
AATR1
Address 1 attribute trigger register
0
7
15
0
0
Freescale Semiconductor, Inc...
ABLR1 Address low breakpoint 1 register
ABHR1 Address high breakpoint 1 register
31
15
31
15
31
7
BAAR
BDM address attributes register
CSR
Configuration/status register
DBR
DBMR
Data breakpoint register
Data breakpoint mask register
0
15
31
0
0
15
0
Data breakpoint 1 register
DBR1
DBMR1 Data breakpoint mask 1 register
31
15
31
0
15
31
PBR
PBR1
PBR2
PBR3
PBMR
PC breakpoint register
PC breakpoint 1 register
PC breakpoint 2 register
PC breakpoint 3 register
PC breakpoint mask register
TDR
Trigger definition register
XTDR
Extended trigger definition register
0
15
0
Note: Each debug register is accessed as a 32-bit register; shaded fields above are not used (don’t care).
All debug control registers are writable from the external development system or the CPU via the
WDEBUG instruction.
CSR is write-only from the programming model. It can be read from and written to through the BDM port.
CSR is accessible in supervisor mode as debug control register 0x00 using the WDEBUG instruction and
through the BDM port using the RDMREG and WDMREG commands.
Figure 11-5. Debug Programming Model
The registers in Table 11-7 are accessed through the BDM port by BDM commands,
WDMREG and RDMREG, described in Section 11.5.3.3, “Command Set Descriptions.”
These commands contain a 5-bit field, DRc, that specifies the register, as shown in
Table 11-6.
Table 11-6. BDM/Breakpoint Registers
DRc[4–0]
0x00
0x01–0x05
Register Name
Configuration/status register1
Reserved
Abbreviation
Initial State
Page
CSR
0x0020_0000
p. 11-17
—
—
—
0x04
PC breakpoint ASID control
PBAC
—
p. 11-26
0x05
BDM address attribute register
BAAR
0x0000_0005
p. 11-16
Chapter 11. Debug Support
For More Information On This Product,
Go to: www.freescale.com
11-11
Freescale Semiconductor, Inc.
Programming Model
Table 11-6. BDM/Breakpoint Registers (Continued)
DRc[4–0]
Freescale Semiconductor, Inc...
Abbreviation
Initial State
Page
0x06
Address attribute trigger register
AATR
0x0000_0005
p. 11-13
0x07
Trigger definition register
TDR
0x0000_0000
p. 11-21
0x08
Program counter breakpoint register
PBR
—
p. 11-20
0x09
Program counter breakpoint mask register
PBMR
—
p. 11-20
—
—
—
0x0A–0x0B
Reserved
0x0C
Address breakpoint high register
ABHR
—
p. 11-15
0x0D
Address breakpoint low register
ABLR
—
p. 11-15
0x0E
Data breakpoint register
DBR
—
p. 11-19
0x0F
Data breakpoint mask register
DBMR
—
p. 11-19
—
—
—
PBASID
—
p. 11-26
—
—
—
0x10–0x13
1
Register Name
Reserved
0x14
PC breakpoint ASID register
0x15
Reserved
0x16
Address attribute trigger register 1
AATR1
0x0000_0005
p. 11-13
0x17
Extended trigger definition register
XTDR
0x0000_0000
p. 11-24
0x18
Program counter breakpoint 1 register
PBR1
0x0000_0000
p. 11-20
0x19
Reserved
—
—
—
0x1A
Program counter breakpoint register 2
PBR2
0x0000_0000
p. 11-20
0x1B
Program counter breakpoint register 3
PBR3
0x0000_0000
p. 11-20
0x1C
Address high breakpoint register 1
ABHR1
—
p. 11-15
0x1D
Address low breakpoint register 1
ABLR1
—
p. 11-15
0x1E
Data breakpoint register 1
DBR1
—
p. 11-19
0x1F
Data breakpoint mask register 1
DBMR1
—
p. 11-19
CSR is write-only from the programming model. It can be read or written through the BDM port using
the RDMREG and WDMREG commands.
These registers are also accessible from the processor’s supervisor programming model
through the execution of the WDEBUG instruction. Thus, the external development
system and the operating system running on the processor core can access the breakpoint
hardware. It is the responsibility of the software to guarantee that all accesses to these
resources are serialized and logically consistent. The hardware provides a locking
mechanism in the CSR to allow the external development system to disable any attempted
writes by the processor to the breakpoint registers (setting IPW = 1). BDM commands
must not be issued if the ColdFire processor is accessing debug module registers with the
WDEBUG instruction or the resulting behavior is undefined.
The ColdFire debug architecture supports a number of hardware breakpoint registers, that
can be configured into single- or double-level triggers based on the PC or operand address
ranges with an optional inclusion of specific data values. With the addition of the MMU
11-12
ColdFire CF4e Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
Programming Model
capabilities, the breakpoint specifications must be expanded to optionally include the
address space identifier (ASID) in these user-programmable virtual address triggers.
The core includes four PC breakpoint triggers and two sets of operand address breakpoint
triggers, each with two independent address registers (to allow specification of a range)
and a data breakpoint with masking capabilities. Core breakpoint triggers are accessible
through the serial BDM interface or written through the supervisor programming model
using the WDEBUG instruction.
Freescale Semiconductor, Inc...
Two ASID-related registers (PBAC and PBASID) are added for the PC breakpoint
qualification, and two existing registers (AATR and AATR1) are expanded for the address
breakpoint qualification.
11.4.1 Revision A Shared Debug Resources
In the Revision A implementation of the debug module, certain hardware structures are
shared between BDM and breakpoint functionality as shown in Table 11-7.
Table 11-7. Rev. A Shared BDM/Breakpoint Hardware
Register
BDM Function
Breakpoint Function
AATR
Bus attributes for all memory commands
Attributes for address breakpoint
ABHR
Address for all memory commands
Address for address breakpoint
DBR
Data for all BDM write commands
Data for data breakpoint
Thus, loading a register to perform a specific function that shares hardware resources is
destructive to the shared function. For example, a BDM command to access memory
overwrites an address breakpoint in ABHR. A BDM write command overwrites the data
breakpoint in DBR.
Revision B added hardware registers to eliminate these shared functions. The BAAR is
used to specify bus attributes for BDM memory commands and has the same format as the
LSB of the AATR. Note that the registers containing the BDM memory address and the
BDM data are not program visible.
11.4.2 Address Attribute Trigger Registers (AATR, AATR1)
The address attribute trigger registers (AATR and AATR1, Figure 11-6), define address
attributes and a mask to be matched in the trigger. The register value is compared with
address attribute signals from the processor’s local high-speed bus, as defined by the
setting of the trigger definition register (TDR) for AATR and the extended trigger
definition register (XTDR) for AATR1.
This register is expanded to include an optional ASID specification and a control bit that
enables the use of the ASID field.
Chapter 11. Debug Support
For More Information On This Product,
Go to: www.freescale.com
11-13
Freescale Semiconductor, Inc.
Programming Model
31
25
Field
—
24
23
16
ASIDCTRL
Reset
ATTRASID
0000_0000_0000_0000
R/W Write only. AATR and AATR1 are accessible in supervisor mode as debug control register 0x06 and 0x16
respectively using the WDEBUG instruction and through the BDM port using the WDMREG command.
15
Field RM
14
13
SZM
12
11
10
8
TTM
TMM
Reset
7
6
R
5
SZ
4
3
2
TT
0
TM
0000_0000_0000_0101
Freescale Semiconductor, Inc...
R/W Write only. AATR and AATR1 are accessible in supervisor mode as debug control register 0x06 and 0x16
respectively using the WDEBUG instruction and through the BDM port using the WDMREG command.
DRc[4–0]
0x06 (AATR); 0x16 (AATR1)
Figure 11-6. Address Attribute Trigger Registers (AATR, AATR1)
Table 11-8 describes AATR and AATR1 fields.
Table 11-8. AATR and AATR1 Field Descriptions
Bits
Name
31–25
—
24
Description
Reserved, should be cleared.
ASIDCTRL ABLR/ABHR/ATTR address breakpoint ASID enable. Corresponds to the ASID control enable
for the address breakpoint defined in ABLR, ABHR, and ATTR.
0 Disable ASID qualifier (reset default)
1 Enable ASID qualifier
23–16 ATTRASID ABLR/ABHR/ATTR ASID. Corresponds to the ASID to be included in the address breakpoint
specified by ABLR, ABHR, and ATTR.
15
RM
Read/write mask. Setting RM masks R in address comparisons.
14–13
SZM
Size mask. Setting an SZM bit masks the corresponding SZ bit in address comparisons.
12–11
TTM
Transfer type mask. Setting a TTM bit masks the corresponding TT bit in address comparisons.
10–8
TMM
Transfer modifier mask. Setting a TMM bit masks the corresponding TM bit in address
comparisons.
7
R
Read/write. R is compared with the R/W signal of the processor’s local bus.
6–5
SZ
Size. Compared to the processor’s local bus size signals.
00 Longword
01 Byte
10 Word
11 Reserved
4–3
TT
Transfer type. Compared with the local bus transfer type signals.
00 Normal processor access
01 Reserved
10 Emulator mode access
11 Acknowledge/CPU space access
These bits also define the TT encoding for BDM memory commands. In this case, the 01
encoding indicates an external or DMA access (for backward compatibility). These bits affect
the TM bits.
11-14
ColdFire CF4e Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
Programming Model
Freescale Semiconductor, Inc...
Table 11-8. AATR and AATR1 Field Descriptions (Continued)
Bits
Name
2–0
TM
Description
Transfer modifier. Compared with the local bus transfer modifier signals, which give
supplemental information for each transfer type.
TT = 00 (normal mode):
000 Data and instruction cache line push
001 User data access
010 User code access
011 Instruction cache invalidate
100 Data cache push/Instruction cache invalidate
101 Supervisor data access
110 Supervisor code access
111 INTOUCH instruction access
TT = 10 (emulator mode):
0xx–100 Reserved
101 Emulator mode data access
110 Emulator mode code access
111 Reserved
TT = 11 (acknowledge/CPU space transfers):
000 CPU space access
001–111 Interrupt acknowledge levels 1–7
These bits also define the TM encoding for BDM memory commands (for backward
compatibility).
11.4.3 Address Breakpoint Registers (ABLR/ABLR1,
ABHR/ABHR1)
The address breakpoint low and high registers (ABLR, ABLR1, ABHR, and ABHR1,
Figure 11-7), define regions in the processor’s data address space that can be used as part
of the trigger. These register values are compared with the address for each transfer on the
processor’s high-speed local bus. The trigger definition register (TDR) identifies the
trigger as one of three cases:
•
•
•
Identically the value in ABLR
Inside the range bound by ABLR and ABHR inclusive
Outside that same range
XTDR determines the same for ABLR1 and ABHR1.
31
0
Field
Address
Reset
—
R/W Write only. ABHR and ABHR1 are accessible in supervisor mode as debug control registers 0x0C and
0x1C, using the WDEBUG instruction and via the BDM port using the RDMREG and WDMREG commands.
ABLR and ABLR1 are accessible in supervisor mode as debug control register 0x0D and 0x1D, using the
WDEBUG instruction and via the BDM port using the WDMREG command.
DRc[4–0]
0x0D (ABLR); 0x1D (ABLR1); 0x0C (ABHR); 0x1C (ABHR1)
Figure 11-7. Address Breakpoint Registers (ABLR, ABHR, ABLR1, ABHR1)
Table 11-9 describes ABLR and ABLR1 fields.
Chapter 11. Debug Support
For More Information On This Product,
Go to: www.freescale.com
11-15
Freescale Semiconductor, Inc.
Programming Model
Table 11-9. ABLR and ABLR1 Field Description
Bits
Name
Description
31–0
Address
Low address. Holds the 32-bit address marking the lower bound of the address breakpoint range.
Breakpoints for specific addresses are programmed into ABLR or ABLR1.
Table 11-10 describes ABHR and ABHR1 fields.
Table 11-10. ABHR and ABHR1 Field Description
Bits
Name
Description
Freescale Semiconductor, Inc...
31–0 Address High address. Holds the 32-bit address marking the upper bound of the address breakpoint range.
11.4.4 BDM Address Attribute Register (BAAR)
The BAAR defines the address space for memory-referencing BDM commands. To
maintain compatibility with Revision A, BAAR is loaded with any data written to the LSB
of AATR. See Figure 11-8. The reset value of 0x5 sets supervisor data as the default
address space.
7
Field
6
5
R
SZ
Reset
4
3
2
TT
0
TM
0000_0101
R/W Write only. BAAR[R,SZ] are loaded directly from the BDM command; BAAR[TT,TM] can be programmed as
debug control register 0x05 from the external development system. For compatibility with Rev. A, BAAR is
loaded each time AATR is written.
DRc[4–0]
0x05
Figure 11-8. BDM Address Attribute Register (BAAR)
Table 11-11 describes BAAR fields.
Table 11-11. BAAR Field Descriptions
Bits
Name
7
R
Read/write
0 Write
1 Read
6–5
SZ
Size
00 Longword
01 Byte
10 Word
11 Reserved
4–3
TT
Transfer type. See the TT definition in Table 11-8.
2–0
TM
Transfer modifier. See the TM definition in Table 11-8.
11-16
Description
ColdFire CF4e Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
Programming Model
11.4.5 Configuration/Status Register (CSR)
The configuration/status register (CSR) defines the debug configuration for the processor
and memory subsystem and contains status information from the breakpoint logic. CSR is
write-only from the programming model. CSR is accessible in supervisor mode as debug
control register 0x00 using the WDEBUG instruction and through the BDM port using the
RDMREG and WDMREG commands. It can be read from and written to through the
BDM port.
Freescale Semiconductor, Inc...
31
28
27
26
25
24
23
20
17
16
HRL
—
BKD PCD IPW
Field
BSTAT
Reset
0000
0
0
0
0
0010
0
0
0
0
R/W1
R
R
R
R
R
R
—
R/W
R/W
R/W
11
10
9
8
3
2
15
14
13
Field MAP TRC EMU
Reset
0
R/W R/W
FOF TRG HALT BKPT
19
12
7
6
5
DDC
UHE
BTB
—
NPL
—
4
SSM OTE
0
0
00
0
00
0
0
0
0
R/W
R/W
R/W
R/W
R/W
R
R/W
—
R/W
DRc[4–0]
0
—
0
—
—
0x00
Figure 11-9. Configuration/Status Register (CSR)
Table 11-12 describes CSR fields.
Table 11-12. CSR Field Descriptions
Bits
31–28
Name
Description
BSTAT Breakpoint status. Provides read-only status information concerning hardware breakpoints. Also
output on PSTDDATA when it is not displaying PST or other processor data. BSTAT is cleared by a
TDR or XTDR write or by a CSR read when either a level-2 breakpoint is triggered or a level-1
breakpoint is triggered and the level-2 breakpoint is disabled.
0000 No breakpoints enabled
0001 Waiting for level-1 breakpoint
0010 Level-1 breakpoint triggered
0101 Waiting for level-2 breakpoint
0110 Level-2 breakpoint triggered
27
FOF
Fault-on-fault. If FOF is set, a catastrophic halt occurred and forced entry into BDM.
26
TRG
Hardware breakpoint trigger. If TRG is set, a hardware breakpoint halted the processor core and
forced entry into BDM. Reset and the debug GO command clear TRG.
25
HALT
Processor halt. If HALT is set, the processor executed a HALT and forced entry into BDM. Reset
and the debug GO command clear HALT.
24
BKPT
Breakpoint assert. If BKPT is set, BKPT is asserted, forcing the processor into BDM. Reset and
the debug GO command clear BKPT.
23–20
HRL
Hardware revision level. Indicates the level of debug module functionality. An emulator could use
this information to identify the level of functionality supported.
0000 Initial debug functionality (Revision A)
0001 Revision B
0010 Revision C
0011 Revision D
Chapter 11. Debug Support
For More Information On This Product,
Go to: www.freescale.com
11-17
Freescale Semiconductor, Inc.
Programming Model
Freescale Semiconductor, Inc...
Table 11-12. CSR Field Descriptions (Continued)
Bits
Name
19
—
18
BKD
Breakpoint disable. Used to disable the normal BKPT input functionality and to allow the assertion
of BKPT to generate a debug interrupt.
0 Normal operation
1 BKPT is edge-sensitive: a high-to-low edge on BKPT signals a debug interrupt to the processor.
The processor makes this interrupt request pending until the next sample point, when the
exception is initiated. In the ColdFire architecture, the interrupt sample point occurs once per
instruction. There is no support for nesting debug interrupts.
17
PCD
PSTCLK disable. Setting PCD disables generation of PSTCLK and PSTDDATA outputs and forces
them to remain quiescent.
16
IPW
Inhibit processor writes. Setting IPW inhibits processor-initiated writes to the debug module’s
programming model registers. IPW can be modified only by commands from the external
development system.
15
MAP
Force processor references in emulator mode.
0 All emulator-mode references are mapped into supervisor code and data spaces.
1 The processor maps all references while in emulator mode to a special address space, TT = 10,
TM = 101 or 110. The internal SRAM and caches are disabled.
14
TRC
Force emulation mode on trace exception. If TRC = 1, the processor enters emulator mode when a
trace exception occurs. If TRC=0, the processor enters supervisor mode.
13
EMU
Force emulation mode. If EMU = 1, the processor begins executing in emulator mode. See
Section 11.6.1.1, “Emulator Mode.”
12–11
DDC
Debug data control. Controls operand data capture for PSTDDATA, which displays the number of
bytes defined by the operand reference size before the actual data; byte displays 8 bits, word
displays 16 bits, and long displays 32 bits (one nibble at a time across multiple clock cycles). See
Table 11-4.
00 No operand data is displayed.
01 Capture all M-Bus write data.
10 Capture all M-Bus read data.
11 Capture all M-Bus read and write data.
10
UHE
User halt enable. Selects the CPU privilege level required to execute the HALT instruction.
0 HALT is a supervisor-only instruction.
1 HALT is a supervisor/user instruction.
9–8
BTB
Branch target bytes. Defines the number of bytes of branch target address PSTDDATA displays.
00 0 bytes
01 Lower 2 bytes of the target address
10 Lower 3 bytes of the target address
11 Entire 4-byte target address
See Section 11.3.1, “Begin Execution of Taken Branch (PST = 0x5).”
7
—
11-18
Description
Reserved, should be cleared.
Reserved, should be cleared.
ColdFire CF4e Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
Programming Model
Freescale Semiconductor, Inc...
Table 11-12. CSR Field Descriptions (Continued)
Bits
Name
Description
6
NPL
Non-pipelined mode. Determines whether the core operates in pipelined or mode.
0 Pipelined mode
1 Non-pipelined mode. The processor effectively executes one instruction at a time with no
overlap. This adds at least 5 cycles to the execution time of each instruction. Superscalar
instruction dispatch is disabled when operating in this mode. Given an average execution latency
of 1.6, throughput in non-pipeline mode would be 6.6, approximately 25% or less of pipelined
performance.
Regardless of the NPL state, a triggered PC breakpoint is always reported before the triggering
instruction executes. In normal pipeline operation, the occurrence of an address or data breakpoint
trigger is imprecise. In non-pipeline mode, triggers are always reported before the next instruction
begins execution and trigger reporting can be considered precise.
An address or data breakpoint should always occur before the next instruction begins execution.
Therefore, the occurrence of the address/data breakpoints should be guaranteed.
5
—
4
SSM
Single-step mode. Setting SSM puts the processor in single-step mode.
0 Normal mode.
1 Single-step mode. The processor halts after execution of each instruction. While halted, any
BDM command can be executed. On receipt of the GO command, the processor executes the
next instruction and halts again. This process continues until SSM is cleared.
3
OTE
Ownership-trace enable.
1 The display of the ASID on the PSTDDATA outputs by entering in user mode, by loading the
ASID by a MOVEC, or by executing a BDM SYNC_PC command.
3–0
—
Reserved, should be cleared.
Reserved, should be cleared.
11.4.6 Data Breakpoint/Mask Registers (DBR/DBR1,
DBMR/DBMR1)
The data breakpoint registers (DBR/DBR1, Figure 11-10), specify data patterns used as
part of the trigger into debug mode. DBRn bits are masked by setting corresponding
DBMR bits, as defined in TDR.
31
0
Field
Data (DBR/DBR1); Mask (DBMR/DBMR1)
Reset
Uninitialized
R/W DBR and DBR1 are accessible in supervisor mode as debug control register 0x0E and 0x1E, using the
WDEBUG instruction and through the BDM port using the RDMREG and WDMREG commands.
DBMR and DBMR1 are accessible in supervisor mode as debug control register 0x0F and 0x1F, using the
WDEBUG instruction and via the BDM port using the WDMREG command.
DRc[4–0]
0x0E (DBR), 0x1E (DBR1); 0x0F (DBMR), 0x1F (DBMR1)
Figure 11-10. Data Breakpoint/Mask Registers (DBR/DBR1 and DBMR/DBMR1)
Table 11-13 describes DBRn fields.
Chapter 11. Debug Support
For More Information On This Product,
Go to: www.freescale.com
11-19
Freescale Semiconductor, Inc.
Programming Model
Table 11-13. DBRn Field Descriptions
Bits
Name
Description
31–0
Data
Data breakpoint value. Contains the value to be compared with the data value from the processor’s
local bus as a breakpoint trigger.
Table 11-14 describes DBMRn fields.
Freescale Semiconductor, Inc...
Table 11-14. DBMRn Field Descriptions
Bits
Name
Description
31–0
Mask
Data breakpoint mask. The 32-bit mask for the data breakpoint trigger. Clearing a DBRn bit allows
the corresponding DBRn bit to be compared to the appropriate bit of the processor’s local data bus.
Setting a DBMRn bit causes that bit to be ignored.
DBRs support both aligned and misaligned references. Table 11-15 shows relationships
between processor address, access size, and location within the 32-bit data bus.
Table 11-15. Access Size and Operand Data Location
A[1:0]
Access Size
Operand Location
00
Byte
D[31:24]
01
Byte
D[23:16]
10
Byte
D[15:8]
11
Byte
D[7:0]
0x
Word
D[31:16]
1x
Word
D[15:0]
xx
Longword
D[31:0]
11.4.7 Program Counter Breakpoint/Mask Registers
(PBR, PBR1, PBR2, PBR3, PBMR)
Each PC breakpoint register (PBR, PBR1, PBR2, PBR3) defines an instruction address for
use as part of the trigger. These registers’ contents are compared with the processor’s
program counter register when the appropriate valid bit is set and TDR or XTDR are
configured appropriately. PBR bits are masked by setting corresponding PBMR bits.
Results are compared with the processor’s program counter register, as defined in TDR or
XTDR. PBR1–PBR3 are not masked. Figure 11-11 shows the PC breakpoint register.
11-20
ColdFire CF4e Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
Programming Model
31
1
0
Field
Program Counter
V1
Reset
—
0
R/W Write. PC breakpoint registers are accessible in supervisor mode using the WDEBUG instruction and
through the BDM port using the RDMREG and WDMREG commands using values shown in Section 11.5.3.3,
“Command Set Descriptions.”
DRc[4–0]
1 PBR
0x08 (PBR); 0x18 (PBR1); 0x1A (PBR2); 0x1B (PBR3)
does not have a valid bit. PBR[0] is read as 0 and should be cleared.
Figure 11-11. Program Counter Breakpoint Registers (PBR, PBR1, PBR2, PBR3)
Freescale Semiconductor, Inc...
Table 11-16 describes PBR, PBR1, PBR2, and PBR3 fields.
Table 11-16. PBR, PBR1, PBR2, PBR3 Field Descriptions
Bits
Name
Description
31–1 Address PC breakpoint address. The 31-bit address to be compared with the PC as a breakpoint trigger.
PBR does not have a valid bit.
0
V
Valid. Breakpoint registers are compared with the processor’s program counter register when the
appropriate valid bit is set and TDR or XTDR are configured appropriately. This bit is not
implemented on PBR.
Figure 11-11 shows PBMR.
31
0
Field
Mask
Reset
—
R/W Write. PBMR is accessible in supervisor mode as debug control register 0x09 using the WDEBUG
instruction and via the BDM port using the WDMREG command.
DRc[4–0]
0x09
Figure 11-12. Program Counter Breakpoint Mask Register (PBMR)
Table 11-17 describes PBMR fields.
Table 11-17. PBMR Field Descriptions
Bits
Name
31–0
Mask
Description
PC breakpoint mask. A zero in a bit position causes the corresponding PBR bit to be compared to
the appropriate PC bit. Set PBMR bits cause PBR bits to be ignored.
11.4.8 Trigger Definition Register (TDR)
The TDR, shown in Table 11-13, configures the operation of the hardware breakpoint
logic that corresponds with the ABHR/ABLR/AATR, PBR/PBR1/PBR2/PBR3/PBMR,
and DBR/DBMR registers within the debug module. In conjunction with the XTDR and
its associated debug registers, TDR controls the actions taken under the defined
Chapter 11. Debug Support
For More Information On This Product,
Go to: www.freescale.com
11-21
Freescale Semiconductor, Inc.
Programming Model
conditions. Breakpoint logic may be configured as one- or two-level triggers. TDR[31–16]
or XTDR[31–16] define second-level triggers and bits 15–0 define first-level triggers.
NOTE:
The debug module has no hardware interlocks, so to prevent
spurious breakpoint triggers while the breakpoint registers are
being loaded, disable TDR and XTDR (by clearing
TDR[29,13] and XTDR[29,13]) before defining triggers.
A write to TDR clears the CSR trigger status bits, CSR[BSTAT].
Freescale Semiconductor, Inc...
Second-Level Triggers
31
Field
30
TRC
29
EBL
28
27
26
25
24
23
22
EDLW EDWL EDWU EDLL EDLM EDUM EDUU
Reset
21
20
DI
EAI
19
18
17
EAR EAL EPC
16
PCI
All zeros
R/W Write only. Accessible in supervisor mode as debug control register 0x07 using the WDEBUG instruction and
through the BDM port using the WDMREG command.
First-Level Triggers
15
Field
14
—
13
EBL
12
11
10
9
8
7
6
EDLW EDWL EDWU EDLL EDLM EDUM EDUU
Reset
5
4
DI
EAI
3
2
1
EAR EAL EPC
0
PCI
All zeros
R/W Write only. Accessible in supervisor mode as debug control register 0x07 using the WDEBUG instruction and
through the BDM port using the WDMREG command.
DRc[4–0]
0x07
Figure 11-13. Trigger Definition Register (TDR)
Table 11-18 describes TDR fields.
Table 11-18. TDR Field Descriptions
Bits
Name
Description
31–30
TRC
Trigger response control. Determines how the processor responds to a completed trigger condition.
The trigger response is always displayed on PSTDDATA.
00 Display on PSTDDATA only
01 Processor halt
10 Debug interrupt
11 Reserved
15–14
—
29/13
EBL
11-22
Reserved, should be cleared.
Enable breakpoint. Global enable for the breakpoint trigger. Setting TDR[EBL] or XTDR[EBL]
enables a breakpoint trigger. If both TDL[EBL] and XTDL[EBL] are cleared, all breakpoints are
disabled.
ColdFire CF4e Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
Programming Model
Freescale Semiconductor, Inc...
Table 11-18. TDR Field Descriptions (Continued)
Bits
Name
Description
28–22
12–6
EDx
Enable data. Setting an EDx bit enables the corresponding data breakpoint condition based on the
size and placement on the processor’s local data bus. Clearing all EDx bits disables data
breakpoints.
28/12
EDLW
Data longword. Entire processor’s local data bus.
27/11
EDWL
Lower data word.
26/10
EDWU Upper data word.
25/9
EDLL
Lower lower data byte. Low-order byte of the low-order word.
24/8
EDLM
Lower middle data byte. High-order byte of the low-order word.
23/7
EDUM
Upper middle data byte. Low-order byte of the high-order word.
22/6
EDUU
Upper upper data byte. High-order byte of the high-order word.
21/5
DI
20–18/
4–2
EAx
Data breakpoint invert. Provides a way to invert the logical sense of all the data breakpoint
comparators. This can develop a trigger based on the occurrence of a data value other than the
DBR contents.
Enable address bits. Setting an EA bit enables the corresponding address breakpoint. Clearing all
three bits disables the breakpoint.
20/4
EAI
Enable address breakpoint inverted. Breakpoint is based outside the range between
ABLR and ABHR. Trigger if address > ABHR or if address < ABLR.
19/3
EAR
Enable address breakpoint range. The breakpoint is based on the inclusive range defined
by ABLR and ABHR. Trigger if address ≥ ABHR or if address ≤ ABLR.
18/2
EAL
Enable address breakpoint low. The breakpoint is based on the address in the ABLR.
Trigger address = ABLR
17/1
EPC
Enable PC breakpoint. If set, this bit enables the PC breakpoint.
16/0
PCI
Breakpoint invert. If set, this bit allows execution outside a given region as defined by
PBR/PBR1/PBR2/PBR3 and PBMR to enable a trigger. If cleared, the PC breakpoint is defined
within the region defined by PBR/PBR1/PBR2/PBR3 and PBMR.
11.4.9 Extended Trigger Definition Register (XTDR)
The XTDR configures the operation of the hardware breakpoint logic that corresponds
with the ABHR1/ABLR1/AATR1 and DBR1/DBMR1 registers within the debug module
and, in conjunction with the TDR and its associated debug registers, controls the actions
taken under the defined conditions. The breakpoint logic may be configured as a one- or
two-level trigger, where TDR[31–16] or XTDR[31–16] define the second-level trigger and
bits 15–0 define the first-level trigger. The XTDR is accessible in supervisor mode as
debug control register 0x17 using the WDEBUG instruction and via the BDM port using
the WDMREG command.
NOTE:
The debug module has no hardware interlocks, so to prevent
spurious breakpoint triggers while the breakpoint registers are
being loaded, disable TDR and XTDR (by clearing
TDR[29,13] and XTDR[29,13]) before defining triggers.
Chapter 11. Debug Support
For More Information On This Product,
Go to: www.freescale.com
11-23
Freescale Semiconductor, Inc.
Programming Model
A write to the XTDR clears the trigger status bits, CSR[BSTAT].
Section 11.4.9.1, “Resulting Set of Possible Trigger Combinations,” describes how to
handle multiple breakpoint conditions.
Second-Level Triggers
31
30
29
EBL
28
27
26
25
24
23
22
EDLW EDWL EDWU EDLL EDLM EDUM EDUU
21
20
19
18
17 16
DI
EAI
EAR
EAL
—
Field
—
Reset
—
00_0000_0000_000
—
R/W
—
Write
—
Freescale Semiconductor, Inc...
First-Level Triggers
15
14
13
EBL
12
11
10
9
8
7
6
EDLW EDWL EDWU EDLL EDLM EDUM EDUU
5
4
3
2
DI
EAI
EAR
EAL
1
0
Field
—
Reset
—
00_0000_0000_000
—
R/W
—
Write
—
DRc[4–0]
0x17
Figure 11-14. Extended Trigger Definition Register (XTDR)
Table 11-19 describes XTDR fields.
Table 11-19. XTDR Field Descriptions
Bits
Name
Description
29/13
EBL
Enable breakpoint level. If set, EBL is the global enable for the breakpoint trigger; that is, if
TDR[EBL] or XTDR[EBL] is set, a breakpoint trigger is enabled. Clearing both disables all
breakpoints.
28–22
12–6
EDx
Setting an EDx bit enables the corresponding data breakpoint condition based on the size and
placement on the processor’s local data bus. Clearing all EDx bits disables data breakpoints.
28/12
EDLW
Data longword. Entire processor’s local data bus.
27/11
EDWL
Lower data word.
26/10
EDWU Upper data word.
25/9
EDLL
Lower lower data byte. Low-order byte of the low-order word.
24/8
EDLM
Lower middle data byte. High-order byte of the low-order word.
23/7
EDUM Upper middle data byte. Low-order byte of the high-order word.
22/6
EDUU
21/5
11-24
DI
Upper upper data byte. High-order byte of the high-order word.
Data breakpoint invert. Provides a way to invert the logical sense of all the data breakpoint
comparators. This can develop a trigger based on the occurrence of a data value other than the
DBR1 contents.
ColdFire CF4e Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
—
Freescale Semiconductor, Inc.
Programming Model
Table 11-19. XTDR Field Descriptions (Continued)
Bits
Name
20–18/
4–2
EAx
Enable address bits. Setting an EAx bit enables the corresponding address breakpoint. If all three
bits are cleared, this breakpoint is disabled.
20/4
EAI
Enable address breakpoint inverted. Breakpoint is based outside the range between
ABLR1 and ABHR1. Trigger if address > ABHR or if address < ABLR.
19/3
EAR
Enable address breakpoint range. Breakpoint is based on the range defined between
ABLR1 and ABHR1. Trigger if address ≥ ABHR or if address ≤ ABLR.
18/2
EAL
Enable address breakpoint low. The breakpoint is based on the address in ABLR1. Trigger
if address = ABLR.
17–16,
1–0
Freescale Semiconductor, Inc...
Description
—
Reserved, should be cleared.
11.4.9.1 Resulting Set of Possible Trigger Combinations
The resulting set of possible breakpoint trigger combinations consist of the following
options where || denotes logical OR, && denotes logical AND, and {} denotes an optional
additional trigger term:
One-level triggers of the form:
if
if
if
(PC_breakpoint)
(PC_breakpoint||Address_breakpoint{&& Data_breakpoint})
(PC_breakpoint||Address_breakpoint{&& Data_breakpoint}
||
Address1_breakpoint{&& Data1_breakpoint})
if
if
(Address_breakpoint
{&& Data_breakpoint})
((Address_breakpoint
{&& Data_breakpoint})
||
(Address1_breakpoint{&& Data1_breakpoint}))
if
(Address1_breakpoint
{&& Data1_breakpoint})
Two-level triggers of the form:
if
(PC_breakpoint)
then if
(Address_breakpoint{&& Data_breakpoint})
if
(PC_breakpoint)
then if
(Address_breakpoint{&& Data_breakpoint}
||
Address1_breakpoint{&& Data1_breakpoint})
if
(PC_breakpoint)
then if
(Address1_breakpoint{&& Data1_breakpoint})
if
(Address_breakpoint
{&& Data_breakpoint})
then if
(Address1_breakpoint{&& Data1_breakpoint})
if
(Address1_breakpoint
{&& Data1_breakpoint})
then if
(Address_breakpoint{&& Data_breakpoint})
if
(Address_breakpoint
{&& Data_breakpoint})
then if
(PC_breakpoint)
if
(Address1_breakpoint
{&& Data1_breakpoint})
then if
(PC_breakpoint)
if
(Address_breakpoint
{&& Data_breakpoint})
then if
(PC_breakpoint
Chapter 11. Debug Support
For More Information On This Product,
Go to: www.freescale.com
11-25
Freescale Semiconductor, Inc.
Programming Model
||
if
Address1_breakpoint{&& Data1_breakpoint})
(Address1_breakpoint
{&& Data1_breakpoint})
then if
(PC_breakpoint
||
Address_breakpoint{&& Data_breakpoint})
Freescale Semiconductor, Inc...
In this example, PC_breakpoint is the logical summation of the PBR/PBMR, PBR1,
PBR2, and PBR3 breakpoint registers; Address_breakpoint is a function of ABHR, ABLR,
and AATR; Data_breakpoint is a function of DBR and DBMR; Address1_breakpoint is a
function of ABHR1, ABLR1, and AATR1; and Data1_breakpoint is a function of DBR1
and DBMR1. In all cases, the data breakpoints can be included with an address breakpoint
to further qualify a trigger event as an option.
11.4.10 PC Breakpoint ASID Control Register (PBAC)
The PBAC configures the breakpoint qualification for each PC breakpoint register (PBR,
PBR1, PBR2, and PBR3). Four bits are dedicated for each breakpoint register and specify
how the ASID is used in PC breakpoint qualification.
15
12
Field
PBR3AC
Reset
11
8
7
PBR2AC
4
PBR1AC
3
0
PBRAC
All zeros
R/W
W
DRc[4–0]
0x0A
Figure 11-15. PC Breakpoint ASID Control Register (PBAC)
PBR3AC, PBR2AC, PBR1AC, and PBRAC apply to PBR3, PBR2, PBR1, and PBR,
respectively, and are functionally identical. They enable or disable ASID, supervisor
mode, and user mode breakpoint qualification. Reset clears these fields, disabling
qualifications and defaulting to the Revision C debug module functionality.
Table 11-20. PBAC Field Descriptions
Bits
Name
Description
15–12
PBR3AC
11–8
PBR2AC
7–4
PBR1AC
3–0
PBRAC
PBRn ASID control. Corresponds to the ASID control associated with PBRn. Determines
whether the ASID is included in the PC breakpoint comparison and whether the operating mode
(supervisor or user) is included in the comparison logic.
x00x No ASID qualification; no mode qualification
x010 No ASID qualification; user mode qualification enabled
x011 No ASID qualification; supervisor mode qualification enabled
x10x ASID qualification enabled; no mode qualification
x110 ASID qualification enabled; user mode qualification enabled
x111 ASID qualification enabled; supervisor mode qualification enabled
11.4.11 PC Breakpoint ASID Register (PBASID)
Each PC breakpoint register (PBR, PBR1, PBR2, or PBR3) specifies an instruction
address that can be used to trigger a breakpoint. To support debugging in a virtual
11-26
ColdFire CF4e Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
Background Debug Mode (BDM)
environment, an ASID can optionally be associated with the instruction address in the PC
breakpoint registers. The optional specification of an ASID value is made using PBASID,
and its exact inclusion within the breakpoint specification defined by the PBAC.
31
Field
24 23
PBR3ASID
16 15
PBR2ASID
7
PBR1ASID
Reset
—
R/W
W
Address
8
0
PBRASID
0x14
Freescale Semiconductor, Inc...
Figure 11-16. PC Breakpoint ASID Register (PBASID)
PBASID contains one 8-bit ASID values for each PC breakpoint register, as described in
Table 11-21, which allows each PC breakpoint register to be associated with a unique
virtual address and process.
Table 11-21. PBASID Field Descriptions
Bits
Name
Description
31–24
PBA3SID
PBR3ASID. Corresponds to the ASID associated with PBR3.
23–16
PBA2SID
PBR2ASID Corresponds to the ASID associated with PBR2.
15–8
PBA1SID
PBR1ASID. Corresponds to the ASID associated with PBR1.
7–0
PBASID
PBRASID. Corresponds to the ASID associated with PBR.
11.5 Background Debug Mode (BDM)
The ColdFire Family implements a low-level system debugger in the microprocessor
hardware. Communication with the development system is handled through a dedicated,
high-speed serial command interface. The ColdFire architecture implements the BDM
controller in a dedicated hardware module. Although some BDM operations, such as CPU
register accesses, require the CPU to be halted, all other BDM commands, such as
memory accesses, can be executed while the processor is running.
BDM is useful for the following reasons:
• In-circuit emulation is not needed, so physical and electrical characteristics of the
system are not affected.
• BDM is always available for debugging the system and provides a communication
link for upgrading firmware in existing systems.
• Provides high-speed cache downloading (500 Kbytes/sec), especially useful for
flash programming
• Provides absolute control of the processor, and thus the system. This feature allows
quick hardware debugging with the same tool set used for firmware development.
Chapter 11. Debug Support
For More Information On This Product,
Go to: www.freescale.com
11-27
Freescale Semiconductor, Inc.
Background Debug Mode (BDM)
11.5.1 CPU Halt
Freescale Semiconductor, Inc...
Although most BDM operations can occur in parallel with CPU operations, unrestricted
BDM operation requires the CPU to be halted. The sources that can cause the CPU to halt
are listed below in order of priority:
1. A catastrophic fault-on-fault condition automatically halts the processor.
2. A hardware breakpoint can be configured to generate a pending halt condition
similar to the assertion of BKPT. This type of halt is always first made pending in
the processor. Next, the processor samples for pending halt and interrupt conditions
once per instruction. When a pending condition is asserted, the processor halts
execution at the next sample point. See Section 11.6.1, “Theory of Operation.”
3. The execution of a HALT instruction immediately suspends execution. Attempting
to execute HALT in user mode while CSR[UHE] = 0 generates a privilege violation
exception. If CSR[UHE] = 1, HALT can be executed in user mode. After HALT
executes, the processor can be restarted by serial shifting a GO command into the
debug module. Execution continues at the instruction after HALT.
4. The assertion of the BKPT input is treated as a pseudo-interrupt; that is, asserting
BKPT creates a pending halt, which is postponed until the processor core samples
for halts/interrupts. The processor samples for these conditions once during the
execution of each instruction; if a pending halt is detected then, the processor
suspends execution and enters the halted state.
The assertion of BKPT should be considered in the following two special cases:
•
•
11-28
After the system reset signal is negated, the processor waits for 16 processor clock
cycles before beginning reset exception processing. If the BKPT input is asserted
within eight cycles after RSTI is negated, the processor enters the halt state,
signaling halt status (0xF) on the PSTDDATA outputs. While the processor is in
this state, all resources accessible through the debug module can be referenced.
This is the only chance to force the processor into emulation mode through
CSR[EMU].
After system initialization, the processor’s response to the GO command depends
on the set of BDM commands performed while it is halted for a breakpoint.
Specifically, if the PC register was loaded, the GO command causes the processor to
exit halted state and pass control to the instruction address in the PC, bypassing
normal reset exception processing. If the PC was not loaded, the GO command
causes the processor to exit halted state and continue reset exception processing.
The ColdFire architecture also handles a special case of BKPT being asserted while
the processor is stopped by execution of the STOP instruction. For this case, the
processor exits the stopped mode and enters the halted state, at which point, all
BDM commands may be exercised. When restarted, the processor continues by
executing the next sequential instruction, that is, the instruction following the
STOP opcode.
ColdFire CF4e Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
Background Debug Mode (BDM)
CSR[27–24] indicates the halt source, showing the highest priority source for multiple halt
conditions. Debug module Revisions A and B clear CSR[27–24] upon a read of the CSR,
but Revision C and D (in V4) do not. The debug GO command clears CSR[26–24].
HALT can be recognized by counting 0xFF occurrences on PSTDDATA. The count is
necessary to determine between a possible data output value of 0xFF and the HALT
condition. Because data always follows a marker (0x8, 0x9, 0xA, or 0xB), PSTDDATA
can display no more than four data 0xFFs. Two such scenarios exist:
Freescale Semiconductor, Inc...
•
•
A B marker occurs on the left nibble of PSTDDATA with the data of 0xFF
following:
PSTDDATA[7:0]
0xBF
0xFF
0xFF
0xFF
0xFX (X indicates that the next PST value is guaranteed to not be 0xF)
A B marker occurs on the right nibble of PSTDDATA with the data of 0xFF
following:
PSTDDATA[7:0]
0xYB
0xFF
0xFF
0xFF
0xFF
0xXY (X indicates that the PST value is guaranteed to not be 0xF; and Y indicates
a PSTDDATA value that doesn’t affect the 0xFF count).
Thus, a count of either nine or more sequential single 0xF values or five or more sequential
0xFF values signifies the HALT condition.
11.5.2 BDM Serial Interface
When the CPU is halted and PSTDDATA reflects the halt status, the development system
can send unrestricted commands to the debug module. The debug module implements a
synchronous protocol using two inputs (DSCLK and DSI) and one output (DSO), where
DSO is specified as a delay relative to the rising edge of the processor clock. See
Table 11-1. The development system serves as the serial communication channel master
and must generate DSCLK.
The serial channel operates at a frequency from DC to 1/5 of the PSTCLK frequency. The
channel uses full-duplex mode, where data is sent and received simultaneously by both
master and slave devices. The transmission consists of 17-bit packets composed of a
status/control bit and a 16-bit data word. As shown in Figure 11-17, all state transitions are
Chapter 11. Debug Support
For More Information On This Product,
Go to: www.freescale.com
11-29
Freescale Semiconductor, Inc.
Background Debug Mode (BDM)
enabled on a rising edge of the PSTCLK clock when DSCLK is high; that is, DSI is
sampled and DSO is driven.
C0
C1
C2
C3
C4
PSTCLK
DSCLK
DSI
BDM State
Machine
Freescale Semiconductor, Inc...
DSO
Current
Next
Current State
Next State
Past
Current
Figure 11-17. Maximum BDM Serial Interface Timing
DSCLK and DSI are synchronized inputs. DSCLK acts as a pseudo clock enable and is
sampled, along with DSI, on the rising edge of PSTCLK. DSO is delayed from the
DSCLK-enabled PSTCLK rising edge (registered after a BDM state machine state
change). All events in the debug module’s serial state machine are based on the PSTCLK
rising edge. DSCLK must also be sampled low (on a positive edge of PSTCLK) between
each bit exchange. The msb is sent first. Because DSO changes state based on an internally
recognized rising edge of DSCLK, DSO cannot be used to indicate the start of a serial
transfer. The development system must count clock cycles in a given transfer. C0–C4 are
described as follows:
•
•
•
•
•
C0: Set the state of the DSI bit.
C1: First synchronization cycle for DSI (DSCLK is high).
C2: Second synchronization cycle for DSI (DSCLK is high).
C3: BDM state machine changes state depending upon DSI and whether the entire
input data transfer has been transmitted.
C4: DSO changes to next value.
NOTE:
A not-ready response can be ignored except during a
memory-referencing cycle. Otherwise, the debug module can
accept a new serial transfer after 32 processor clock periods.
11.5.2.1 Receive Packet Format
The basic receive packet, Figure 11-18, consists of 16 data bits and 1 status bit.
16
S
15
0
Data Field [15:0]
Figure 11-18. Receive BDM Packet
11-30
ColdFire CF4e Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
Background Debug Mode (BDM)
Table 11-22 describes receive BDM packet fields.
Freescale Semiconductor, Inc...
Table 11-22. Receive BDM Packet Field Description
Bits
Name
16
S
15–0
Data
Description
Status. Indicates the status of CPU-generated messages listed below. The not-ready response can
be ignored unless a memory-referencing cycle is in progress. Otherwise, the debug module can
accept a new serial transfer after 32 processor clock periods.
S Data
Message
0 xxxx
Valid data transfer
0 0xFFFF Status OK
1 0x0000 Not ready with response; come again
1 0x0001 Error: Terminated bus cycle; data invalid
1 0xFFFF Illegal command
Data. Contains the message to be sent from the debug module to the development system. The
response message is always a single word, with the data field encoded as shown above.
11.5.2.2 Transmit Packet Format
The basic transmit packet, Figure 11-19, consists of 16 data bits and 1 control bit.
16
15
C
0
D[15:0]
Figure 11-19. Transmit BDM Packet
Table 11-23 describes transmit BDM packet fields.
Table 11-23. Transmit BDM Packet Field Description
Bits
Name
16
C
15–0
Data
Description
Control. This bit is reserved. Command and data transfers initiated by the development system
should clear C.
Contains the data to be sent from the development system to the debug module.
Chapter 11. Debug Support
For More Information On This Product,
Go to: www.freescale.com
11-31
Freescale Semiconductor, Inc.
Background Debug Mode (BDM)
11.5.3 BDM Command Set
Table 11-24 summarizes the BDM command set. Subsequent paragraphs contain detailed
descriptions of each command. Issuing a BDM command when the processor is accessing
debug module registers using the WDEBUG instruction causes undefined behavior.
Freescale Semiconductor, Inc...
Table 11-24. BDM Command Summary
Description
CPU
State1
Section
Command
(Hex)
Command
Mnemonic
Read A/D
register
rareg/
rdreg
Read the selected address or data register and
return the results through the serial interface.
Halted
11.5.3.3.1 0x218 {A/D,
Reg[2:0]}
Write A/D
register
wareg/
wdreg
Write the data operand to the specified address or
data register.
Halted
11.5.3.3.2 0x208 {A/D,
Reg[2:0]}
Read
memory
location
read
Read the data at the memory location specified by
the longword address.
Steal
11.5.3.3.3 0x1900—byte
0x1940—word
0x1980—lword
Write
memory
location
write
Write the operand data to the memory location
specified by the longword address.
Steal
11.5.3.3.4 0x1800—byte
0x1840—word
0x1880—lword
Dump
memory
block
dump
Used with READ to dump large blocks of memory.
An initial READ is executed to set up the starting
address of the block and to retrieve the first result.
A DUMP command retrieves subsequent operands.
Steal
11.5.3.3.5 0x1D00—byte
0x1D40—word
0x1D80—lword
Fill memory
block
fill
Used with WRITE to fill large blocks of memory. An
initial WRITE is executed to set up the starting
address of the block and to supply the first operand.
A FILL command writes subsequent operands.
Steal
11.5.3.3.6 0x1C00—byte
0x1C40—word
0x1C80—lword
Resume
execution
go
The pipeline is flushed and refilled before resuming
instruction execution at the current PC.
Halted
11.5.3.3.7 0x0C00
No operation
nop
Perform no operation; may be used as a null
command.
Parallel
11.5.3.3.8 0x0000
Output the
current PC
sync_pc
Capture the current PC and display it on the
PSTDDATA output pins.
Parallel
11.5.3.3.9 0x0001
Read control
register
rcreg
Read the system control register.
Halted
11.5.3.3.1 0x2980
1
Write control
register
wcreg
Write the operand data to the system control
register.
Halted
11.5.3.3.1 0x2880
2
Read debug
module
register
rdmreg
Read the debug module register.
Parallel
11.5.3.3.1 0x2D {0x42
3
DRc[4:0]}
Write debug
module
register
wdmreg
Write the operand data to the debug module
register.
Parallel
11.5.3.3.1 0x2C {0x42
4
DRc[4:0]}
1
General command effect and/or requirements on CPU operation:
- Halted. The CPU must be halted to perform this command.
- Steal. Command generates bus cycles that can be interleaved with bus accesses.
- Parallel. Command is executed in parallel with CPU activity.
2 0x4 is a three-bit field.
Unassigned command opcodes are reserved by Motorola. All unused command formats
within any revision level perform a NOP and return the illegal command response.
11-32
ColdFire CF4e Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
Background Debug Mode (BDM)
11.5.3.1 ColdFire BDM Command Format
All ColdFire Family BDM commands include a 16-bit operation word followed by an
optional set of one or more extension words, as shown in Figure 11-20.
15
10
Operation
9
8
0
R/W
7
6
5
4
3
Op Size
0
0
A/D
2
0
Register
Extension Word(s)
Figure 11-20. BDM Command Format
Table 11-25 describes BDM fields.
Freescale Semiconductor, Inc...
Table 11-25. BDM Field Descriptions
Bit
15–10
Name
Description
Operation Specifies the command. These values are listed in Table 11-24.
9
0
Reserved
8
R/W
7–6
Operand
Size
5–4
00
Reserved
3
A/D
Address/data. Determines whether the register field specifies a data or address register.
0 Indicates a data register.
1 Indicates an address register.
2–0
Register
Direction of operand transfer.
0 Data is written to the CPU or to memory from the development system.
1 The transfer is from the CPU to the development system.
Operand data size for sized operations. Addresses are expressed as 32-bit absolute values.
Note that a command performing a byte-sized memory read leaves the upper 8 bits of the
response data undefined. Referenced data is returned in the lower 8 bits of the response.
Operand Size Bit Values
00 Byte
8 bits
01 Word
16 bits
10 Longword
32 bits
11 Reserved
—
Contains the register number in commands that operate on processor registers.
11.5.3.1.1 Extension Words as Required
Some commands require extension words for addresses or immediate data. Addresses
require two extension words because only absolute long addressing is permitted.
Longword accesses are forcibly longword-aligned and word accesses are forcibly
word-aligned. Immediate data can be 1 or 2 words long. Byte and word data each requires
a single extension word and longword data requires two extension words.
Operands and addresses are transferred most-significant word first. In the following
descriptions of the BDM command set, the optional set of extension words is defined as
address, data, or operand data.
Chapter 11. Debug Support
For More Information On This Product,
Go to: www.freescale.com
11-33
Freescale Semiconductor, Inc.
Background Debug Mode (BDM)
11.5.3.2 Command Sequence Diagrams
The command sequence diagram in Figure 11-21 shows serial bus traffic for commands.
Each bubble represents a 17-bit bus transfer. The top half of each bubble indicates the data
the development system sends to the debug module; the bottom half indicates the debug
module’s response to the previous development system commands. Command and result
transactions overlap to minimize latency.
COMMANDS TRANSMITTED TO THE DEBUG MODULE
COMMAND CODE TRANSMITTED DURING THIS CYCLE
Freescale Semiconductor, Inc...
HIGH-ORDER 16 BITS OF MEMORY ADDRESS
LOW-ORDER 16 BITS OF MEMORY ADDRESS
NONSERIAL-RELATED ACTIVITY
SEQUENCE TAKEN IF
OPERATION HAS NOT
COMPLETED
READ (LONG)
???
MS ADDR
"NOT READY"
LS ADDR
"NOT READY"
XXX
"ILLEGAL"
NEXT CMD
"NOT READY"
READ
MEMORY
LOCATION
XXX
"NOT READY"
XXXXX
XXX
MS RESULT
XXX
BERR
NEXT
COMMAND
CODE
NEXT CMD
LS RESULT
NEXT CMD
"NOT READY"
DATA UNUSED FROM
THIS TRANSFER
SEQUENCE TAKEN IF
ILLEGAL COMMAND
IS RECEIVED BY DEBUG MODULE
RESULTS FROM PREVIOUS COMMAND
SEQUENCE TAKEN IF BUS
ERROR OCCURS ON
MEMORY ACCESS
HIGH- AND LOW-ORDER
16 BITS OF RESULT
RESPONSES FROM THE DEBUG MODULE
Figure 11-21. Command Sequence Diagram
The sequence is as follows:
•
•
11-34
In cycle 1, the development system command is issued (READ in this example). The
debug module responds with either the low-order results of the previous command
or a command complete status of the previous command, if no results are required.
In cycle 2, the development system supplies the high-order 16 address bits. The
debug module returns a not-ready response unless the received command is
decoded as unimplemented, which is indicated by the illegal command encoding. If
this occurs, the development system should retransmit the command.
ColdFire CF4e Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
Background Debug Mode (BDM)
NOTE:
A not-ready response can be ignored except during a
memory-referencing cycle. Otherwise, the debug module can
accept a new serial transfer after 32 processor clock periods.
•
•
Freescale Semiconductor, Inc...
•
In cycle 3, the development system supplies the low-order 16 address bits. The
debug module always returns a not-ready response.
At the completion of cycle 3, the debug module initiates a memory read operation.
Any serial transfers that begin during a memory access return a not-ready response.
Results are returned in the two serial transfer cycles after the memory access
completes. For any command performing a byte-sized memory read operation, the
upper 8 bits of the response data are undefined and the referenced data is returned
in the lower 8 bits. The next command’s opcode is sent to the debug module during
the final transfer. If a memory or register access is terminated with a bus error, the
error status (S = 1, DATA = 0x0001) is returned instead of result data.
11.5.3.3 Command Set Descriptions
The following sections describe the commands summarized in Table 11-24.
NOTE:
The BDM status bit (S) is 0 for normally completed
commands; S = 1 for illegal commands, not-ready responses,
and transfers with bus-errors. Section 11.5.2, “BDM Serial
Interface,” describes the receive packet format.
Motorola reserves unassigned command opcodes for future expansion. Unused command
formats in any revision level perform a NOP and return an illegal command response.
Chapter 11. Debug Support
For More Information On This Product,
Go to: www.freescale.com
11-35
Freescale Semiconductor, Inc.
Background Debug Mode (BDM)
11.5.3.3.1 Read A/D Register (RAREG/RDREG)
Read the selected address or data register and return the 32-bit result. A bus error response
is returned if the CPU core is not halted.
Command/Result Formats:
15
Command
12
11
0x2
8
7
0x1
Result
4
0x8
3
A/D
2
0
Register
D[31:16]
D[15:0]
Freescale Semiconductor, Inc...
Figure 11-22. RAREG/RDREG Command Format
Command Sequence:
RAREG/RDREG
???
XXX
MS RESULT
XXX
BERR
NEXT CMD
LS RESULT
NEXT CMD
"NOT READY"
Figure 11-23. RAREG/RDREG Command Sequence
Operand Data:
None
Result Data:
The contents of the selected register are returned as a longword
value, most-significant word first.
11-36
ColdFire CF4e Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
Background Debug Mode (BDM)
11.5.3.3.2 Write A/D Register (WAREG/WDREG)
The operand longword data is written to the specified address or data register. A write
alters all 32 register bits. A bus error response is returned if the CPU core is not halted.
Command Format:
15
12
11
0x2
8
7
0x0
4
0x8
3
A/D
2
0
Register
D[31:16]
D[15:0]
Freescale Semiconductor, Inc...
Figure 11-24. WAREG/WDREG Command Format
Command Sequence
WDREG/WAREG
???
MS DATA
"NOT READY"
LS DATA
"NOT READY"
XXX
BERR
NEXT CMD
"NOT READY"
NEXT CMD
"CMD COMPLETE"
Figure 11-25. WAREG/WDREG Command Sequence
Operand Data
Longword data is written into the specified address or data register.
The data is supplied most-significant word first.
Result Data
Command complete status is indicated by returning 0xFFFF (with S
cleared) when the register write is complete.
Chapter 11. Debug Support
For More Information On This Product,
Go to: www.freescale.com
11-37
Freescale Semiconductor, Inc.
Background Debug Mode (BDM)
11.5.3.3.3 Read Memory Location (READ)
Read data at the longword address. Address space is defined by BAAR[TT,TM]. Hardware
forces low-order address bits to zeros for word and longword accesses to ensure that word
addresses are word-aligned and longword addresses are longword-aligned.
Command/Result Formats:
15
12
Byte
11
8
0x1
7
0x9
4
3
0x0
Command
0
0x0
A[31:16]
Freescale Semiconductor, Inc...
A[15:0]
Result
Word
X
Command
X
X
X
X
0x1
X
X
X
0x9
D[7:0]
0x4
0x0
0x8
0x0
A[31:16]
A[15:0]
Result
D[15:0]
Longword Command
0x1
0x9
A[31:16]
A[15:0]
Result
D[31:16]
D[15:0]
Figure 11-26. READ Command/Result Formats
Command Sequence:
READ (B/W)
???
MS ADDR
"NOT READY"
LS ADDR
"NOT READY"
READ
MEMORY
LOCATION
XXX
"NOT READY"
XXXCMD
NEXT
RESULT
XXX
BERR
READ (LONG)
???
MS ADDR
"NOT READY"
LS ADDR
"NOT READY"
READ
MEMORY
LOCATION
NEXT CMD
"NOT READY"
XXX
"NOT READY"
XXX
XXX
MS RESULT
NEXT CMD
LS RESULT
XXX
BERR
NEXT CMD
"NOT READY"
Figure 11-27. READ Command Sequence
Operand Data
The only operand is the longword address of the requested location.
Result Data
Word results return 16 bits of data; longword results return 32.
Bytes are returned in the LSB of a word result, the upper byte is
undefined. 0x0001 (S = 1) is returned if a bus error occurs.
11-38
ColdFire CF4e Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
Background Debug Mode (BDM)
11.5.3.3.4 Write Memory Location (WRITE)
Write data to the memory location specified by the longword address. The address space is
defined by BAAR[TT,TM]. Hardware forces low-order address bits to zeros for word and
longword accesses to ensure that word addresses are word-aligned and longword
addresses are longword-aligned.
Command Formats:
15
12
Byte
11
8
0x1
0x8
7
4
0x0
3
1
0x0
Freescale Semiconductor, Inc...
A[31:16]
A[15:0]
X
Word
X
X
0x1
X
X
X
X
X
0x8
D[7:0]
0x4
0x0
0x8
0x0
A[31:16]
A[15:0]
D[15:0]
Longword
0x1
0x8
A[31:16]
A[15:0]
D[31:16]
D[15:0]
Figure 11-28. WRITE Command Format
Chapter 11. Debug Support
For More Information On This Product,
Go to: www.freescale.com
11-39
Freescale Semiconductor, Inc.
Background Debug Mode (BDM)
Command Sequence:
WRITE (B/W)
???
MS ADDR
"NOT READY"
LS ADDR
"NOT READY"
DATA
"NOT READY"
WRITE
MEMORY
LOCATION
XXX
"NOT READY"
XXX CMD
NEXT
"CMD COMPLETE"
XXX
BERR
Freescale Semiconductor, Inc...
NEXT CMD
"NOT READY"
WRITE (LONG)
???
MS ADDR
"NOT READY"
LS ADDR
"NOT READY"
MS DATA
"NOT READY"
LS DATA
"NOT READY"
WRITE
MEMORY
LOCATION
XXX
"NOT READY"
XXX CMD
NEXT
"CMD COMPLETE"
XXX
BERR
NEXT CMD
"NOT READY"
Figure 11-29. WRITE Command Sequence
Operand Data
This two-operand instruction requires a longword absolute address
that specifies a location to which the data operand is to be written.
Byte data is sent as a 16-bit word, justified in the LSB; 16- and
32-bit operands are sent as 16 and 32 bits, respectively
Result Data
Command complete status is indicated by returning 0xFFFF (with S
cleared) when the register write is complete. A value of 0x0001
(with S set) is returned if a bus error occurs.
11-40
ColdFire CF4e Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
Background Debug Mode (BDM)
11.5.3.3.5 Dump Memory Block (DUMP)
is used with the READ command to access large blocks of memory. An initial READ
is executed to set up the starting address of the block and to retrieve the first result. If an
initial READ is not executed before the first DUMP, an illegal command response is
returned. The DUMP command retrieves subsequent operands. The initial address is
incremented by the operand size (1, 2, or 4) and saved in a temporary register. Subsequent
DUMP commands use this address, perform the memory read, increment it by the current
operand size, and store the updated address in the temporary register.
Freescale Semiconductor, Inc...
DUMP
NOTE:
DUMP does not check for a valid address; it is a valid command
only when preceded by NOP , READ , or another DUMP
command. Otherwise, an illegal command response is
returned. NOP can be used for intercommand padding without
corrupting the address pointer.
The size field is examined each time a DUMP command is processed, allowing the operand
size to be dynamically altered.
Command/Result Formats:
15
Byte
Command
Result
Word
12
Command
11
8
0x1
X
X
0xD
X
X
X
0x1
X
4
3
0x0
X
X
0xD
Result
Longword Command
7
0
0x0
D[7:0]
0x4
0x0
0x8
0x0
D[15:0]
0x1
0xD
Result
D[31:16]
D[15:0]
Figure 11-30.
DUMP
Command/Result Formats
Chapter 11. Debug Support
For More Information On This Product,
Go to: www.freescale.com
11-41
Freescale Semiconductor, Inc.
Background Debug Mode (BDM)
Command Sequence:
READ
MEMORY
LOCATION
DUMP (B/W)
???
XXX
"NOT READY"
NEXT CMD
RESULT
XXX
"ILLEGAL"
READ
MEMORY
LOCATION
DUMP (LONG)
???
Freescale Semiconductor, Inc...
NEXT CMD
"NOT READY"
XXX
"ILLEGAL"
XXX
BERR
NEXT CMD
"NOT READY"
XXX
"NOT READY"
NEXT CMD
"NOT READY"
NEXT CMD
MS RESULT
NEXT CMD
LS RESULT
XXX
BERR
NEXT CMD
"NOT READY"
Figure 11-31. DUMP Command Sequence
Operand Data:
None
Result Data:
Requested data is returned as either a word or longword. Byte data
is returned in the least-significant byte of a word result. Word results
return 16 bits of significant data; longword results return 32 bits. A
value of 0x0001 (with S set) is returned if a bus error occurs.
11-42
ColdFire CF4e Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
Background Debug Mode (BDM)
11.5.3.3.6 Fill Memory Block (FILL)
A FILL command is used with the WRITE command to access large blocks of memory. An
initial WRITE is executed to set up the starting address of the block and to supply the first
operand. The FILL command writes subsequent operands. The initial address is
incremented by the operand size (1, 2, or 4) and saved in a temporary register after the
memory write. Subsequent FILL commands use this address, perform the write, increment
it by the current operand size, and store the updated address in the temporary register.
Freescale Semiconductor, Inc...
If an initial WRITE is not executed preceding the first FILL command, the illegal command
response is returned.
NOTE:
The FILL command does not check for a valid address: FILL is
a valid command only when preceded by another FILL, a NOP,
or a WRITE command. Otherwise, an illegal command
response is returned. The NOP command can be used for
intercommand padding without corrupting the address pointer.
The size field is examined each time a FILL command is processed, allowing the operand
size to be altered dynamically.
Command Formats:
15
12
Byte
8
0x1
X
Word
11
X
7
0xC
X
X
X
X
0x1
4
3
0x0
X
X
0xC
0
0x0
D[7:0]
0x4
0x0
0x8
0x0
D[15:0]
Longword
0x1
0xC
D[31:16]
D[15:0]
Figure 11-32.
FILL
Command Format
Chapter 11. Debug Support
For More Information On This Product,
Go to: www.freescale.com
11-43
Freescale Semiconductor, Inc.
Background Debug Mode (BDM)
Command Sequence:
FILL
FILL(LONG)
(B/W)
???
MS DATA
"NOT READY"
XXX
"ILLEGAL"
LS DATA
"NOT READY"
WRITE
MEMORY
LOCATION
XXX
"NOT READY"
NEXT CMD
"CMD COMPLETE"
NEXT CMD
"NOT READY"
XXX
BERR
Freescale Semiconductor, Inc...
FILL(LONG)
(B/W)
FILL
???
DATA
"NOT READY"
WRITE
MEMORY
LOCATION
XXX
"ILLEGAL"
NEXT CMD
"NOT READY"
NEXT CMD
"NOT READY"
XXX
"NOT READY"
NEXT CMD
"CMD COMPLETE"
XXX
BERR
NEXT CMD
"NOT READY"
Figure 11-33. FILL Command Sequence
Operand Data:
A single operand is data to be written to the memory location. Byte
data is sent as a 16-bit word, justified in the least-significant byte;
16- and 32-bit operands are sent as 16 and 32 bits, respectively.
Result Data:
Command complete status (0xFFFF) is returned when the register
write is complete. A value of 0x0001 (with S set) is returned if a bus
error occurs.
11-44
ColdFire CF4e Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
Background Debug Mode (BDM)
11.5.3.3.7 Resume Execution (GO)
The pipeline is flushed and refilled before normal instruction execution resumes.
Prefetching begins at the current address in the PC and at the current privilege level. If any
register (such as the PC or SR) is altered by a BDM command while the processor is
halted, the updated value is used when prefetching resumes. If a GO command is issued
and the CPU is not halted, the command is ignored.
15
12
11
8
0x0
0xC
7
4
0x0
3
0
0x0
Freescale Semiconductor, Inc...
Figure 11-34. GO Command Format
Command Sequence:
GO
???
NEXT CMD
"CMD COMPLETE"
Figure 11-35. GO Command Sequence
Operand Data:
None
Result Data:
The command-complete response (0xFFFF) is returned during the
next shift operation.
Chapter 11. Debug Support
For More Information On This Product,
Go to: www.freescale.com
11-45
Freescale Semiconductor, Inc.
Background Debug Mode (BDM)
11.5.3.3.8 No Operation (NOP)
NOP
performs no operation and may be used as a null command where required.
Command Formats:
15
12
11
0x0
8
0x0
7
4
0x0
3
0
0x0
Figure 11-36. NOP Command Format
Freescale Semiconductor, Inc...
Command Sequence:
NOP
???
NEXT CMD
"CMD COMPLETE"
Figure 11-37. NOP Command Sequence
Operand Data:
None
Result Data:
The command-complete response, 0xFFFF (with S cleared), is
returned during the next shift operation.
11-46
ColdFire CF4e Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
Background Debug Mode (BDM)
11.5.3.3.9 Synchronize PC to the PSTDDATA Lines (SYNC_PC)
Freescale Semiconductor, Inc...
The SYNC_PC command captures the current PC and displays it on the PSTDDATA
outputs. After the debug module receives the command, it sends a signal to the ColdFire
processor that the current PC must be displayed. The processor then forces an instruction
fetch at the next PC with the address being captured in the DDATA logic under control of
CSR[BTB]. The specific sequence of PSTDDATA values is as follows:
1. Debug signals a SYNC_PC command is pending.
2. CPU completes the current instruction.
3. CPU forces an instruction fetch to the next PC, generates a PST = 0x5 value
indicating a taken branch and signals the capture of DDATA.
4. The instruction address corresponding to the PC is captured.
5. The PST marker (0x9–0xB) is generated and displayed as defined by CSR[BTB]
followed by the captured PC address.
If the option to display ASID is enabled (CSR[3] = 1), the 8-bit ASID follows the address.
That is, the PSTDDATA sequence is {0x5, Marker, Instruction Address, 0x8, ASID},
where the 0x8 is the marker for the ASID.
The SYNC_PC command can be used to dynamically access the PC for performance
monitoring. The execution of this command is considerably less obtrusive to the real-time
operation of an application than a HALT-CPU/READ-PC/RESUME command sequence.
Command Formats:
15
12
11
0x0
8
0x0
7
4
0x0
3
0
0x1
Figure 11-38. SYNC_PC Command Format
Command Sequence:
SYNC_PC
NOP
???
NEXT CMD
"CMD COMPLETE"
Figure 11-39. SYNC_PC Command Sequence
Operand Data:
None
Result Data:
Command complete status (0xFFFF) is returned when the register
write is complete.
11.5.3.3.10 Force Transfer Acknowledge (FORCE_TA)
DEBUG_D logic implements the new FORCE_TA serial BDM command to resolve a hung
bus condition. In some system designs, references to certain unmapped memory addresses
may cause the external bus to hang with no transfer acknowledge generated by any bus
responders. The FORCE_TA forces generation of a transfer acknowledge signal, which can
Chapter 11. Debug Support
For More Information On This Product,
Go to: www.freescale.com
11-47
Freescale Semiconductor, Inc.
Background Debug Mode (BDM)
be logically summed into the normal acknowledge logic located in the system integration
module (SIM) outside of the ColdFire core.
There are two scenarios of interest, one caused by a processor access and the other caused
by a BDM access. The following sequences identify the operations needed to break the
hung bus condition:
Freescale Semiconductor, Inc...
•
•
Bus hang caused by processor or external or internal alternate master:
— Assert the breakpoint input to force a processor core halt.
— If the bus hang was caused by a processor access, send in FORCE_TA commands
until the processor is halted, as signaled by PST = 0xF. Due to pipeline and
store buffer depths, many memory accesses may be queued up behind the
access causing the bus hang. Repeated FORCE_TA commands eventually allow
processing of all these pending accesses. As soon as the processor is halted, the
system reaches a quiescent, controllable state.
— If the hang was caused by another master, such as a DMA channel, the
processor can halt immediately. In this case as well, multiple assertions of the
FORCE_TA command may be required to terminate the alternate master’s errant
access.
Bus hang caused by BDM access:
— It is assumed the processor is already halted at the time of the errant BDM
access. To resolve the hung bus, it is necessary to process four or more
FORCE_TA commands because the BDM command may have initiated a cache
line access, which fetches 4 longwords, each needing a unique transfer
acknowledge.
Formats:
15
12
11
8
0x0
0x0
7
4
0x0
3
0
0x2
Figure 11-40. FORCE_TA Command
Command Sequence:
FORCE_TA
???
NEXT CMD
“CMD COMPLETE”
Figure 11-41. FORCE_TA Command Sequence
Operand Data:
None
Result Data:
The command complete response, 0xFFFF (with the status bit
cleared), is returned during the next shift operation. This response
indicates the FORCE_TA command was processed correctly and does
not necessarily reflect the status of any internal bus.
11-48
ColdFire CF4e Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
Background Debug Mode (BDM)
11.5.3.3.11 Read Control Register (RCREG)
Read the selected control register and return the 32-bit result. Accesses to the
processor/memory control registers are always 32 bits wide, regardless of register width.
The second and third words of the command form a 32-bit address, which the debug
module uses to generate a special bus cycle to access the specified control register. The
12-bit Rc field is the same as that used by the MOVEC instruction.
Command/Result Formats:
15
Freescale Semiconductor, Inc...
Command
12
11
8
7
4
3
0
0x2
0x9
0x8
0x0
0x0
0x0
0x0
0x0
0x0
Rc
Result
D[31:16]
D[15:0]
Figure 11-42. RCREG Command/Result Formats
Rc encoding: See Table 2-4.
Command Sequence:
RCREG
???
MS ADDR
EXT
WORD
"NOT READY"
MS ADDR
EXT
WORD
"NOT READY"
READ
CONTROL
MEMORY
REGISTER
LOCATION
XXX
"NOT READY"
XXX
MS RESULT
NEXT CMD
LS RESULT
XXX
BERR
NEXT CMD
"NOT READY"
Figure 11-43. RCREG Command Sequence
Operand Data:
The only operand is the 32-bit Rc control register select field.
Result Data:
Control register contents are returned as a longword,
most-significant word first. The implemented portion of registers
smaller than 32 bits is guaranteed correct; other bits are undefined.
BDM Accesses of the Stack Pointer Registers (A7: SSP and USP)
The Version 4 ColdFire core supports two unique stack pointer (A7) registers: the
supervisor stack pointer (SSP) and the user stack pointer (USP). The hardware
implementation of these two programmable-visible 32-bit registers does not uniquely
identify one as the SSP and the other as the USP. Rather, the hardware uses one 32-bit
register as the currently-active A7; the other is named simply the OTHER_A7. Thus, the
contents of the two hardware registers is a function of the operating mode of the processor:
if SR[S] = 1
then
A7 = Supervisor Stack Pointer
OTHER_A7 = User Stack Pointer
Chapter 11. Debug Support
For More Information On This Product,
Go to: www.freescale.com
11-49
Freescale Semiconductor, Inc.
Background Debug Mode (BDM)
else
A7 = User Stack Pointer
OTHER_A7 = Supervisor Stack Pointer
The BDM programming model supports reads and writes to A7 and OTHER_A7 directly.
It is the responsibility of the external development system to determine the mapping of A7
and OTHER_A7 to the two program-visible definitions (supervisor and user stack
pointers), based on the SR[S].
BDM Accesses of the EMAC Registers
Freescale Semiconductor, Inc...
The presence of rounding logic in the output datapath of the EMAC requires special care
for BDM-initiated reads and writes of its programming model. In particular, any result
rounding modes must be disabled during the read/write process so the exact bit-wise
EMAC register contents are accessed.
For example, a BDM read of an accumulator (ACCx) requires the following sequence:
BdmReadACCx (
rcreg
wcreg
rcreg
wcreg
)
macsr;
#0,macsr;
ACCx;
#saved_data,macsr;
//
//
//
//
read current macsr contents & save
disable all rounding modes
read the desired accumulator
restore the original macsr
Likewise to write an accumulator register, the following BDM sequence is needed:
BdmWriteACCx (
rcreg
wcreg
wcreg
wcreg
)
macsr;
#0,macsr;
#data,ACCx;
#saved_data,macsr;
//
//
//
//
read current macsr contents & save
disable all rounding modes
write the desired accumulator
restore the original macsr
Additionally, writes to the accumulator extension registers must be performed after the
corresponding accumulators are updated because a write to any accumulator alters the
corresponding extension register contents.
For more information on saving and restoring the complete EMAC programming model,
see the appropriate section of the EMAC chapter.
BDM Accesses of Floating-Point Data Registers (FPn)
The ColdFire debug architecture allows BDM accesses of the entire programming model
(including all FPU-related registers) of the processor core using RCREG and WCREG.
However, certain hardware restrictions require the accesses related to the 64-bit FPn data
registers be performed in a certain manner to guarantee correct operation.
The serial BDM command structure supports 8-, 16- and 32-bit accesses, but there is no
direct mechanism for accessing 64-bit data values. Rather than changing this
well-established protocol and command set, BDM accesses of 64-bit data values are
treated as two independent 32-bit references. In particular, 64-bit FPn data registers are
treated as two separate values from the BDM perspective. Each FPn is partitioned into
upper and lower longwords, FPUn and FPLn.
11-50
ColdFire CF4e Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
Background Debug Mode (BDM)
Either longword can be read first. The processor treats the BDM read command as a
pseudo-FMOVEM. Accordingly, all rounding modes and exception enables are ignored
and the 32-bit contents of FPUn or FPLn are sent to the debug module for transmission
over the serial communication channel. The FPU programming model is unchanged.
Freescale Semiconductor, Inc...
To write to an FPU data register, FPUn must be written first and followed by a write to
FPLn. The processor operates as follows: The BDM write to FPUn is performed loads the
upper 32 bits of an internal double-precision operand register. The BDM write to FPLn
loads the supplied operand into the lower 32 bits of the same internal register, after which
the entire 64-bit value is loaded into the selected FPn. Failure to execute this sequence of
commands produces an undefined value in the FPUn.
Note that any BDM write of an FPU register changes the internal state from NULL to
IDLE.
Chapter 11. Debug Support
For More Information On This Product,
Go to: www.freescale.com
11-51
Freescale Semiconductor, Inc.
Background Debug Mode (BDM)
11.5.3.3.12 Write Control Register (WCREG)
The operand (longword) data is written to the specified control register. The write alters all
32 register bits. See the RCREG instruction description for the Rc encoding and for
additional notes on writes to the A7 stack pointers and the EMAC and FPU programming
models.
Command/Result Formats:
15
Command
12
11
8
7
3
0
0x2
0x8
0x8
0x0
0x0
0x0
0x0
0x0
0x0
Freescale Semiconductor, Inc...
4
Rc
Result
D[31:16]
D[15:0]
Figure 11-44. WCREG Command/Result Formats
Command Sequence:
WCREG
???
MS ADDR
EXT
WORD
"NOT READY"
MS ADDR
EXT
WORD
"NOT READY"
MS DATA
"NOT READY"
LS DATA
"NOT READY"
WRITE
WRITE
CONTROL
MEMORY
REGISTER
LOCATION
XXX
"NOT READY"
XXX CMD
NEXT
"CMD COMPLETE"
XXX
BERR
NEXT CMD
"NOT READY"
Figure 11-45. WCREG Command Sequence
Operand Data:
This instruction requires two longword operands. The first selects
the register to which the operand data is to be written; the second
contains the data.
Result Data:
Successful write operations return 0xFFFF. Bus errors on the write
cycle are indicated by the setting of bit 16 in the status message and
by a data pattern of 0x0001.
11-52
ColdFire CF4e Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
Background Debug Mode (BDM)
11.5.3.3.13 Read Debug Module Register (RDMREG)
Read the selected debug module register and return the 32-bit result. The only valid
register selection for the RDMREG command is CSR (DRc = 0x00). Note that this read of
the CSR clears the trigger status bits (CSR[BSTAT]) if either a level-2 breakpoint has been
triggered or a level-1 breakpoint has been triggered and no level-2 breakpoint has been
enabled.
Command/Result Formats:
15
Command
12
11
0x2
8
5
0xD
4
0
100
Result
Freescale Semiconductor, Inc...
7
DRc
D[31:16]
D[15:0]
Figure 11-46.
RDMREG BDM
Command/Result Formats
Table 11-26 shows the definition of DRc encoding.
Table 11-26. Definition of DRc Encoding—Read
DRc[4:0]
Debug Register Definition
Mnemonic
Initial State
Page
0x00
Configuration/Status
CSR
0x0
p. 11-17
0x01–0x1F
Reserved
—
—
—
Command Sequence:
RDMREG
???
XXX
MS RESULT
NEXT CMD
LS RESULT
XXX
"ILLEGAL"
NEXT CMD
"NOT READY"
Figure 11-47. RDMREG Command Sequence
Operand Data:
None
Result Data:
The contents of the selected debug register are returned as a
longword value. The data is returned most-significant word first.
Chapter 11. Debug Support
For More Information On This Product,
Go to: www.freescale.com
11-53
Freescale Semiconductor, Inc.
Background Debug Mode (BDM)
11.5.3.3.14 Write Debug Module Register (WDMREG)
The operand (longword) data is written to the specified debug module register. All 32 bits
of the register are altered by the write. DSCLK must be inactive while the debug module
register writes from the CPU accesses are performed using the WDEBUG instruction.
Command Format:
Figure 11-48. WDMREG BDM Command Format
15
12
11
0x2
8
7
0xC
5
4
100
0
DRc
D[31:16]
Freescale Semiconductor, Inc...
D[15:0]
Table 11-6 shows the definition of the DRc write encoding.
Command Sequence:
WDMREG
???
MS DATA
"NOT READY"
LS DATA
"NOT READY"
XXX
"ILLEGAL"
NEXT CMD
"NOT READY"
NEXT CMD
"CMD COMPLETE"
Figure 11-49. WDMREG Command Sequence
Operand Data:
Longword data is written into the specified debug register. The data
is supplied most-significant word first.
Result Data:
Command complete status (0xFFFF) is returned when register write
is complete.
11-54
ColdFire CF4e Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
Real-Time Debug Support
11.6 Real-Time Debug Support
Freescale Semiconductor, Inc...
The ColdFire Family provides support debugging real-time applications. For these types
of embedded systems, the processor must continue to operate during debug. The
foundation of this area of debug support is that while the processor cannot be halted to
allow debugging, the system can generally tolerate the small intrusions of the BDM
inserting instructions into the pipeline with minimal effect on real-time operation.
The debug module provides three types of breakpoints: PC with mask, operand address
range, and data with mask. These breakpoints can be configured into one- or two-level
triggers with the exact trigger response also programmable. The debug module
programming model can be written from either the external development system using the
debug serial interface or from the processor’s supervisor programming model using the
WDEBUG instruction. Only CSR is readable using the external development system.
11.6.1 Theory of Operation
Breakpoint hardware can be configured through TDR[TCR] to respond to triggers by
displaying PSTDDATA, initiating a processor halt, or generating a debug interrupt. As
shown in Table 11-27, when a breakpoint is triggered, an indication (CSR[BSTAT]) is
provided on the PSTDDATA output port of the DDATA information when it is not
displaying captured processor status, operands, or branch addresses. See Section 11.3.2,
“Processor Stopped or Breakpoint State Change (PST = 0xE).”
Table 11-27. PSTDDATA Nibble/CSR[BSTAT] Breakpoint Response
PSTDDATA Nibble/CSR[BSTAT] 1
1
Breakpoint Status
0000/0000
No breakpoints enabled
0010/0001
Waiting for level-1 breakpoint
0100/0010
Level-1 breakpoint triggered
1010/0101
Waiting for level-2 breakpoint
1100/0110
Level-2 breakpoint triggered
Encodings not shown are reserved for future use.
The breakpoint status is also posted in CSR. Note that CSR[BSTAT] is cleared by a CSR
read when either a level-2 breakpoint is triggered or a level-1 breakpoint is triggered and a
level-2 breakpoint is not enabled. Status is also cleared by writing to either TDR or XTDR
to disable trigger options.
BDM instructions use the appropriate registers to load and configure breakpoints. As the
system operates, a breakpoint trigger generates the response defined in TDR.
PC breakpoints are treated in a precise manner: exception recognition and processing are
initiated before the excepting instruction is executed. All other breakpoint events are
recognized on the processor’s local bus, but are made pending to the processor and
Chapter 11. Debug Support
For More Information On This Product,
Go to: www.freescale.com
11-55
Freescale Semiconductor, Inc.
Real-Time Debug Support
sampled like other interrupt conditions. As a result, these interrupts are said to be
imprecise.
Freescale Semiconductor, Inc...
In systems that tolerate the processor being halted, a BDM-entry can be used. With
TDR[TRC] = 01, a breakpoint trigger causes the core to halt (PST = 0xF).
If the processor core cannot be halted, the debug interrupt can be used. With this
configuration, TDR[TRC] = 10, the breakpoint trigger becomes a debug interrupt to the
processor, which is treated higher than the nonmaskable level-7 interrupt request. As with
all interrupts, it is made pending until the processor reaches a sample point, which occurs
once per instruction. Again, the hardware forces the PC breakpoint to occur before the
targeted instruction executes and is precise. This is possible because the PC breakpoint is
enabled when interrupt sampling occurs. For address and data breakpoints, reporting is
considered imprecise because several instructions may execute after the triggering address
or data is detected.
As soon as the debug interrupt is recognized, the processor aborts execution and initiates
exception processing. This event is signaled externally by the assertion of a unique PST
value (PST = 0xD) for multiple cycles. The core enters emulator mode when exception
processing begins. After the standard 8-byte exception stack is created, the processor
fetches a unique exception vector from the vector table. Table 11-28 describes the two
unique entries that distinguish PC breakpoints from other trigger events.
Table 11-28. Exception Vector Assignments
Vector Number
Vector Offset (Hex)
Stacked Program Counter
Assignment
12
0x030
Next
Non-PC-breakpoint debug interrupt
13
0x034
Next
PC-breakpoint debug interrupt
(Refer to the ColdFire Programmer’s Reference Manual.)
In the case of a two-level trigger, the last breakpoint event determines the exception
vector; however, if the second-level trigger is PC || Address {&& Data} (as shown in the
last condition in the code example in Section 11.4.9.1, “Resulting Set of Possible Trigger
Combinations”), the vector taken is determined by the first condition that occurs after the
first-level trigger: vector 13 if PC occurs first or vector 12 if Address {&& Data} occurs
first. If both occur simultaneously, the non-PC-breakpoint debug interrupt is taken (vector
number 12).
Execution continues at the instruction address in the vector corresponding to the
breakpoint triggered. The debug interrupt handler can use supervisor instructions to save
the necessary context such as the state of all program-visible registers into a reserved
memory area.
During a debug interrupt service routine, all normal interrupt requests are evaluated and
sampled once per instruction. If any exception occurs, the processor responds as follows:
11-56
ColdFire CF4e Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
Freescale Semiconductor, Inc...
Real-Time Debug Support
1. It saves a copy of the current value of the emulator mode state bit and then exits
emulator mode by clearing the actual state.
2. Bit 1 of the fault status field (FS1) in the next exception stack frame is set to
indicate the processor was in emulator mode when the interrupt occurred. This
corresponds to bit 17 of the longword at the top of the system stack. See
Section 7.3, “Exception Stack Frame Definition.”
3. It passed control to the appropriate exception handler.
4. It executes an RTE instruction when the exception handler finishes. During the
processing of the RTE, FS1 is reloaded from the system stack. If this bit is set, the
processor sets the emulator mode state and resumes execution of the original debug
interrupt service routine. This is signaled externally by the generation of the PST
value that originally identified the debug interrupt exception, that is, PST = 0xD.
Fault status encodings are listed in Table 10-2. Implementation of this debug interrupt
handling fully supports the servicing of a number of normal interrupt requests during a
debug interrupt service routine.
The emulator mode state bit is essentially changed to be a program-visible value, stored
into memory during exception stack frame creation, and loaded from memory by the RTE
instruction.
When debug interrupt operations complete, the RTE instruction executes and the
processor exits emulator mode. After the debug interrupt handler completes execution, the
external development system can use BDM commands to read the reserved memory
locations.
In Revision A, if a hardware breakpoint such as a PC trigger is left unmodified by the
debug interrupt service routine, another debug interrupt is generated after the completion
of the RTE instruction. In Revisions B and C, the generation of another debug interrupt
during the first instruction after the RTE exits emulator mode is inhibited. This behavior is
consistent with the existing logic involving trace mode where the first instruction executes
before another trace exception is generated. Thus, all hardware breakpoints are disabled
until the first instruction after the RTE completes execution, regardless of the programmed
trigger response.
11.6.1.1 Emulator Mode
Emulator mode is used to facilitate nonintrusive emulator functionality. This mode can be
entered in three different ways:
•
•
Setting CSR[EMU] forces the processor into emulator mode. EMU is examined
only if RSTI is negated and the processor begins reset exception processing. It can
be set while the processor is halted before reset exception processing begins. See
Section 11.5.1, “CPU Halt.”
A debug interrupt always puts the processor in emulation mode when debug
interrupt exception processing begins.
Chapter 11. Debug Support
For More Information On This Product,
Go to: www.freescale.com
11-57
Freescale Semiconductor, Inc.
Debug C Definition of PSTDDATA Outputs
•
Setting CSR[TRC] forces the processor into emulation mode when trace exception
processing begins.
While operating in emulation mode, the processor exhibits the following properties:
•
Freescale Semiconductor, Inc...
•
Unmasked interrupt requests are serviced. The resulting interrupt exception stack
frame has FS[1] set to indicate the interrupt occurred while in emulator mode.
If CSR[MAP] = 1, all caching of memory and the SRAM module are disabled. All
memory accesses are forced into a specially mapped address space signaled by
TT = 0x2, TM = 0x5 or 0x6. This includes stack frame writes and the vector fetch
for the exception that forced entry into this mode.
The RTE instruction exits emulation mode. The processor status output port provides a
unique encoding for emulator mode entry (0xD) and exit (0x7).
11.6.2 Concurrent BDM and Processor Operation
The debug module supports concurrent operation of both the processor and most BDM
commands. BDM commands may be executed while the processor is running, except the
following:
•
•
Read/write address and data registers
Read/write control registers
For BDM commands that access memory, the debug module requests the processor’s local
bus. The processor responds by stalling the instruction fetch pipeline and waiting for
current bus activity to complete before freeing the local bus for the debug module to
perform its access. After the debug module bus cycle, the processor reclaims the bus.
NOTE:
Breakpoint registers must be carefully configured in a
development system if the processor is executing. The debug
module contains no hardware interlocks, so TDR and XTDR
should be disabled while breakpoint registers are loaded, after
which TDR and XTDR can be written to define the exact
trigger. This prevents spurious breakpoint triggers.
Because there are no hardware interlocks in the debug unit, no BDM operations are
allowed while the CPU is writing the debug’s registers (DSCLK must be inactive).
11.7 Debug C Definition of PSTDDATA Outputs
This section specifies the ColdFire processor and debug module’s generation of the
PSTDDATA output on an instruction basis. In general, the PSTDDATA output for an
instruction is defined as follows:
PSTDDATA = 0x1, {[0x89B], operand}
11-58
ColdFire CF4e Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
Debug C Definition of PSTDDATA Outputs
where the {...} definition is optional operand information defined by the setting of the
CSR.
The CSR provides capabilities to display operands based on reference type (read, write, or
both). A PST value {0x8, 0x9, or 0xB} identifies the size and presence of valid data to
follow on the PSTDDATA output {1, 2, or 4 bytes}. Additionally, for certain
change-of-flow branch instructions, CSR[BTB] provides the capability to display the
target instruction address on the PSTDDATA output {2, 3, or 4 bytes} using a PST value
of {0x9, 0xA, or 0xB}.
Freescale Semiconductor, Inc...
11.7.1 User Instruction Set
Table 11-29 shows the PSTDDATA specification for user-mode instructions. Rn
represents any {Dn, An} register. In this definition, the ‘y’ suffix generally denotes the
source and ‘x’ denotes the destination operand. For a given instruction, the optional
operand data is displayed only for those effective addresses referencing memory.
Table 11-29. PSTDDATA Specification for User-Mode Instructions
Instruction
Operand Syntax
PSTDDATA
add.l
<ea>y,Dx
PSTDDATA = 0x1,{0xB, source operand}
add.l
Dy,<ea>x
PSTDDATA = 0x1,{0xB, source},{0xB, destination}
adda.l
<ea>y,Ax
PSTDDATA = 0x1,{0xB, source operand}
addi.l
#<data>,Dx
PSTDDATA = 0x1
addq.l
#<data>,<ea>x
PSTDDATA = 0x1,{0xB, source},{0xB, destination}
addx.l
Dy,Dx
PSTDDATA = 0x1
and.l
<ea>y,Dx
PSTDDATA = 0x1,{0xB, source operand}
and.l
Dy,<ea>x
PSTDDATA = 0x1,{0xB, source},{0xB, destination}
andi.l
#<data>,Dx
PSTDDATA = 0x1
asl.l
{Dy,#<data>},Dx
PSTDDATA = 0x1
asr.l
{Dy,#<data>},Dx
PSTDDATA = 0x1
bcc.{b,w,l}
if taken, then PSTDDATA = 0x5, else PSTDDATA = 0x1
bchg.{b,l}
#<data>,<ea>x
PSTDDATA = 0x1,{0x8, source},{0x8, destination}
bchg.{b,l}
Dy,<ea>x
PSTDDATA = 0x1,{0x8, source},{0x8, destination}
bclr.{b,l}
#<data>,<ea>x
PSTDDATA = 0x1,{0x8, source},{0x8, destination}
bclr.{b,l}
Dy,<ea>x
PSTDDATA = 0x1,{0x8, source},{0x8, destination}
bra.{b,w,l}
PSTDDATA = 0x5
bset.{b,l}
#<data>,<ea>x
PSTDDATA = 0x1,{0x8, source},{0x8, destination}
bset.{b,l}
Dy,<ea>x
PSTDDATA = 0x1,{0x8, source},{0x8, destination}
bsr.{b,w,l}
PSTDDATA = 0x5,{0xB, destination operand}
btst.{b,l}
#<data>,<ea>x
PSTDDATA = 0x1,{0x8, source operand}
btst.{b,l}
Dy,<ea>x
PSTDDATA = 0x1,{0x8, source operand}
Chapter 11. Debug Support
For More Information On This Product,
Go to: www.freescale.com
11-59
Freescale Semiconductor, Inc.
Debug C Definition of PSTDDATA Outputs
Table 11-29. PSTDDATA Specification for User-Mode Instructions (Continued)
Freescale Semiconductor, Inc...
Instruction
Operand Syntax
PSTDDATA
clr.b
<ea>x
PSTDDATA = 0x1,{0x8, destination operand}
clr.l
<ea>x
PSTDDATA = 0x1,{0xB, destination operand}
clr.w
<ea>x
PSTDDATA = 0x1,{0x9, destination operand}
cmp.b
<ea>y,Dx
PSTDDATA = 0x1, {0x8, source operand}
cmp.l
<ea>y,Dx
PSTDDATA = 0x1,{0xB, source operand}
cmp.w
<ea>y,Dx
PSTDDATA = 0x1, {0x9, source operand}
cmpa.l
<ea>y,Ax
PSTDDATA = 0x1,{0xB, source operand}
cmpa.w
<ea>y,Ax
PSTDDATA = 0x1, {0x9, source operand}
cmpi.b
#<data>,Dx
PSTDDATA = 0x1
cmpi.l
#<data>,Dx
PSTDDATA = 0x1
cmpi.w
#<data>,Dx
PSTDDATA = 0x1
divs.l
<ea>y,Dx
PSTDDATA = 0x1,{0xB, source operand}
divs.w
<ea>y,Dx
PSTDDATA = 0x1,{0x9, source operand}
divu.l
<ea>y,Dx
PSTDDATA = 0x1,{0xB, source operand}
divu.w
<ea>y,Dx
PSTDDATA = 0x1,{0x9, source operand}
eor.l
Dy,<ea>x
PSTDDATA = 0x1,{0xB, source},{0xB, destination}
eori.l
#<data>,Dx
PSTDDATA = 0x1
ext.l
Dx
PSTDDATA = 0x1
ext.w
Dx
PSTDDATA = 0x1
extb.l
Dx
PSTDDATA = 0x1
PSTDDATA = 0x11
illegal
jmp
<ea>y
PSTDDATA = 0x5, {[0x9AB], target address} 2
jsr
<ea>y
PSTDDATA = 0x5, {[0x9AB], target address},{0xB , destination operand}2
lea.l
<ea>y,Ax
PSTDDATA = 0x1
link.w
Ay,#<displacement> PSTDDATA = 0x1,{0xB, destination operand}
lsl.l
{Dy,#<data>},Dx
PSTDDATA = 0x1
lsr.l
{Dy,#<data>},Dx
PSTDDATA = 0x1
mov3q.l
#<data>,<ea>x
PSTDDATA = 0x1, {0xB, destination operand}
move.b
<ea>y,<ea>x
PSTDDATA = 0x1,{0x8, source},{0x8, destination}
move.l
<ea>y,<ea>x
PSTDDATA = 0x1,{0xB, source},{0xB, destination}
move.w
<ea>y,<ea>x
PSTDDATA = 0x1,{0x9, source},{0x9, destination}
move.w
CCR,Dx
PSTDDATA = 0x1
move.w
{Dy,#<data>},CCR
PSTDDATA = 0x1
movea.l
<ea>y,Ax
PSTDDATA = 0x1,{0xB, source}
movea.w
<ea>y,Ax
PSTDDATA = 0x1,{0x9, source}
movem.l
#list,<ea>x
PSTDDATA = 0x1,{0xB, destination},... 3
11-60
ColdFire CF4e Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
Debug C Definition of PSTDDATA Outputs
Table 11-29. PSTDDATA Specification for User-Mode Instructions (Continued)
Freescale Semiconductor, Inc...
Instruction
Operand Syntax
PSTDDATA
movem.l
<ea>y,#list
PSTDDATA = 0x1,{0xB, source},... 3
moveq.l
#<data>,Dx
PSTDDATA = 0x1
muls.l
<ea>y,Dx
PSTDDATA = 0x1,{0xB, source operand}
muls.w
<ea>y,Dx
PSTDDATA = 0x1,{0x9, source operand}
mulu.l
<ea>y,Dx
PSTDDATA = 0x1,{0xB, source operand}
mulu.w
<ea>y,Dx
PSTDDATA = 0x1,{0x9, source operand}
mvs.b
<ea>y,Dx
PSTDDATA = 0x1, {0x8, source operand}
mvs.w
<ea>y,Dx
PSTDDATA = 0x1, {0x9, source operand}
mvz.b
<ea>y,Dx
PSTDDATA = 0x1, {0x8, source operand}
mvz.w
<ea>y,Dx
PSTDDATA = 0x1, {0x9, source operand}
neg.l
Dx
PSTDDATA = 0x1
negx.l
Dx
PSTDDATA = 0x1
nop
PSTDDATA = 0x1
not.l
Dx
PSTDDATA = 0x1
or.l
<ea>y,Dx
PSTDDATA = 0x1,{0xB, source operand}
or.l
Dy,<ea>x
PSTDDATA = 0x1,{0xB, source},{0xB, destination}
ori.l
#<data>,Dx
PSTDDATA = 0x1
pea.l
<ea>y
PSTDDATA = 0x1,{0xB, destination operand}
pulse
PSTDDATA = 0x4
rems.l
<ea>y,Dw:Dx
PSTDDATA = 0x1,{0xB, source operand}
remu.l
<ea>y,Dw:Dx
PSTDDATA = 0x1,{0xB, source operand}
rts
PSTDDATA = 0x1, PSTDDATA = 0x5, {[0x9AB], target address}
sats.l
Dx
PSTDDATA = 0x1
scc.b
Dx
PSTDDATA = 0x1
sub.l
<ea>y,Dx
PSTDDATA = 0x1,{0xB, source operand}
sub.l
Dy,<ea>x
PSTDDATA = 0x1,{0xB, source},{0xB, destination}
suba.l
<ea>y,Ax
PSTDDATA = 0x1,{0xB, source operand}
subi.l
#<data>,Dx
PSTDDATA = 0x1
subq.l
#<data>,<ea>x
PSTDDATA = 0x1,{0xB, source},{0xB, destination}
subx.l
Dy,Dx
PSTDDATA = 0x1
swap.w
Dx
PSTDDATA = 0x1
tas.b
<ea>x
PSTDDATA = 0x1, {0x8, source}, {0x8, destination}
tpf
PST = 0x1
tpf.l
#<data>
PST = 0x1
tpf.w
#<data>
PST = 0x1
trap
#<data>
PSTDDATA = 0x11
Chapter 11. Debug Support
For More Information On This Product,
Go to: www.freescale.com
11-61
Freescale Semiconductor, Inc.
Debug C Definition of PSTDDATA Outputs
Table 11-29. PSTDDATA Specification for User-Mode Instructions (Continued)
Instruction
Operand Syntax
PSTDDATA
tst.b
<ea>x
PSTDDATA = 0x1,{0x8, source operand}
tst.l
<ea>y
PSTDDATA = 0x1,{0xB, source operand}
tst.w
<ea>y
PSTDDATA = 0x1,{0x9, source operand}
unlk
Ax
PSTDDATA = 0x1,{0xB, destination operand}
wddata.b
<ea>y
PSTDDATA = 0x4, {0x8, source operand
wddata.l
<ea>y
PSTDDATA = 0x4, {0xB, source operand
wddata.w
<ea>y
PSTDDATA = 0x4, {0x9, source operand
Freescale Semiconductor, Inc...
1
During normal exception processing, the PSTDDATA output is driven to a 0xC indicating the exception
processing state. The exception stack write operands, as well as the vector read and target address of
the exception handler may also be displayed.
Exception Processing
PSTDDATA = 0xC, {0xB,destination}, // stack frame
{0xB,destination}, // stack frame
{0xB,source},
// vector read
PSTDDATA = 0x5, {[0x9AB],target}
// handler PC
The PSTDDATA specification for the reset exception is shown below:
Exception Processing
PSTDDATA = 0xC,
PSTDDATA = 0x5, {[0x9AB],target}
// handler PC
The initial references at address 0 and 4 are never captured nor displayed since these accesses are
treated as instruction fetches.
For all types of exception processing, the PSTDDATA = 0xC value is driven at all times, unless the
PSTDDATA output is needed for one of the optional marker values or for the taken branch indicator
(0x5).
2 For JMP and JSR instructions, the optional target instruction address is displayed only for those
effective address fields defining variant addressing modes. This includes the following <ea>x values:
(An), (d16,An), (d8,An,Xi), (d8,PC,Xi).
3 For Move Multiple instructions (MOVEM), the processor automatically generates line-sized transfers if
the operand address reaches a 0-modulo-16 boundary and there are four or more registers to be
transferred. For these line-sized transfers, the operand data is never captured nor displayed, regardless
of the CSR value.
The automatic line-sized burst transfers are provided to maximize performance during these sequential
memory access operations.
Table 11-30 shows the PSTDDATA specification for multiply-accumulate instructions.
Table 11-30. PSTDDATA Values for User-Mode Multiply-Accumulate Instructions
Instruction
11-62
Operand Syntax
PSTDDATA
mac.l
Ry,Rx
PSTDDATA = 0x1
mac.l
Ry,Rx,<ea>y,Rw,ACCx
PSTDDATA = 0x1,{0xB, source operand}
mac.l
Ry,Rx,ACCx
PSTDDATA = 0x1
mac.l
Ry,Rx,ea,Rw
PSTDDATA = 0x1,{0xB, source operand}
mac.w
Ry,Rx
PSTDDATA = 0x1
mac.w
Ry,Rx,<ea>y,Rw,ACCx
PSTDDATA = 0x1,{0xB, source operand}
mac.w
Ry,Rx,ACCx
PSTDDATA = 0x1
mac.w
Ry,Rx,ea,Rw
PSTDDATA = 0x1,{0xB, source operand}
ColdFire CF4e Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
Debug C Definition of PSTDDATA Outputs
Table 11-30. PSTDDATA Values for User-Mode Multiply-Accumulate Instructions
Freescale Semiconductor, Inc...
Instruction
Operand Syntax
PSTDDATA
move.l
{Ry,#<data>},ACCext01
PSTDDATA = 0x1
move.l
{Ry,#<data>},ACCext23
PSTDDATA = 0x1
move.l
{Ry,#<data>},ACCx
PSTDDATA = 0x1
move.l
{Ry,#<data>},MACSR
PSTDDATA = 0x1
move.l
{Ry,#<data>},MASK
PSTDDATA = 0x1
move.l
ACCext01,Rx
PSTDDATA = 0x1
move.l
ACCext23,Rx
PSTDDATA = 0x1
move.l
ACCy,ACCx
PSTDDATA = 0x1
move.l
ACCy,Rx
PSTDDATA = 0x1
move.l
MACSR,CCR
PSTDDATA = 0x1
move.l
MACSR,Rx
PSTDDATA = 0x1
move.l
MASK,Rx
PSTDDATA = 0x1
msac.l
Ry,Rx
PSTDDATA = 0x1
msac.l
Ry,Rx,<ea>y,Rw,ACCx
PSTDDATA = 0x1,{0xB, source operand}
msac.l
Ry,Rx,ACCx
PSTDDATA = 0x1
msac.l
Ry,Rx,<ea>y,Rw
PSTDDATA = 0x1,{0xB, source},{0xB, destination}
msac.w
Ry,Rx
PSTDDATA = 0x1
msac.w
Ry,Rx,<ea>y,Rw,ACCx
PSTDDATA = 0x1,{0xB, source operand}
msac.w
Ry,Rx,ACCx
PSTDDATA = 0x1
msac.w
Ry,Rx,<ea>y,Rw
PSTDDATA = 0x1,{0xB, source},{0xB, destination}
Table 11-31 shows the PSTDDATA specification for floating-point instructions; note that
<ea>y includes FPy, Dy, Ay, and <mem>y addressing modes. The optional operand
capture and display applies only to the <mem>y addressing modes. Note also that the
PSTDDATA values are the same for a given instruction, regardless of explicit rounding
precision.
Table 11-31. PSTDDATA Values for User-Mode Floating-Point Instructions
Instruction 1
Operand Syntax
PSTDDATA
fabs.sz
<ea>y,FPx
PSTDDATA = 0x1, [89B], source}
fadd.sz
<ea>y,FPx
PSTDDATA = 0x1, [89B], source}
fbcc.{w,l}
<label>
if taken, then PSTDDATA = 5, else PSTDDATA = 0x1
fcmp.sz
<ea>y,FPx
PSTDDATA = 0x1, [89B], source}
fdiv.sz
<ea>y,FPx
PSTDDATA = 0x1, [89B], source}
fint.sz
<ea>y,FPx
PSTDDATA = 0x1, [89B], source}
fintrz.sz
<ea>y,FPx
PSTDDATA = 0x1, [89B], source}
Chapter 11. Debug Support
For More Information On This Product,
Go to: www.freescale.com
11-63
Freescale Semiconductor, Inc.
Debug C Definition of PSTDDATA Outputs
Table 11-31. PSTDDATA Values for User-Mode Floating-Point Instructions
Freescale Semiconductor, Inc...
Instruction 1
Operand Syntax
PSTDDATA
fmove.sz
<ea>y,FPx
PSTDDATA = 0x1, [89B], source}
fmove.sz
FPy,<ea>x
PSTDDATA = 0x1, [89B], destination}
fmove.l
<ea>y,FP*R
PSTDDATA = 0x1, B, source}
fmove.l
FP*R,<ea>x
PSTDDATA = 0x1, B, destination}
fmovem
<ea>y,#list
PSTDDATA = 0x1
fmovem
#list,<ea>x
PSTDDATA = 0x1
fmul.sz
<ea>y,FPx
PSTDDATA = 0x1, [89B], source}
fneg.sz
<ea>y,FPx
PSTDDATA = 0x1, [89B], source}
fnop
PSTDDATA = 0x1
fsqrt.sz
<ea>y,FPx
PSTDDATA = 0x1, [89B], source}
fsub.sz
<ea>y,FPx
PSTDDATA = 0x1, [89B], source}
ftst.sz
<ea>y
PSTDDATA = 0x1, [89B], source}
1
The FP*R notation refers to the floating-point control registers: FPCR, FPSR, and FPIAR.
Depending on the size of any external memory operand specified by the f<op>.fmt field,
the data marker is defined as shown in Table 11-32.
Table 11-32. Data Markers and FPU Operand Format Specifiers
Format Specifier
Data Marker
.b
8
.w
9
.l
B
.s
B
.d
Never captured
11.7.2 Supervisor Instruction Set
The supervisor instruction set has complete access to the user mode instructions plus the
opcodes shown below. The PSTDDATA specification for these opcodes is shown in
Table 11-33.
Table 11-33. PSTDDATA Specification for Supervisor-Mode Instructions
Instruction
Operand Syntax
PSTDDATA
cpushl
dc,(Ax)
ic,(Ax)
bc,(Ax)
PSTDDATA = 0x1
frestore
<ea>y
PSTDDATA = 0x1
fsave
<ea>x
PSTDDATA = 0x1
11-64
ColdFire CF4e Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
ColdFire Debug History
Table 11-33. PSTDDATA Specification for Supervisor-Mode Instructions
Instruction
Operand Syntax
Freescale Semiconductor, Inc...
halt
PSTDDATA
PSTDDATA = 0x1,
PSTDDATA = 0xF
intouch
(Ay)
PSTDDATA = 0x1
move.l
Ay,USP
PSTDDATA = 0x1
move.l
USP,Ax
PSTDDATA = 0x1
move.w
SR,Dx
PSTDDATA = 0x1
move.w
{Dy,#<data>},SR
PSTDDATA = 0x1, {0x3}
movec.l
Ry,Rc
PSTDDATA = 0x1, {8, ASID}
rte
PSTDDATA = 0x7, {0xB, source operand}, {3},{0xB, source operand}, {DD},
PSTDDATA = 0x5, {[0x9AB], target address}
stop
#<data>
PSTDDATA = 0x1,
PSTDDATA = 0xE
wdebug.l
<ea>y
PSTDDATA = 0x1, {0xB, source, 0xB, source}
The move-to-SR and RTE instructions include an optional PSTDDATA = 0x3 value,
indicating an entry into user mode. Additionally, if the execution of a RTE instruction
returns the processor to emulator mode, a multiple-cycle status of 0xD is signaled.
Similar to the exception processing mode, the stopped state (PSTDDATA = 0xE) and the
halted state (PSTDDATA = 0xF) display this status throughout the entire time the
ColdFire processor is in the given mode.
11.8 ColdFire Debug History
This section describes the origins of the ColdFire debug systems.
11.8.1 ColdFire Debug Classic: The Original Definition
The original design, Revision A, provided debug support in three separate areas:
•
•
•
Real-time trace.
Background debug mode (BDM)
Real-time debug
The real-time debug features may be accessed from the external BDM emulator or from
the supervisor programming model of the processor. The hardware breakpoint registers
include: a PC breakpoint + mask, two address registers for defining a specific address or a
range of addresses, and a data breakpoint + mask. The original design supported
breakpoints of the form:
if PC_breakpoint is triggered
then respond using user-defined configuration
Chapter 11. Debug Support
For More Information On This Product,
Go to: www.freescale.com
11-65
Freescale Semiconductor, Inc.
ColdFire Debug History
if Address_breakpoint {&& Data_breakpoint} is triggered
then respond using user-defined configuration
Two-level triggers of the form:
if PC_breakpoint is triggered
then if Address_breakpoint {&& Data_breakpoint} is triggered
then respond using user-defined configuration
if Address_breakpoint {&& Data_breakpoint} is triggered
then if PC_breakpoint is triggered
then respond using user-defined configuration
Freescale Semiconductor, Inc...
The data_breakpoint can be included as an optional part of an address breakpoint.
The ColdFire debug architecture was created to provide this set of functionality without
requiring the traditional connection to the external system bus. Rather, the functionality is
provided using only a connection to a Motorola-defined 26-pin debug connector. By
providing the required debug signals in customer-specific designs, standard third-party
emulators can be used for debug of these designs.
NOTE:
The baseline debug functionality is described in any of the
ColdFire MCF52xx User’s Manuals, which are available as
PDF files at: http://www.motorola.com/ColdFire/. As an
example, see the debug section of the MCF5272 User’s
Manual located under MCF5272 Product Information.
Implementation of the original debug module produced design requiring approximately
34,000 transistors, 22,500 for the BDM/real-time debug function, 7,500 for the DDATA
module, and 4,000 for the PC breakpoint logic in the processor.
11.8.2 ColdFire Debug Revision B
During development of the Version 3 ColdFire design, there were a number of
enhancements to the original debug functionality requested by customers and third-party
developers. These requests resulted in an expanded set of debug functionality named
Revision B.
The Rev. B enhancements are as follows:
•
•
•
•
11-66
Addition of a BDM SYNC_PC command to display the processor’s current PC
Creation of more flexible hardware breakpoint triggers, i.e., support for “OR”
combinations
Removal of the restrictions involving concurrent hardware breakpoint use and
BDM command activity
Redefinition of the processor status values for the RTS instruction
ColdFire CF4e Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
ColdFire Debug History
•
•
•
An external mechanism to generate a debug interrupt
A mechanism to inhibit debug interrupts after the RTE exit
A mechanism to identify the revision level of the debug module
Rev. B enhancements provide backward compatibility with the original design.
11.8.3 ColdFire Debug Revision C
Freescale Semiconductor, Inc...
Continuing discussions with customers and the developer community led to Revision C
design enhancements primarily related to improvements in the real-time debug
capabilities of the ColdFire architecture. The remainder of this section details these
enhancements.
11.8.3.1 Debug Interrupts and Interrupt Requests (Emulator Mode)
In Rev. A and Rev. B ColdFire debug implementations, the response to a user-defined
breakpoint trigger can be configured to be one of three possibilities:
•
•
•
The breakpoint trigger can merely be displayed on the DDATA bus, with no
internal reaction to the trigger. The trigger state information is displayed on
DDATA in all situations.
The breakpoint trigger can force the processor to halt and allow BDM activities.
The breakpoint trigger can generate a special debug interrupt to allow real-time
systems to quickly process the interrupt and return to normal system executing as
rapidly as possible.
The occurrence of the debug interrupt exception is treated as a special type of interrupt. It
is considered to be higher in priority than all normal interrupt requests and has special
processor status values to provide an external indication that this interrupt has occurred.
Additionally, the execution of the debug interrupt service routine is forced to be
interrupt-inhibited by the processor hardware with an optional capability to map all
instruction and operand references while in this service routine into a separate address
space so that an emulator could define the routine dynamically. The current processor
implementations actually include a program-invisible state bit which defines this emulator
mode of operation. Also note, the interrupt mask level is not modified during the
processing of a debug interrupt.
Customers with real-time embedded systems have specifically asked for the ability to
service normal interrupt requests while processing the debug interrupt service routine. In
many systems of this type, motion-based servo interrupts must be considered as the
highest priority interrupt request.
To provide this functionality and be able to service any number of normal interrupt
requests (including the possibility of nested interrupts), the processor state signaling
emulator mode must be included as part of the exception stack frame.
Chapter 11. Debug Support
For More Information On This Product,
Go to: www.freescale.com
11-67
Freescale Semiconductor, Inc.
Motorola-Recommended BDM Pinout
Freescale Semiconductor, Inc...
As part of the Rev. C functionality, the operation of the debug interrupt is modified in the
following manner:
1. The occurrence of the breakpoint trigger, configured to generate a debug interrupt,
is treated exactly as before. The debug interrupt is treated as a higher priority
exception relative to the normal interrupt requests encoded on the interrupt priority
input signals.
2. At the appropriate sample point, the ColdFire processor initiates debug interrupt
exception processing. This event is signaled externally by the generation of a
unique PST value (PST = 0xD) asserted for multiple cycles. The processor sets the
emulator mode state bit as part of this exception processing.
3. While the processor in the debug interrupt service routine, all normal interrupt
requests are evaluated and sampled once per instruction. While in this routine, if
any type of exception occurs, the processor responds in the following manner:
a) In response to the new exception, the processor saves a copy of the current value
of the emulator mode state bit and then exits emulator mode by clearing the
actual state.
b) The new exception stack frame sets bit 1 of the fault status field, using the saved
emulator mode bit, indicating execution while in emulator mode has been
interrupted. This corresponds to bit 17 of the longword at the top of the system
stack.
c) Control is passed to the appropriate exception handler.
d) When the exception handler is complete, a Return From Exception (RTE)
instruction is executed. During the processing of the RTE, FS[1] is reloaded
from the system stack. If this bit is asserted, the processor sets the emulator
mode state and resumes execution of the original debug interrupt service
routine. This is signaled externally by the generation of the PST value that
originally identified the occurrence of a debug interrupt exception, that is, PST
= 0xD.
Implementation of this revised debug interrupt handling fully supports the servicing of any
number of normal interrupt requests while in a debug interrupt service routine. The
emulator mode state bit is essentially changed to be a program-visible value, stored into
memory during exception stack frame creation and loaded from memory by the RTE
instruction.
11.9 Motorola-Recommended BDM Pinout
The ColdFire BDM connector, Figure 11-1, is a 26-pin Berg connector arranged 2 x 13.
11-68
ColdFire CF4e Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
Freescale Semiconductor, Inc...
Motorola-Recommended BDM Pinout
Developer reserved 1
1
2
BKPT
GND
3
4
DSCLK
GND
5
6
Developer reserved1
RESET
7
8
DSI
VDD_IO 2
9
10
DSO
GND
11
12
PSTDDATA7
PSTDDATA6
13
14
PSTDDATA5
PSTDDATA4
15
16
PSTDDATA3
PSTDDATA2
17
18
PSTDDATA1
PSTDDATA0
19
20
GND
Motorola reserved
21
22
Motorola reserved
GND
23
24
PSTCLK
VDD_CPU
25
26
TA
1 Pins reserved for BDM
2 Supplied by target.
developer use.
Figure 11-1. Recommended BDM Connector
Chapter 11. Debug Support
For More Information On This Product,
Go to: www.freescale.com
11-69
Freescale Semiconductor, Inc.
Freescale Semiconductor, Inc...
Motorola-Recommended BDM Pinout
11-70
ColdFire CF4e Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
Freescale Semiconductor, Inc...
Chapter 12
Test
This chapter provides an overview of test features of CF4e. Some of the features, such as
MBist hardware, are included in the CF4e design. The scan and wrapper methodology,
described later in the chapter, are part of the CF4e design but are described here as a
reference for properly designing CF4e for test.
Because it is less accessible, an embedded core device is more difficult to test. The solution
to applying the test to an embedded core should not be less efficient, should not lessen the
quality, and should not significantly increase integration costs by the need for extra signals,
routes, and logic. High quality levels and efficient test vectors can result from using the
proper mix of test techniques to address the embedded market.
Testing the structure of general combinational and sequential logic in the core is typically
done by application of vectors measured against the stuck-at fault model. The test vectors
applied may be functional or scan based; however, full-scan testing is the most efficient test
architecture for generating and applying stuck-at vectors. A further optimization to a
full-scan test architecture that allows the reduction of the shift cost (number of clock cycles
required to load in a state) associated with the scan architecture is the support of a
parallel-pin architecture with multiple, simultaneously operational scan chains.
Testing memory arrays in an embedded core is most efficiently done with a memory
built-in-self test (MBIST) architecture. One major goal is to reduce the overall test time by
testing each embedded memory simultaneously with at-speed data transfers, reads, and
writes. Another goal is to reduce the signal interface involved in testing multiple embedded
memories by requiring only the invoke, done, and fail indicators (as a minimum).
All logic internal to the boundary of an embedded core can be tested efficiently with support
from test selection, scan, and MBIST architectures. These architectures, in conjunction
with tester pauses and current measurement techniques, allow all test and DFT goals to be
met. In addition, the optional test wrapper scan architecture at the embedded core hierarchy
boundary permits testing beyond the embedded core.
The embedded CF4e is designed to be tested independently of the rest of the chip in all test
modes. It also allows noncore logic to be tested up to the core interface. No reliance on
internal core logic or specific core test modes should be required to test non-core logic.
Considerations are taken to ensure that the CF4e can be put in the functional (nontest) mode
with no residual effect or interference from the test logic.
Chapter 12. Test
For More Information On This Product,
Go to: www.freescale.com
12-1
Freescale Semiconductor, Inc.
Scan Chains
We recommend using full-scan, multiplexed, D flip-flop methodology on the CF4e
implementation.
The CF4e core has one clock domain. The core has an optional test boundary (or test
wrapper) comprised of wrapper cells used to access all functional input and output ports.
The core has a BIST controller to BIST-test memory arrays attached to the core. The
processor-local memories are external to the CF4e design to allow the system designer to
configure and size those memories for a given application.
Freescale Semiconductor, Inc...
12.1 Scan Chains
This section describes the core and wrapper scan chains.
12.1.1 Core Scan Chains
Customers can choose the number of scan chains in the CF4e. Table 12-1 describes the test
ports of these scan chains.
Table 12-1. CF4e Core Scan Chains
1
Scan Inputs
Scan Outputs
Scan Enable
Clock
si[N–1:0] 1
so[N-1:0] 1
se
clkfast
N represents the number of core scan chains
12.1.2 Wrapper Scan Chains
Because scan is part of the implementation phase, the soft CF4e does not contain the test
wrapper in its RTL code when it is delivered to customers. However, a test wrapper can be
optionally created using synthesis scripts. Table 12.2 gives a more detailed description of
the test wrapper. If the test wrapper is used, customers can choose the number of wrapper
scan chains. An input wrapper chain has cells connected only to functional input ports of
the core. An output wrapper chain has cells connected only to functional output ports of the
core. The CF4e has no I/O or three-state ports. There are no wrapper cells on the clock
(clkfast), memory, scan input, or scan output ports of the cores. Table 12-2 describes the
test ports of the wrapper scan chains.
Table 12-2. CF4e Wrapper Scan Chains
Signal Types
12-2
Port Names
Wrapper scan data inputs
tbsi[K-1:0] 1
Wrapper scan data outputs
tbso[K-1:0] 1
Input wrapper scan enable
tbsei
Output wrapper scan enable
tbseo
Clock for wrapper scan chains
clkfast
ColdFire CF4e Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
Test Wrapper
1
K: Number of wrapper scan chains
12.1.3 Scan Chains Block Diagram
For a multiplexed D flip-flop scan chain DFT methodology, the core chains and the wrapper
chains as shown in Figure 12-1.
Scan Data Out
Freescale Semiconductor, Inc...
Core
Boundary
Input
to core
Input
to core
Output
from core
Input
Wrapper
Chain
Core
Chains
Output
Wrapper
Chain
Input
to core
Output
from core
Output
from core
Scan Data In
Figure 12-1. CF4e Scan Chains Block Diagram
12.2 Test Wrapper
As described in Section 12.1.2, “Wrapper Scan Chains,” because scan is part of the
implementation phase, the soft CF4e does not contain the test wrapper in its RTL code
when it is delivered to customers. However, a test wrapper can be optionally created in later
Chapter 12. Test
For More Information On This Product,
Go to: www.freescale.com
12-3
Freescale Semiconductor, Inc.
Test Wrapper
process during synthesis using synthesis scripts. This section provides information on the
test wrapper (abbreviated as CF4eTW), which is implemented using multiplexed, D
flip-flop DFT methodology and the shared wrapper type.
12.2.1 Features
Freescale Semiconductor, Inc...
The test wrapper, shown in Figure 12-2, allows the CF4e to be tested independently of the
rest of the system-on-a-chip (SoC) and allows noncore logic to be tested up to the CF4e
interface. An SoC includes standalone virtual components, integration logic, and
application platforms. SoC can go to tape out and run and become final product. The CF4e
is used as an embedded core in an SoC environment.
Input/Output
User-Defined
Logic
CF4e Test Wrapper
Input/
Output
Core
Chains
UserDefined
Logic
UserDefined
Logic
Input/
Output
User-Defined
Logic
Input/Output
Figure 12-2. CF4e and Test Wrapper in SoC
The test wrapper provides the following:
•
•
12-4
Support for reapplying original test vectors when the target CF4e is embedded into
an SoC
Ability to control and observe the CF4e port interface boundary
ColdFire CF4e Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
Test Wrapper
•
•
•
•
Pattern generation for the interface between the CF4e within an SoC without
knowledge of the block functionality
Support for AC testing of the interface to and from the SoC interface logic (at-speed
transitions to be launched or captured by the test wrapper)
Safe state to the non-core logic while the CF4e is under test, and vice versa
Transparency when the test wrapper is not used. The test wrapper is logically
removed during functional mode.
Freescale Semiconductor, Inc...
12.2.2 Wrapper Cells
If a wrapper is created using multiplexed, D flip-flop DFT methodology, its scan chains are
mainly composed of shared wrapper cells (see Figure 12-3). A shared wrapper cell contains
a functional register and a D flip-flop called an independent shift bit (ISB) that allows a
second bit of data to be shifted into the core sequentially to perform path delay testing on
the core input and output paths. A shared wrapper cell can be used only on a registered input
or output. Nonregistered inputs or outputs must use partition cell (p-cell).
Scan Output
Functional
Input
Peripheral
Logic
D
SDI
To CF4e
Functional
Logic
Q
CLK
Core Input Cell
D
Q
ISB
CLK
Scan Input
Scan Output
From CF4e
Functional
Logic
D
SDI
CLK
Core Output Cell
D
Q
Peripheral
Logic
Functional
Output
Q
ISB
CLK
Figure 12-3. CF4e Core Shared Wrapper Cells
There are two nonregistered inputs on the CF4e core. These inputs require dedicated
wrapper partition cells (P cells) to control the two (see Figure 12-4). These P cells are
Chapter 12. Test
For More Information On This Product,
Go to: www.freescale.com
12-5
Freescale Semiconductor, Inc.
Test Wrapper
included in the input wrapper scan chain of the CF4e.
Scan Input
Peripheral
Logic
OUT
Scan Output
Wrapper Cell
functional path
CF4e
0
IN
1
ISB
Q
D
CK
D
Q
SDI
SE
CK
Freescale Semiconductor, Inc...
clkfast
Scan Enable clkfast
Test Enable tbte
Figure 12-4. CF4e Core Dedicated Input Wrapper Cell (P Cell)
The two P cells in the CF4e core wrapper have a functional path through the multiplexer.
The logic on either side of this wrapper cell is testable while the test enable is asserted.
However, to test the wire and possibly a critical timing path, the core and peripheral scan
chains must be loaded with the test enable (tbte) set to transparent mode (active low).
A synthesis script adds wrapper logic during synthesis. Chains must be evaluated for
correctness after script use.
12.2.3 Block Diagram
Figure 12-5 shows a shared wrapper architecture. This design has K wrapper scan chains
and N general core scan chains.
12-6
ColdFire CF4e Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
Test Wrapper
CF4e
Hierarchy
Independent
Shift Bits to
Enable Logic
Transitions
Non-Core
Logic
D
Non-Core
Logic
Functional
Output
Q
From CF4e
Functional
Logic
ISB
CLK
Functional
Input
Freescale Semiconductor, Inc...
tbso[K-1:0]
D
D
SDI
To CF4e
Functional
Logic
Q
SDI
CLK
D
CLK
Q
ISB
Functional
Output
CLK
D
Q
From Core
Functional
Logic
ISB
CLK
Functional
Input
D
Test
Mode
Inputs
3
/
CLK
Q
ISB
tbsi[K-1:0] Where K: Number of wrapper chains
H
e
a
d
s
Q
SDI
D
CLK
S
c
a
n
D
To CF4e
Functional
Logic
Q
SDI
Scan
Inputs
N
/
si[N-1:0]
Q
To CF4e
Scan
Logic
CF4e
Test Controller
Unit
CLK
From CF4eCF4e
Scan
Logic
S
c
a
n
Scan
Outputs
N
/
so[N-1:0]
T
a
i
l
s
Functional Mode
Scan Modes
MBIST Modes
Figure 12-5. Example of Registered CF4eTW Architecture
The first and last registers in a scan chain (called the head and tail registers) are used in each
scan chain only for scan shift operations and have no functional use. They ensure a full
clock cycle to propagate data into the first nonhead flop and from the last nontail flop of the
scan chains. They also eliminate the need to consider routing delays from the CF4e
Chapter 12. Test
For More Information On This Product,
Go to: www.freescale.com
12-7
Freescale Semiconductor, Inc.
Test Wrapper
boundary to the first scan-in flop and from the last scan-out flop.
First
Nonhead
Flop
Head
Scan chain
Last
Nontail
Flop
Tail
Core Boundary
Routing delays
A full clock
A full clock
Routing delays
Figure 12-6. Scans and Flops
Freescale Semiconductor, Inc...
The ISB is described in Section 12.2.2, “Wrapper Cells.”
12.2.4 Timing
When it is embedded and integrated within a chip, the CF4eTW scan architecture tests the
following:
•
•
•
•
CF4e inputs for structure and timing
CF4e outputs for structure and timing
The interface between the CF4e output signals and the non-core logic input signals
for structure and timing
The interface between the non-core logic output signals and the CF4e input signals
for structure (and possibly timing)
These operations are described in the following sections.
12.2.4.1 CF4eTW Testing of CF4e Core Inputs
CF4eTW testing of CF4e inputs is a manufacturing test operation. Verification and testing
of CF4e inputs for structure and timing is done by applying vectors through the CF4eTW
scan architecture. Testing is accomplished by launching logic values into the CF4e from the
functional input registers in the CF4eTW scan chain.
The CF4e internal parallel scan chains must capture these launched values. This confirms
that the functional register operates and that connections from the functional register into
the CF4e are correct. If logic values launched into the core are configured as a vector pair
with logic transitions, the logic paths from the interface register into the CF4e are also
verified for timing with reference to the clock cycle. Figure 12-7 describes timing diagram
for an input wrapper to CF4e core scan stuck-at vector.
12-8
ColdFire CF4e Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
Test Wrapper
Last Scan Shift In
First Scan Shift Out
Functional
Sample
clkfast
tbsei
Note: No Sample
Needs to be done
by the CF4eTW inputs
tbte
Freescale Semiconductor, Inc...
tbseo
Outputs to non-core
logic not considered
in this example
Establish
Fault Exercise
Values into
CF4e logic
registered
CF4eTW
Input
Fault
Exercise
Data
Data
Data
Data
Data
Fault
Data
Data
Data
se
Internal
CF4e Core
scan input
Register Setup
Time Point
for CF4e
internal scan
Capture Point
for Fault Effect
into CF4e Scan Cell
Figure 12-7. CF4eTW Input to CF4e Core Scan Stuck-At Vector Example
Figure 12-8 describes timing diagram for an input wrapper to CF4e core scan delay vector.
Chapter 12. Test
For More Information On This Product,
Go to: www.freescale.com
12-9
Freescale Semiconductor, Inc.
Test Wrapper
Last Scan Shift In
First Scan Shift Out
Functional
Sample
N-1 Scan Shift In
clkfast
tbsei
Note: No Sample
Needs to be done
Freescale Semiconductor, Inc...
by the CF4eTW inputs
tbte
tbseo
Outputs to non-core
logic not considered
in this example
Establish
logic transition
launched into
CF4e logic
registered
CF4eTW
Input
Logic
Transition
Data
Opposite
Transition
Data
Data
Data
Data
Data
Data
se
Internal
CF4e
scan input
Data
Register Setup
Time Point
for CF4e
internal scan
Capture
Path
Data
Capture Point
for Transition Effect
into CF4e functional
registers
Figure 12-8. CF4eTW Input to CF4e Core Scan Delay Vector Example
Delay testing first identifies a target path (usually with static timing analysis). It then creates
a vector that shifts a logic value into the functional register in the n-1 shift (next to last shift)
and then shifts in the opposite logic value into the functional register as the last shift. This
launches a transition into the CF4e core. Because the CF4e core and test wrapper share the
same system clock, the next system clock rising-edge after the last shift is the sample cycle,
12-10
ColdFire CF4e Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
Test Wrapper
so the internal CF4e core scan enable signal must be negated to allow the CF4e core internal
scan architecture to capture the effect of the launched transition.
Note that the CF4eTW input side scan enable does not need to negate because this test is to
verify from the inputs into the core. Sampling done by input registers of the CF4e test
wrapper would verify outputs of the noncore logic. Similarly, for this example, the output
scan enable, tbseo, does not need to be negated (in reality, however, stuck-at testing of the
CF4e core uses the CF4eTW input registers and the CF4eTW output registers
simultaneously: path delay may be done individually).
Freescale Semiconductor, Inc...
When a hard-layout core is delivered, understanding this operation is unnecessary because
vectors exist to accomplish these operations (for example, the CF4e core manufacturing test
program).
12.2.4.2 CF4eTW Testing of CF4e Core Outputs
CF4eTW testing of CF4e core outputs is also considered a manufacturing test operation of
the CF4e core. Verification and testing of CF4e core outputs for structure and timing is the
application of vectors through the CF4e core internal parallel scan architecture to be
captured by the CF4eTW scan architecture. Testing is done by launching logic values from
the CF4e core internal parallel scan registers to be captured by the functional output
registers included in the CF4eTW scan chain.
Logic values launched by the CF4e core scan chains must be captured by the CF4eTW scan
chains. This verifies that the CF4eTW functional output register operates and that
connections from the CF4e core internal functional registers are correct. If logic values
launched from the CF4e core are configured as a vector pair with logic transitions, logic
paths from the CF4e core internals to the CF4eTW interface register are also verified for
timing with reference to the clock cycle.
Delay testing is done by first identifying a target path (usually with static timing analysis)
and then creating a vector that shifts a logic value into the internal functional register as the
last shift. The CF4e core scan enable is then negated, and the next system clock conducts a
functional state transition, which launches a logic transition into the CF4e core along the
identified path.
Because the CF4e core and the CF4eTW share the same system clock, the next system
clock rising-edge after the first sample is the path delay effect sample cycle. This requires
that the CF4eTW scan enable signal must be negated to allow the CF4eTW scan
architecture to capture the effect of the launched transition. Note that the CF4eTW input
side scan enable does not need to negate because this test is to verify from the internals to
functional outputs. Any sampling done by the input registers of the CF4eTW would be
verifying the outputs of the noncore logic. Figure 12-9 shows timing for a CF4e core to
output wrapper scan stuck-at vector.
Chapter 12. Test
For More Information On This Product,
Go to: www.freescale.com
12-11
Freescale Semiconductor, Inc.
Test Wrapper
Last Scan Shift In
First Scan Shift Out
Functional
Sample
clkfast
tbsei
Note: No Sample
Needs to be done
by the CF4eTW inputs
tbte
Freescale Semiconductor, Inc...
tbseo
Outputs to non-core
logic not considered
in this example
Establish
Fault Exercise
Values into
CF4e logic
Internal CF4e
scan output
Capture
of Fault
Effect
Fault
Exercise
Data
Data
Data
Data
Data
Fault
Data
Data
Data
se
CF4eTW
functional
output
register
Register Setup
Time Point
for CF4eTW
boundary scan
Capture Point
for Fault Effect
into CF4eTW output
register
Figure 12-9. CF4e Core to CF4eTW Output Scan Stuck-At Vector Example
Figure 12-10 shows timing for a CF4e core to output wrapper scan delay vector.
12-12
ColdFire CF4e Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
Test Wrapper
Last Scan Shift In
1st Functional
Sample
2nd Functional First Scan Shift Out
Sample
clkfast
tbsei
Note: No Sample
Needs to be done
by the CF4eTW inputs
Freescale Semiconductor, Inc...
tbte
tbseo
Outputs to non-core
logic not considered
in this example
Establish
Fault Exercise
Values into
CF4e logic
registered
CF4eTW
Input
Original
State
Data
Transition
State
Data
Data
Data
Data
Path
Data
Data
se
output
CF4eTW
functional
register
Data
Register Setup
Time Point
for CF4eTW
boundary scan
Capture Point
for Transition Effect
into CF4eTW Scan Cell
Figure 12-10. CF4e Core to CF4eTW Output Scan Delay Vector Example
When a hard-layout-core is delivered, understanding this operation is unnecessary because
vectors exist for these operations (for example, the CF4e core manufacturing test program).
12.2.4.3 CF4eTW Testing of Noncore Inputs
The CF4eTW is designed as a standalone device; it does not need CF4e core scan
architecture for all testing. A gate-level netlist of the CF4eTW can be included as part of
the noncore logic netlist when vector generation is to be done.
Wrapper scan launch mode uses the CF4eTW’s ability to launch single logic values or
vector pair logic transition values into the noncore logic. This testing uses tbseo, which
enables the CF4eTW scan architecture to either shift data through the CF4eTW output side
scan chains (launching data into the noncore logic) or to capture data from the CF4e core
Chapter 12. Test
For More Information On This Product,
Go to: www.freescale.com
12-13
Freescale Semiconductor, Inc.
Test Wrapper
logic. The CF4eTW can be operated coincidentally with noncore logic test structures and
can be used to enable structural testing or timing delay testing.
The tbseo signal, together with the non-core logic scan or functional test mode control
signals, allows launching of a single logic value to conduct structural stuck-at testing, or
allows the launching of two consecutive differing logic values (vector pairs) on targeted
input signals, while holding other signals stable for 2 cycles (applying the same value). The
2-cycle transition type of sequence that holds off-path values stable results in what is known
as a robust delay test.
Figure 12-11 shows timing for a wrapper to non-CF4e logic scan stuck-at vector.
Freescale Semiconductor, Inc...
Last Scan Shift In
First Scan Shift Out
Functional
Sample
clkfast
tbsei
Note: inputs not
required for this
example
To apply values
from non-registered
CF4eTW output
signals
tbte
tbseo
Shifting will apply
the necessary
logic values.
Registered
Wrapper
Output
Clock-to-Out
Data Valid Point
from Core Logic
to Noncore Logic
Fault
Exercise
Data
Data
Data
Data
Data
Fault
Effect
Data
Data
Data
Noncore
Scan SE
Noncore
Logic Input
Register
Register Setup
Time Point
for Noncore Logic
Noncore Input
Signal Register
Sample Point
Figure 12-11. CF4eTW to Non-Core Input Scan Stuck-At Vector Example
Figure 12-12 shows timing for a wrapper to non-CF4e logic scan delay vector.
12-14
ColdFire CF4e Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
Test Wrapper
N-1 Scan Shift In
Last Scan Shift In
First Scan Shift Out
Functional
Sample
clkfast
wsei
Note: inputs not
required for
this example
To apply values
from non-registered
CF4eTW output
signals
Freescale Semiconductor, Inc...
tbte
tbseo
Shifting will apply
the necessary
logic values
Registered
CF4eTW
Output
Logic
Transition
Data
Opposite
Transition
Data
Data
Data
Data
Data
Capture
Path
Data
Data
Data
non-core
Scan SE
Noncore
Logic Input
Register
Register Setup
Time Point
for Noncore Logic
to Wrapper
Noncore Logic
Input Register
Sample Point
Figure 12-12. CF4eTW to Non-Core Delay Scan Vector Example
12.2.4.4 CF4eTW Testing of Noncore Outputs
The CF4eTW wrapper scan capture mode operates independently of the CF4e core scan
architecture. The CF4eTW can be appended to the noncore logic to become the capture part
of its test structure. Using the CF4eTW to launch or capture logic values associated with
the noncore logic requires the use of a gate-level netlist of the CF4eTW to be included as
part of the noncore logic netlist when vector generation is to be accomplished.
Wrapper scan capture mode uses the CF4eTW’s ability to capture logic values or logic
transition values launched from the noncore logic. The CF4eTW can be operated
coincidentally with the noncore logic test structures and can be used to enable structural
testing or timing delay testing.
Chapter 12. Test
For More Information On This Product,
Go to: www.freescale.com
12-15
Freescale Semiconductor, Inc.
Test Wrapper
Wrapper scan capture mode testing uses the tbsei signal in conjunction with the noncore
logic scan or functional test mode control signals to allow the capture of single logic values,
or two consecutive, differing logic values on targeted input signals.
The tbsei signal enables the CF4eTW scan architecture to either shift data through the
CF4eTW input side scan chains (launching logic values into the CF4e core) or to capture
data from the noncore logic. If noncore logic can launch transitions (vector pairs), the
wrapper can be used to capture one or both cycles of the transitioning test. It must be noted,
however, that having the ability to capture vector pairs launched from the noncore logic
requires that the noncore logic supports the logic test structures to launch the vector pairs.
Freescale Semiconductor, Inc...
Figure 12-13 shows timing for a non-CF4e logic to input wrapper scan stuck-at vector.
Last Scan Shift In
First Scan Shift Out
Functional
Sample
clkfast
tbsei
Note: CF4eTW inputs
come from
noncore logic only
tbte
tbseo
Inputs to noncore
logic from CF4eTW only
not needed for this
example
Noncore
Logic
Output
Noncore Logic
Functional Register
Sample Point
Fault
Exercise
Data
Fault
Capture
Data
Data
Fault
Capture
Data
Data
Data
noncore
Scan SE
Samples noncore logic
registered
CF4eTW
input
Register Setup
Time Point
for noncore logic
to CF4eTW input
Data
Data
CF4eTW input
functional register
Sample Point
Figure 12-13. Non-Core to CF4eTW Input Scan Stuck-At Vector Example
12-16
ColdFire CF4e Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
BIST
Figure 12-14 shows timing for a non-CF4e logic to input wrapper scan delay vector.
Last Scan Shift In
N-1 Scan Shift In
First Scan Shift Out
Functional
Sample
clkfast
tbsei
Note: Sample
Needs to be done
by the CF4eTW inputs
Freescale Semiconductor, Inc...
tbte
tbseo
Outputs to non-core
logic not considered
Establish
in this example
logic transition
launched into
CF4eTW
Logic
Registered Transition
Noncore
Data
Output
Opposite
Transition
Data
Data
Data
Data
Data
Capture
Path
Data
Data
Data
Noncore
Scan SE
CF4eTW
functional
input
Register Setup
Time Point
for CF4eTW
iNput Register
Capture Point
for Transition Effect
into CF4eTW input
Figure 12-14. Non-Core to CF4eTW Input Scan Delay Vector Example
12.3 BIST
The following sections describe the Version 4 BIST structure, which is intended to assist
customers, production engineers, and test engineers. Topics include the following:
•
•
•
•
•
BIST memory controllers
BIST core ports
Power analysis
Testing algorithms
BIST test modes
Chapter 12. Test
For More Information On This Product,
Go to: www.freescale.com
12-17
Freescale Semiconductor, Inc.
BIST
•
•
Memory data retention
Timing
The Version 4 test methodology for testing memories differs from the Version 3 approach
in the following respects:
Freescale Semiconductor, Inc...
•
•
•
In Version 4, the BIST control logic is integrated into the memory controllers rather
than being a separate, added function. This solution supports all V4 memories sizes;
KRAM and KROM sizes from 512 bytes to 64 Kbytes and cache sizes from 2 to 32
Kbytes. This enhancement from the Version 3 BIST is developed around the
memories after memory size is selected for a design. In the Version 3 scheme,
modification of any memories requires rework of BIST logic.
Data retention methodology adds automatic hold logic to the BIST controllers,
which greatly simplifies control logic for data retention and reduces the knowledge
and intervention needed by the user/tester.
Simplifying data retention logic eliminates the need for staggering hold signals for
different size memory arrays. In Version 3, MTMOD signals controlled assertion
and release of the BIST hold signals for memory devices. Due to BIST hold
encodings, MTMOD signals had processor clock timing associated with them. In
Version 4, MTMOD signals have system clock timing associated with them because
hold logic is removed from the MTMOD decode. In addition, significantly fewer
modes are needed with MTMOD for BIST.
12.3.1 BIST Memory Controllers
Control logic includes BIST memory controllers for each memory, reducing critical timing
paths by collapsing sequential multiplexing of address lines. This reduces wiring and
simplifies register sharing. Figure 12-15 shows how this change embeds BIST logic in the
core.
Design containing the CF4e and memories
cf4_core_kbus_tcu
CF4e Core
Core
ICU
Data
and
BIST
Control
ICU
Tag
and
BIST
Control
OCU
Data
and
BIST
Control
OCU
Tag
and
BIST
Control
RAM
0
and
BIST
Control
RAM
1
and
BIST
Control
ROM
0
and
BIST
Control
ROM
1
and
BIST
Control
ICU
Data
ICU
Tag
OCU
Data
OCU
Tag
RAM
0
RAM
1
ROM
0
ROM
1
Figure 12-15. CF4e BIST Hierarchy
12-18
ColdFire CF4e Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
BIST
The ColdFire Version 4 reference design handles up to 64-Kbyte memory sizes, using the
memory size indicator to create a specific test for each memory. The cf4_core_kbus_tcu
provides global control of each BIST controller associated with each memory.
BIST memory controllers now have automatic hold logic. Two holds are performed during
the first background. The bisthold signal indicates when the memory is in a hold state; the
bistrelease input removes memories from the hold state.
Freescale Semiconductor, Inc...
12.3.2 BIST Core Ports
During memory array testing, BIST ports must be set as shown in Table 12-3. Memory
BIST logic has two modes chosen through mtmod[2:0]. Production BIST (PBIST)
indicates failing or passing devices. Engineering BIST (EBIST) characterizes memory. The
resetB signal is driven with an OR of MTMOD equal to the modes (see Table 12-3). The
remaining core signals are set to an inactive state. All signals are registered at the CF4e
boundary.
The bistdata output operates at half the processor clock frequency; all other BIST signals
operate at system clock speed. The system/processor clock ratio does not affect BIST
operation. If this were a pad-level rather than a core-level boundary, MTMOD would be
expanded from 2:0 to 3:0 where the assertion of MTMOD3 indicates PLL bypass mode,
and a hizb signal would be asserted during data retention to three-state the outputs.
Table 12-3. BIST Core Pins
Port Name
I/O
BIST Mode
Description
bistdone
Output
PBIST
Indicates the test is complete. The largest memory size determines cycle
run-time for production test. When asserted, bistdone remains asserted and
the PBIST stops. During BIST initialization, this signal toggles (0-1-0) for a
stuck-at fault test on this individual signal. Because bistdone is registered at
the core boundary, a few cycles are added to BIST run time for signal output
generation.
bistfail
Output
PBIST
Indicates a memory array failed. Once asserted, bistfail remains asserted
and PBIST stops. During BIST initialization, bistfail toggles from 0-1-0 for a
stuck-at-fault test.
bisthold
Output
PBIST/
EBIST
Indicates when BIST memory controllers are in a hold state. allowing the
user to begin timing data retention. During BIST initialization, bisthold
toggles from 0-1-0 for a stuck-at-fault test.
bistdata[3:0]
Output
EBIST
Outputs bit-map array data. Operates at 1/2 of the processor clock
frequency. Section 12.3.8, “Memory Data Retention,” gives cycle-by-cycle
details. For pad-level boundaries, bistdata can be muxed with pstddata[3:0].
mtmod[2:0]
Input
PBIST/
EBIST
During BIST, mtmod is applied one cycle before use.
101 PBIST mode. Asserted during the entire PBIST memory test.
110 EBIST mode. Asserted during the entire EBIST test, used to
characterize the array selected by bistmemory[2:0].
Chapter 12. Test
For More Information On This Product,
Go to: www.freescale.com
12-19
Freescale Semiconductor, Inc.
BIST
Freescale Semiconductor, Inc...
Table 12-3. BIST Core Pins (Continued)
Port Name
I/O
BIST Mode
Description
bistrelease
Input
PBIST/
EBIST
Used to release memories from hold state during data retention. If data
retention testing is not desired, assert this signal for the entire test.
bistmemory[2:0]
Input
EBIST
Selects which memory is characterized.
000 Data cache tag
001 Data cache data
010 Instruction cache tag
011 Instruction cache data
100 RAM 0
101 RAM 1
110 ROM 0
111 ROM 1
12.3.3 Power Analysis
To maximize testing capabilities, reduce design time, and prevent potential brown-out
conditions that could occur if too many memories are switching simultaneously, it is critical
to analyze power considerations when memories are tested in parallel. Packaging can also
affect power considerations. The reference design has a potential of eight memories:
•
•
•
•
•
•
•
•
Operand cache unit (OCU) tag
OCU data
Instruction cache unit (ICU) tag
ICU data
RAM 0
RAM 1
ROM 0
ROM 1 array
Caches can be from 2 to 32 Kbytes. RAMs and ROMs can be 512 bytes to 64 Kbytes. Based
on BIST controller operation, memory size is independent. If an array is 8 or 32 Kbytes,
only one 2-Kbyte 512 x 32 SRAM block is active at a time; 2-Kbyte basic memory blocks
are assumed.
Pins and internal core logic not used for BIST (that is, all signals not in Table 12-3) are set
to a quiescent state during initialization, so BIST power consumption is negligible
compared to functional operation.
12.3.4 Staging of Memories
If the number of memories to be tested warrants parallel-sequential testing, memories are
split into two groups and one is tested first. After all memories in the first group are tested
(logical AND of all controller bistdone signals) with no failures, the second group is tested.
A similar approach is used to test data retention, using an output signal for each array
12-20
ColdFire CF4e Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
BIST
controller to indicate when memories are in a hold state.
As soon as all of the memories are held in the first group, the second group is initialized.
This staging is controlled internally and does not affect the user. Staging adds 200 ms to the
BIST memory testing and an additional 200 ms to data retention testing if it is not included
in memory testing. Staging increases estimated test time from 0.9 to 1.3 seconds.
12.3.5 Testing Algorithms
Freescale Semiconductor, Inc...
Data and instruction cache arrays are tested along with the RAM arrays using the March C+
algorithm. ROM arrays are tested using a prime polynomial of order 32.
12.3.5.1 March C+ Algorithm
The March C+ algorithm has six parts with an example background of 5s and As.
NOTE:
Automatic holds, or self pauses, occur only during background
5-A at the end of the first two parts.
1. Part 1
a) Begin at first address location (address 0)
b) Initialize (write) all locations with the initial data background
c) Conduct self-pause (only for first data background pass 5-A)
2. Part 2
a) Release self-pause (only for first data background pass 5-A)
b) Begin at first address location (address 0)
c) Read (initial)—write (complement)—read (complement)
d) Increment address and repeat for entire address space
e) Conduct self-pause (only for first data background pass 5-A)
3. Part 3
a) Release self-pause (only for first data background pass 5-A)
b) Begin at first address location (address 0)
c) Read (data)—write (complement)—read (complement)
d) Increment address and repeat for entire address space
4. Part 4
a) Begin at last address location (address max)
b) Read (data)—write (complement)—read (complement)
c) Decrement address and repeat for entire address space
Chapter 12. Test
For More Information On This Product,
Go to: www.freescale.com
12-21
Freescale Semiconductor, Inc.
BIST
Freescale Semiconductor, Inc...
5. Part 5
a) Begin at last address location (address max)
b) Read (data)—write (complement)—read (complement)
c) Decrement address and repeat for entire address space
6. Part 6
a) Begin at last address location (address max)
b) Read (data)
c) Decrement address and repeat for entire address space
The March C+ algorithm is a 14N per data background with 14 operations per data word.
A data background is a bit stream pair, such as 5-A, 3-C, and 0-F. Motorola prefers using
three backgrounds to provide coverage of the memory array independent of physical
organization. The March C+ algorithm provides a fault coverage of more than 99.9% of the
memory defect classes.
12.3.6 ROM BIST Algorithm
A memory BIST for a read-only memory (ROM) has two purposes: to verify data in the
ROM and to ensure no defects affect that data when a read operation is conducted (a
destructive read). ROM verification uses a compression scheme in which the compressed
data (or signature) is compared against a golden value created when the ROM data is
programmed. The scheme is based on cyclic-redundancy code (CRC) methodology, which
uses a linear feedback shift register (LFSR). An LFSR that has inputs as the output data bus
from memory is generally called a multiple input signature-analysis register (MISR).
MISR-LFSR is a shift register in which the last (right-most) bit is brought back to various
bits along the length of the shift register through XOR gates. The memory data bus is also
brought into the LFSR through XOR gates. The bits that receive the feedback are chosen
from polynomial tables. The initial LFSR state before it captures any data is called the seed.
The MISR-LFSR operation is similar to division by a prime number, as follows:
•
•
•
The input data stream = the dividend
The polynomial = the prime divisor
The state of the MISR-LFSR after each capture cycle = the remainder
The ROM BIST has two read passes on the memory, as shown in the following steps:
1.
2.
3.
4.
5.
6.
12-22
Begin at first address location (address 0)
Read (data)—compress (data)
Increment address and repeat for entire address space
Conduct self-pause
Release self-pause
Begin at first address location (address 0)
ColdFire CF4e Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
BIST
7. Read (data)—compress (data)
8. Increment address and repeat for entire address space
9. Conduct self-pause
10. Release self-pause
11. Compare signature to stored signature value
12. Assert done and assert fail if necessary
Freescale Semiconductor, Inc...
The first pass through the ROM should begin at address 0. Each address location is read as
the test increments through the address space and captures the data in the MISR. The first
pass verifies the program data and the address decode.
The second read also starts at address 0 (step 6) and increments through the address. The
second read tests for any defects resulting from addressing or reading destructive reads. The
ROM is designed so that the golden comparison signature sets the MISR to a known value
(all zeros) at the end of the second read pass. The ROM signature is placed in the last
address (address max), so this value can be read during testing because it returns the
signature analyzer to its pretest state. The Version 4 ROM BIST logic assumes ROM array
dimensions of 32 bits by length L. At the conclusion of the reads, a fail flag is set if the
calculated signature does not match the golden signature.
NOTE:
A retention test is not required, but is in the Version 4 ROM
solution. HOLD mode pauses the ROM simultaneously with
other memory arrays. This allows memories to gracefully
remain in a quiescent state during data retention.
The scripts written in Section 12.3.6.1, “Modify BIST ROM Signature Script—Part 1,” and
Section 12.3.6.2, “Modify BIST ROM Signature Script—Part 2,” together create a
self-checking ROM signature value. The pseudo signature is the last line in the array.
Hence, the last line of the ROM cannot be used for functional purposes. When the BIST
MISR is applied to the ROM array, the final signature results in a value of all zeros if the
array has no errors. The script calculates the signature using the same algorithm, prime
polynomial of order 32, that the ROM BIST module uses. The prime polynomial used in an
N-bit polynomial is a N-bit binary expansion of a prime number that falls between 2 (N-1)
and 2 N. Because of the math theorem that states that there is at least one prime number
between any numbers X and 2X, at least one prime number lies between 2 (N-1) and 2N.
Motorola chose one of these prime numbers for the algorithm.
12.3.6.1 Modify BIST ROM Signature Script—Part 1
#############################################################
# FILE: modify_signature.sh
# DESCRIPTION:
# shell to modify N x 32 ROM last word (signature) for V4 ROM BIST. This shell
# takes as input a file name of the original ROM code, and creates a new file
# called by the original filename with “.modified” appended. This shell also
# diffs the two files.
Chapter 12. Test
For More Information On This Product,
Go to: www.freescale.com
12-23
Freescale Semiconductor, Inc.
BIST
# The only difference is the modified last line. The original file’s last line
# does not change. The new file’s last line contains the correct ROM BIST
# signature of the proceeding N-1 lines.
#############################################################
# INPUT: The input file is an array of N lines each of which contains 32 ascii
# ones
# (1)s and zeroes (0)s. It should NOT contain any blank lines, NOR any spaces.
#############################################################
# OPERATION: call as such: modify_signature.sh original_filename
# This script invokes a nawk script called modify_signature.nawk.
#############################################################
sed ‘s/0/ 0/g’ $1 | sed ‘s/1/ 1/g’ | sed ‘s/^ //’ | nawk -f modify_signature.nawk
|
sed ‘s/ //g’ > $1.modified
diff $1 $1.modified
Freescale Semiconductor, Inc...
12.3.6.2 Modify BIST ROM Signature Script—Part 2
##########################################################
# FILE: modify_signature.nawk
# DESCRIPTION: Use nawk to emulate V4 ROM BIST algorithm.
# N is the length of the array.
##########################################################
# initialize the signature register to all zeroes
##########################################################
BEGIN{for(initial_count=0;initial_count<32;initial_count++)
signature[initial_count]=0}
##########################################################
# calculate next_signature, the only thing special is bit 31
##########################################################
{for(next_signature_count=0;next_signature_count<31;next_signature_count++)
next_signature[(30-next_signature_count)]=xor(signature[(31next_signature_count)],$(next_signature_count+2));
next_signature[31]
xor(xor(xor(signature[0],signature[1]),xor(signature[2],signature[22])),$1);
##########################################################
# move next_signature to current signature.
##########################################################
for(signature_count=0;signature_count<32;signature_count++)
signature[signature_count] = next_signature[signature_count];
##########################################################
# The last line of the file is the signature.
# If NR is equal to the last line, modify it. Else reprint the line.
##########################################################
if (NR == N)
print
xor($1,signature[31])
xor($2,signature[30])
xor($3,signature[29])
xor($4,signature[28]),
xor($5,signature[27]),
xor($6,signature[26])
xor($7,signature[25])
xor($8,signature[24]),
xor($9,signature[23]),
xor($10,signature[22])
xor($11,signature[21])
xor($12,signature[20]),
xor($13,signature[19])
xor($14,signature[18])
xor($15,signature[17])
xor($16,signature[16]),
xor($17,signature[15])
xor($18,signature[14])
xor($19,signature[13])
xor($20,signature[12]),
xor($21,signature[11])
xor($22,signature[10])
xor($23,signature[9])
xor($24,signature[8]),
xor($25,signature[7]),
xor($26,signature[6])
xor($27,signature[5])
xor($28,signature[4]),
xor($29,signature[3])
xor($30,signature[2]),
xor($31,signature[1])
xor($32,signature[0])
else
print $1 $2 $3 $4,$5 $6 $7 $8,$9 $10 $11 $12,$13 $14 $15 $16,$17 $18 $19 $20,$21
22 $23 $24,$25 $26 $27 $28,$29 $30 $31 $32}
##########################################################
12-24
ColdFire CF4e Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
BIST
# xor function
##########################################################
function xor(first,second)
{if (first == second)
return 0
else
return 1
12.3.7 BIST Test Modes
Freescale Semiconductor, Inc...
Production BIST (PBIST) indicates failing or passing devices. Engineering BIST (EBIST)
characterizes memory. These modes are chosen through mtmod[2:0] (see Table 12-3).
PBIST tests all memories in parallel with individual BIST controllers. If one fails, a failflag
is asserted. When memory testing completes, bistdone is asserted.
EBIST uses the same memory BIST algorithm for bit-mapping memories. Due to VLSI
tester limitations, only one memory can be bit-mapped at a time and is chosen through
bistmemory[2:0] which selects one of the potential eight memories listed in Section 12.3.3,
“Power Analysis.” The data of the memory that is characterized is output onto bistdata[3:0].
On Version 4, bistdata operates at half the processor clock frequency. BIST data is captured
at the frequency at which the memories are operating. To output the data, a complete BIST
test is executed outputting the first data (data[3:0]). Next, the test is rerun on the entire
memory selecting bit data[7:4]. This sequence repeats until all memory is bit-mapped. No
data is lost between capturing the memory data at the processor clock speed and
multiplexing the data onto bistdata. BIST algorithms take multiple processor clocks per
address so data is not changing on a processor cycle-by-cycle basis. This flow process is
exited when the MTMOD state changes. Figure 12-16 illustrates this process.
Start
EBIST
(a = 0)
Run
EBIST
Output
4 bits
(a:a+3)
Start
EBIST
Yes
Test
Done?
No
Incr.
Bits
(a=a+4)
Figure 12-16. Flow of Characterization Method
Chapter 12. Test
For More Information On This Product,
Go to: www.freescale.com
12-25
Freescale Semiconductor, Inc.
BIST
12.3.8 Memory Data Retention
For maximum efficiency, the data retention scheme initializes all memories and places each
in a hold state at the appropriate time. Two data retention holds occur automatically during
background 5-A. As Figure 12-17 shows, the first hold occurs when memories under test
are initialized with 5s; a second occurs when they are initialized with As.
Inc (w5); Inc (r5,wA,rA); Inc (rA,w5,r5); dec (r5,wA,rA); dec(rA,w5,r5); dec (r5)
Hold1
Hold2
Freescale Semiconductor, Inc...
Figure 12-17. March C+ Algorithm
The bisthold output indicates that all memories under test are in hold state. At that time, the
user begins counting the hold time on the memories. When bistrelease is asserted, the hold
is released and testing continues. bistrelease should be negated before the end of the next
march part. Steps for data retention are as follows:
1. Assert PBIST (MTMOD = 101) or EBIST (MTMOD = 110)
2. When bisthold is asserted, the user can start a data retention counter. (Assume
bistrelease is negated.)
3. Assert bistrelease to release hold on memories and continue testing.
4. To perform data retention on inverse data, negate bistrelease and repeat steps 2 and
3.
The bistfail signal asserts if a memory failure occurs during the next read of the array. The
internal controller initializes all memories in each group if memory staging is required; that
is, power consumption becomes a criterion because of the number of memories under test.
The internal controller releases the first group so one group can be tested at a time. This
complication is not apparent to the user.
BIST ROM controllers include data retention logic for consistency when testing other
memories with data retention, not for ROM data retention testing itself. For example, when
all memories on a chip are tested in PBIST mode with data retention, the clock may not
always be asserted during the actual data retention. Therefore, ROM BIST logic should
contain hold logic to prevent the ROM BIST controller from becoming lost. For ROM
BIST, the hold occurs after a read has completed.
If data retention is not desired, assert bistrelease throughout the test. At least two processor
clock delays occur for each hold.
12.3.9 Timing
The following sections describe how to determine the clock cycle count and provide BIST
timing examples.
12-26
ColdFire CF4e Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
BIST
12.3.9.1 Memory Clock Determination
For tests using the March C+ algorithm, note that the algorithm has six parts, each with a
READ-WRITE protocol. Each protocol requires six clock cycles to conduct an rwr, r, or w
sequence, three data backgrounds, and the number of address locations. The example in
Figure 12-18 determines the minimum cycle count for a 512 x 32 RAM with one clock
cycle read and write protocols.
NOTE:
This example does not factor in initialization time.
Freescale Semiconductor, Inc...
(512 words x 6 cycles/word) x (6 parts/March C+ algorithm) x (3 backgrounds) = 55296 cycles
Figure 12-18. 512 x 32 RAM BIST Clock Cycles
ROM array cycles have three parts to each READ sequence (three clocks per address). The
read through the memory occurs twice. The example in Figure 12-19 is for a 512 x 32 ROM
with 1 clock cycle read protocols. This example does not factor in initialization time.
(512 words x3 cycles/word) x (2 read parts) = 3072 cycles
Figure 12-19. 512 x 32 ROM BIST Clock Cycles
The clock cycle determination for PBIST and each EBIST is calculated in Figure 12-4 with
examples of an 8-Kbyte cache and a 4-Kbyte RAM. As discussed above, BIST test requires
6 clock cycles per address, the March C+ test has six parts for a complete test, there are
three backgrounds applied per BIST test, and there are eight tests run for a complete EBIST
test, given a data array of width 32.
A PBIST test is completed when the largest array has finished or when a failure occurs on
an array. The variable X accounts for extra clock cycles, including core initialization, BIST
reset, and holds associated with data retention. During the beginning of a BIST test, there
are 16 processor clocks for core initialization and 12 processor clocks for BIST reset
initialization. Each restart of the BIST test during EBIST mode requires another BIST reset
initialization. For an PBIST test without data retention, X is 28 processor clocks. A few
cycles should be padded at the end of a PBIST mode test to wait for bistdone to assert.
Table 12-4. BIST Cycles
Parameter
Max address space (MAS)
Clks/March C+ algorithm part (part offset: P)
Clocks/March C+ algorithm
(background offset: B)
Clocks/March C+ algorithm with three backgrounds
(Test Offset: T)
Data Array (8K)
RAM Array (4K)
Tag Array (8K)
2048
1024
512
6 × 2048 = 12288
6 × 1024 = 6144
6 × 512 = 3072
6 × 12288 = 73728
6 × 6144 = 36864
6 × 3072 = 18432
3 × 73728 = 221184 3 × 36864 = 110592 3 × 18432 = 55296
Chapter 12. Test
For More Information On This Product,
Go to: www.freescale.com
12-27
Freescale Semiconductor, Inc.
BIST
Table 12-4. BIST Cycles
Parameter
Data Array (8K)
RAM Array (4K)
Tag Array (8K)
Clocks/complete PBIST test
221184 + X
110592 + X
55296 + X
Clocks/complete EBIST test
4 × 221184 =
884736 + X
4 × 110592 =
442368 + X
4 × 55296=
221184 + X
Freescale Semiconductor, Inc...
12.3.10 Timing Diagrams
Initialization is similar for PBIST and EBIST. In the PBIST mode example in Figure 12-20,
Only the first 27 cycles are shown; the remaining cycles should remain at the state shown
in cycle 27 unless there is a failure with the exception of bisthold and bistrelease. The
mtmod[2:0] signals are always assumed to be in non-BIST state before going into BIST
mode in cycle 3. In PBIST mode, core initialization takes 16 processor clocks, during
which the internal BIST control resets the internal control logic and toggles the BIST
outputs bistdone, bistfail, and bisthold for stuck-at-fault coverage.
The testing algorithm begins at cycle 30. For PBIST tests, bistfail and bistdone are
monitored. For EBIST tests, reinitialization between BIST runs takes 12 cycles and would
be represented by cycles 18–30.
1
2
3
4
5
6
7
8
9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27
processor clk
mtmod[2:0]
100
101
bistmemory[2:0]
bistrelease
bistdone
bistfail
bisthold
bistdata[3:0]
Figure 12-20. PBIST Initialization
In Figure 12-21, an 8-Kbyte ICU tag array is tested with EBIST mode. The cycles shown
represent a transition from the [W(5)] March C+ part to the [R(5),W(A),R(A)] part. This
example shows bistdata output during address 0, [R(5)] March C+ part output data, all other
March C+ part output data, and a minimum data retention hold.
The first March Part [W(5)] bistdata is always data, or 5 in this case. Consider the following
cases:
12-28
ColdFire CF4e Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
BIST
•
During address 0 when there is a minimum automatic hold, bistdata is [data, data,
data, data, data, data, data, data].
During address 0 when there is not an automatic hold, bistdata is [data, data, data,
data, data, data].
At all other addresses, bistdata is [data,data,data,DATA,DATA,data], in this case
[1,1,1,2,2,1].
•
•
Freescale Semiconductor, Inc...
The bistdata signals always operate at half the processor clock frequency. All other BIST
input and output pins are clocked by the system clock. In this example, the
system/processor clock ratio is 2:1, only the processor clock is displayed. This ratio can be
2:1, 3:1, or 4:1. This example is specific to EBIST tag arrays.
3096 3097 3098 3099 3100 3101 3102 3103 310431-053106 3107 3108 3109 3110 3111 3112 3113 3114 3115
processor clk
mtmod[2:0]
110
bistmemory[2:0]
010
bistrelease
bistdone
bistfail
bisthold
1
bistdata[3:0]
127
*memory address[31:0]
2
0
155555D
*memory datain[31:7]
*memory dataout[31:7]
1555551
1
1
0AAAAA2
x...x
1555551
0AAAAA2
1555551 0AAAAA2
* internal signals
Figure 12-21. EBIST Timing Diagram for an 8-Kbyte Cache Tag Array
Figure 12-21 shows EBIST test 0 with background 0. The bistdata output for part of the
array is data 1, data 2 due to the fact that the tag arrays widths are maximum of 25 bits.
Figure 12-5 shows variations on the BIST data output for the difference size memory arrays
given the test number and background. The information in the table is data, data, and old
data, where old data is the data from the last background.
Table 12-5. EBIST Tag Output Data
Tag
Size
Test 0
Test 1
Test 2+
Bckgnd 0 Bckgnd 1 Bckgnd 2 Bckgnd 0 Bckgnd 1 Bckgnd 2 Bckgnd 0 Bckgnd 1 Bckgnd 2
2K
5, A, X
3, C, 5
0, F, 3
5, A, 0
3, C, 5
0, F, 3
5, A, 0
3, C, 5
0, F, 3
4K
1, A, X
3, 8, 1
0, D, 3
5, A, 0
3, C, 5
0, F, 3
5, A, 0
3, C, 5
0, F, 3
8K
1, 2, X
3, 0, 1
0, 3, 3
5, A, 0
3, C, 5
0, F, 3
5, A, 0
3, C, 5
0, F, 3
Chapter 12. Test
For More Information On This Product,
Go to: www.freescale.com
12-29
Freescale Semiconductor, Inc.
BIST
Table 12-5. EBIST Tag Output Data (Continued)
Freescale Semiconductor, Inc...
Tag
Size
Test 0
Test 1
Test 2+
Bckgnd 0 Bckgnd 1 Bckgnd 2 Bckgnd 0 Bckgnd 1 Bckgnd 2 Bckgnd 0 Bckgnd 1 Bckgnd 2
16K
1, 2, X
3, 0, 1
0, 3, 3
4, A, 0
2, C, 5
0, E, 2
5, A, 0
3, C, 5
0, F, 3
32K
1, 2, X
3, 0, 1
0, 3, 3
4, 8, 0
0, C, 4
0, C, 0
5, A, 0
3, C, 5
0, F, 3
In Figure 12-22, an 8-Kbyte ICU data array is tested with EBIST mode. The cycles shown
represent a transition from the [W(5)] March C+ part to the [R(5),W(A),R(A)] part. During
address 0, bistdata is [data, data, data, data, data, data, data, data] for the first two part
transitions of the first background; minimum of two processor clock halt for the automatic
hold signal. For the remaining address 0, bistdata[3:0] output is [data, data, data, data, data,
data]. For all other addresses, then it is [data,data,data,DATA,DATA,data], in this case
[5,5,5,A,A,5]. In this example, the system/processor clock ratio is 2:1, but only the
processor clock is displayed.
12312 13
14
15
16
17
18
19 12320 21
22
23
24
25
26
27
28
29 12330 31
processor clk
mtmod[2:0]
110
bistmemory[2:0]
011
bistrelease
bistdone
bistfail
bisthold
5
bistdata[3:0]
*memory address[31:0]
2047
*memory datain[31:0]]
5555555
memory dataout[31:0]
*internal signals
5555555
x...x
A
0
5
1
AAAAAAA
5555555
AAAAAAA
5555555 AAAAAAA
Figure 12-22. EBIST Timing Diagram For An 8-Kbyte Cache Data Array
In Figure 12-23, a 2-Kbyte KRAM0 array is tested in EBIST mode. The cycles shown
represent a transition from the [W(5)] March C+ part to the [R(5),W(A),R(A)] part. During
address 0, bistdata[3:0] is [olddata, olddata, olddata, olddata, olddata, data, data, data] for
the first two parts of the first background; minimum of two processor clock halt for the
automatic hold signal. Old data is the data from the previous March Part. For the remaining
address 0, bistdata is [olddata, olddata, olddata, data, data, data]. For all other addresses,
the bistdata output is [data,data,data,DATA,DATA,data], in this case [5,5,5,A,A,5].
12-30
ColdFire CF4e Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
BIST
3096 3097 3098 3099 3100 3101 3102 3103 3104 3105 3106 3107 3108 3109 3110 3111 3112 3113 3114 3115
processor clk
mtmod[2:0]
110
bistmemory[2:0]
100
bistrelease
bistdone
bistfail
bisthold
X
Freescale Semiconductor, Inc...
bistdata[3:0]
X
5
511
*memory address[31:0]
5555555
A
0
5555555
*memory datain[31:0]
*memory dataout[31:0]
*internal signals
5
5
1
AAAAAAA
5555555
XXXXXX
AAAAAAA
5555555
AAA
Figure 12-23. EBIST Timing Diagram For A 2-Kbyte KRAM0 Array
Recall that the ROM EBIST algorithm is different from the other arrays. In the 2-Kbyte
ROM in EBIST mode example (Figure 12-24), there are 3 processor clocks per address.
The bistdata output is sampled on cycles 1 and 3. Notice that this array output does not have
the special output data cases that the other arrays have. The minimum data retention delay
of two processor clocks occur at address 0.
795 796 797 798 799 800 801 802 803 804 805 806 807 808 809 810 811 812 813 814
processor clk
mtmod[2:0]
110
bistmemory[2:0]
110
bistrelease
bistdone
bistfail
bisthold
bistdata[3:0]
*memory address[31:0]
*memory dataout[31:0]
5
0
C
254
FFF..FF
255
FFF..FF
6DBC19BC E3ED6540
3
E
0
1
FFF..FF A53EEDCE FFF..FF
DF853960
2
FFF..FF
3
FFF..FF
762D5C5E 7D484923
888E
*internal signals
Figure 12-24. EBIST Timing Diagram For 2-Kbyte KROM0 Array
Chapter 12. Test
For More Information On This Product,
Go to: www.freescale.com
12-31
Freescale Semiconductor, Inc.
Integration Connections
12.4 Integration Connections
The three mtmod test input signals must be routed from the package pins to the
cf4_core_kbus_tcu. This can be a direct connection from the package pins to the CF4eCore
or another form of dedicated chip-level test selection that creates the three mtmod signals
in a chip-level test controller.
Freescale Semiconductor, Inc...
Rules for connecting the CF4eCore scan interface are straightforward. When the mtmod
encoding distributed to the CF4eCore is either 3 (burn-in scan mode) or 4 (production scan
mode), the si and tbsi scan inputs must be connected to package pins as inputs; so and tbso
scan outputs must be connected to package pins as outputs; and se, tbte, tbsei, and tbseo
scan shift control signals must be connected to package pins as inputs.
The CF4e parallel scan connections can be gated off whenever they are not being used in a
scan mode (when encodings 3 and 4 are not selected). At that time, parallel scan inputs must
be driven to a logic 0 and parallel scan outputs must be ignored.
On the other hand, the CF4e test wrapper scan connections can be used across several test
modes. Because the test wrapper is shared by CF4e and non-CF4e logic, access to the CF4e
test wrapper connections must be provided in both core and non-core test modes. This
should include any CF4e core scan test mode, any non-core logic scan test mode, and any
non-core logic test mode that wishes to drive or view the CF4e interface.
12.5 Test Controller
The CF4e test controller unit (TCU) contains the head and tail registers of all the scan
chains in the design. A head or tail register is the first or last register in a scan chain,
respectively. Head and tail registers are used only for scan shift operations and have no
functional use. The TCU decodes the three Motorola test mode signals (mtmod[2:0].) The
encodings are described in the following section.
12.5.1 MTMOD[2:0] Encodings
The CF4e has a set of three test mode pins (mtmod[2:0]) that control all test modes of the
core. Table 12-6 describes the encoding for these signals.
Table 12-6. CF4e Motorola Test Mode Encodings
12-32
Hex
MTMOD[2:0]
Mode or Action
0
000
Functional mode
1
001
Functional mode with JTAG mode active
2
010
PLL test mode
3
011
Burn-in scan mode
4
100
Production scan mode
5
101
Production BIST (PBIST)
ColdFire CF4e Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
Test Controller
Table 12-6. CF4e Motorola Test Mode Encodings (Continued)
Hex
MTMOD[2:0]
Mode or Action
6
110
Engineering BIST (EBIST)
7
111
Safe mode
Note the following:
•
Freescale Semiconductor, Inc...
•
•
•
•
•
The 000 mode would be chosen for standard functional operation. None of the test
signals are asserted and all test inputs default to the quiescent 0-driven state.
The CF4e does not contain a PLL, but it is set up such that when mode 010 is chosen
the core goes into an idle mode after reset.
During burn-in scan more, both internal and wrapper scan chains are used. The
memory arrays are active for activities in which randomly applied scan vectors will
be written into and exercise the arrays.
During production scan mode, both internal and wrapper scan chains are used.
Memories external to the core should be write inhibited. This memory lock is to
reduce noise, power, and to create a safe environment for the memory arrays since
scan will randomly toggle the data, address, read-write, and output enable control
signals.
The two BIST encodings put the core into idle mode as well.
Safe mode puts the core into reset, but still allows use of the wrapper scan chains for
testing the peripheral logic. This helps with power issues and to keep the core safe
while other portions of the device are being tested.
Chapter 12. Test
For More Information On This Product,
Go to: www.freescale.com
12-33
Freescale Semiconductor, Inc.
Freescale Semiconductor, Inc...
Test Controller
12-34
ColdFire CF4e Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
Freescale Semiconductor, Inc...
Appendix A
Core Interface Timing Characteristics
This appendix provides a Synopsys-compatible timing budget constraint file, which details
the relative input arrival times and output delays for every interface signal in the CF4e core
design. The relative timings are expressed as a fraction of the processor’s cycle time to
provide a relatively technology-independent timing budget.
NOTE:
This appendix is provided as a reference. Actual pin timing is a
function of synthesis methodology, process technology,
place-and-route details, and external signal loading.
In this budget, maximum clock period is the period of the processor’s fast clk; VCLK is
simply a virtual clock reference with the same period as the maximum clock period. The
virtual clock is used as a way to reference input and output timings. The variable
REGSETUP defines the register setup time budget; REGDELAY defines the register output
time budget. These budgets include the actual register setup and clock-to-out times, some
small amount of logic, and the clock skew. Note that these variables must be linked to the
target technology. The variable clk_logic_period is maximum clock period minus both
REGSETUP and REGDELAY.
The relative timings given are designed to provide a good timing budget across a wide range
of process and frequency targets. Table A-1 gives some suggestions for REGSETUP,
REGDELAY, and clock logic period for various process and frequency targets.
.
Table A-1. Timing Budget Variables for Various Process and Frequency Targets
Process and Frequency Target
Maximum Clock Period
REGSETUP
REGDELAY
Clock Logic Period
0.18µ to 0.25µ 200 MHz
5.00 ns
1.00 ns
1.00 ns
3.00 ns
0.13µ to 0.25µ 250 MHz
4.00 ns
0.80 ns
0.80 ns
2.60 ns
0.13µ to 0.18µ 300 MHz
3.33 ns
0.67 ns
0.67 ns
2.00 ns
< 0.13µ 350MHz
< 0.13µ 400MHz
1
2.86 ns
2.50 ns
0.80 ns
1
0.80 ns
1
0.80 ns
1
1.66 ns
0.80 ns
1
1.30 ns
REGSETUP and REGDELAY are padded to account for possible interconnect delay between a transmitted
register and a receiving register .
Appendix A. Core Interface Timing Characteristics
For More Information On This Product,
Go to: www.freescale.com
A-1
Freescale Semiconductor, Inc.
Freescale Semiconductor, Inc...
/******************************************************************************/
/******************************************************************************/
//
// Version 4 ColdFire Reference Design INPUT/OUTPUT SIGNALS
//
/******************************************************************************/
/******************************************************************************/
/* Outputs */
set_output_delay { REGSETUP + 0.30 + ( clk_logic_period * ( 1.00 - 0.00 ) ) } clock “VCLK” find(port,”bistdone”)
set_output_delay { REGSETUP + 0.30 + ( clk_logic_period * ( 1.00 - 0.00 ) ) } clock “VCLK” find(port,”bistdata[*]”)
set_output_delay { REGSETUP + 0.30 + ( clk_logic_period * ( 1.00 - 0.00 ) ) } clock “VCLK” find(port,”bistfail”)
set_output_delay { REGSETUP + 0.30 + ( clk_logic_period * ( 1.00 - 0.00 ) ) } clock “VCLK” find(port,”bisthold”)
set_output_delay { REGSETUP + 0.30 + ( clk_logic_period * ( 1.00 - 0.00 ) ) } clock “VCLK” find(port,”maddr[*]”)
set_output_delay { REGSETUP + 0.30 + ( clk_logic_period * ( 1.00 - 0.00 ) ) } clock “VCLK” find(port,”mtt[*]”)
set_output_delay { REGSETUP + 0.30 + ( clk_logic_period * ( 1.00 - 0.00 ) ) } clock “VCLK” find(port,”mtm[*]”)
set_output_delay { REGSETUP + 0.30 + ( clk_logic_period * ( 1.00 - 0.00 ) ) } clock “VCLK” find(port,”mrw”)
set_output_delay { REGSETUP + 0.30 + ( clk_logic_period * ( 1.00 - 0.00 ) ) } clock “VCLK” find(port,”msiz[*]”)
set_output_delay { REGSETUP + 0.30 + ( clk_logic_period * ( 1.00 - 0.00 ) ) } clock “VCLK” find(port,”mwdata[*]”)
set_output_delay { REGSETUP + 0.30 + ( clk_logic_period * ( 1.00 - 0.00 ) ) } clock “VCLK” find(port,”mwdataoe”)
set_output_delay { REGSETUP + 0.30 + ( clk_logic_period * ( 1.00 - 0.00 ) ) } clock “VCLK” find(port,”mapb”)
set_output_delay { REGSETUP + 0.30 + ( clk_logic_period * ( 1.00 - 0.00 ) ) } clock “VCLK” find(port,”mdpb”)
set_output_delay { REGSETUP + 0.30 + ( clk_logic_period * ( 1.00 - 0.00 ) ) } clock “VCLK” find(port,”mlockb”)
set_output_delay { REGSETUP + 0.30 + ( clk_logic_period * ( 1.00 - 0.00 ) ) } clock “VCLK” find(port,”bdmforceackb”)
set_output_delay { REGSETUP + ( clk_logic_period * ( 1.00 - 0.10 ) ) } -clock
“VCLK” find(port,”so[*]”)
set_output_delay { REGSETUP + ( clk_logic_period * ( 1.00 - 0.10 ) ) } -clock
“VCLK” find(port,”tbso[*]”)
set_output_delay { REGSETUP + 0.30 + ( clk_logic_period * ( 1.00 - 0.00 ) ) } lock “VCLK” find(port,”cpustopb”)
set_output_delay { REGSETUP + 0.30 + ( clk_logic_period * ( 1.00 - 0.00 ) ) } clock “VCLK” find(port,”cpuhaltb”)
set_output_delay { REGSETUP + 0.30 + ( clk_logic_period * ( 1.00 - 0.00 ) ) } clock “VCLK” find(port,”pstclk”)
set_output_delay { REGSETUP - 0.20 } -clock “VCLK” find(port,”nsientb”)
set_output_delay { REGSETUP - 0.20 } -clock “VCLK” find(port,”nsiwrttb”)
set_output_delay { REGSETUP - 0.20 } -clock “VCLK” find(port,”nsiwlvt[*]”)
set_output_delay { REGSETUP - 0.20 } -clock “VCLK” find(port,”nsirowst[*]”)
set_output_delay { REGSETUP - 0.20 } -clock “VCLK” find(port,”nsiaddrt[*]”)
set_output_delay { REGSETUP - 0.20 } -clock “VCLK” find(port,”nsisw”)
set_output_delay { REGSETUP - 0.20 } -clock “VCLK” find(port,”nsisv”)
set_output_delay { REGSETUP - 0.20 } -clock “VCLK” find(port,”nsiendb”)
set_output_delay { REGSETUP - 0.20 } -clock “VCLK” find(port,”nsiwrtdb[*]”)
set_output_delay { REGSETUP - 0.20 } -clock “VCLK” find(port,”nsiwtbyted[*]”)
A-2
ColdFire CF4e Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc...
Freescale Semiconductor, Inc.
set_output_delay { REGSETUP - 0.20 } -clock “VCLK” find(port,”nsirowsd[*]”)
set_output_delay { REGSETUP - 0.20 } -clock “VCLK” find(port,”nsicwrdata[*]”)
set_output_delay { REGSETUP - 0.20 } -clock “VCLK” find(port,”nsoentb”)
set_output_delay { REGSETUP - 0.20 } -clock “VCLK” find(port,”nsowrttb”)
set_output_delay { REGSETUP - 0.20 } -clock “VCLK” find(port,”nsowlvt[*]”)
set_output_delay { REGSETUP - 0.20 } -clock “VCLK” find(port,”nsorowst[*]”)
set_output_delay { REGSETUP - 0.20 } -clock “VCLK” find(port,”nsoaddrt[*]”)
set_output_delay { REGSETUP - 0.20 } -clock “VCLK” find(port,”nsosw”)
set_output_delay { REGSETUP - 0.20 } -clock “VCLK” find(port,”nsosv”)
set_output_delay { REGSETUP - 0.20 } -clock “VCLK” find(port,”nsoendb”)
set_output_delay { REGSETUP - 0.20 } -clock “VCLK” find(port,”nsowrtdb[*]”)
set_output_delay { REGSETUP - 0.20 } -clock “VCLK” find(port,”nsowtbyted[*]”)
set_output_delay { REGSETUP - 0.20 } -clock “VCLK” find(port,”nsorowsd[*]”)
set_output_delay { REGSETUP - 0.20 } -clock “VCLK” find(port,”nsocwrdata[*]”)
set_output_delay { REGSETUP - 0.20 } -clock “VCLK” find(port,”kram0addr[*]”)
set_output_delay { REGSETUP - 0.20 } -clock “VCLK” find(port,”kram0di[*]”)
set_output_delay { REGSETUP - 0.20 } -clock “VCLK” find(port,”kram0web[*]”)
set_output_delay { REGSETUP - 0.20 } -clock “VCLK” find(port,”kram0csb”)
set_output_delay { REGSETUP - 0.20 } -clock “VCLK” find(port,”kram1addr[*]”)
set_output_delay { REGSETUP - 0.20 } -clock “VCLK” find(port,”kram1di[*]”)
set_output_delay { REGSETUP - 0.20 } -clock “VCLK” find(port,”kram1web[*]”)
set_output_delay { REGSETUP - 0.20 } -clock “VCLK” find(port,”kram1csb”)
set_output_delay { REGSETUP - 0.20 } -clock “VCLK” find(port,”krom0csb”)
set_output_delay { REGSETUP - 0.20 } -clock “VCLK” find(port,”krom0addr[*]”)
set_output_delay { REGSETUP - 0.20 } -clock “VCLK” find(port,”krom1csb”)
set_output_delay { REGSETUP - 0.20 } -clock “VCLK” find(port,”krom1addr[*]”)
set_output_delay { REGSETUP + 0.30 + ( clk_logic_period * ( 1.00 - 0.00 ) ) } clock “VCLK” find(port,”pstddata[*]”)
set_output_delay { REGSETUP + 0.30 + ( clk_logic_period * ( 1.00 - 0.00 ) ) } clock “VCLK” find(port,”dsdo”)
/* Inputs */
set_input_delay { REGDELAY + ( clk_logic_period * 0.75 ) } -clock
find(port,”mclken”)
set_input_delay {
0.00 } -clock “VCLK” find(port,”mtmod[*]”)
set_input_delay { REGDELAY + ( clk_logic_period * 0.50 ) } -clock
find(port,”bistrelease”)
set_input_delay { REGDELAY + ( clk_logic_period * 0.50 ) } -clock
find(port,”bistmemory[*]”)
set_input_delay { REGDELAY + ( clk_logic_period * 0.50 ) } -clock
find(port,”si[*]”)
set_input_delay { REGDELAY + ( clk_logic_period * 0.00 ) } -clock
find(port,”se”)
set_input_delay { REGDELAY + ( clk_logic_period * 0.50 ) } -clock
find(port,”tbsi[*]”)
set_input_delay { REGDELAY + ( clk_logic_period * 0.20 ) } -clock
find(port,”tbsei”)
set_input_delay { REGDELAY + ( clk_logic_period * 0.20 ) } -clock
find(port,”tbseo”)
set_input_delay { REGDELAY + ( clk_logic_period * 0.00 ) } -clock
find(port,”tbte”)
set_input_delay { REGDELAY + ( clk_logic_period * 0.10 ) } -clock
find(port,”mrdata[*]”)
set_input_delay { REGDELAY + ( clk_logic_period * 0.15 ) } -clock
find(port,”mtab”)
set_input_delay { REGDELAY + ( clk_logic_period * 0.15 ) } -clock
find(port,”mahb”)
set_input_delay { REGDELAY + ( clk_logic_period * 0.75 ) } -clock
find(port,”miplb[*]”)
“VCLK”
“VCLK”
“VCLK”
“VCLK”
“VCLK”
“VCLK”
“VCLK”
“VCLK”
“VCLK”
“VCLK”
“VCLK”
“VCLK”
“VCLK”
Appendix A. Core Interface Timing Characteristics
For More Information On This Product,
Go to: www.freescale.com
A-3
Freescale Semiconductor, Inc...
Freescale Semiconductor, Inc.
set_input_delay {
0.00 } -clock “VCLK” find(port,”icsize[*]”)
set_input_delay {
0.00 } -clock “VCLK” find(port,”ocsize[*]”)
set_input_delay { REGDELAY + ( clk_logic_period * 0.50 ) } -clock
find(port,”ictag3do[*]”)
set_input_delay { REGDELAY + ( clk_logic_period * 0.50 ) } -clock
find(port,”icw3do”)
set_input_delay { REGDELAY + ( clk_logic_period * 0.50 ) } -clock
find(port,”icv3do”)
set_input_delay { REGDELAY + ( clk_logic_period * 0.50 ) } -clock
find(port,”ictag2do[*]”)
set_input_delay { REGDELAY + ( clk_logic_period * 0.50 ) } -clock
find(port,”icw2do”)
set_input_delay { REGDELAY + ( clk_logic_period * 0.50 ) } -clock
find(port,”icv2do”)
set_input_delay { REGDELAY + ( clk_logic_period * 0.50 ) } -clock
find(port,”ictag1do[*]”)
set_input_delay { REGDELAY + ( clk_logic_period * 0.50 ) } -clock
find(port,”icw1do”)
set_input_delay { REGDELAY + ( clk_logic_period * 0.50 ) } -clock
find(port,”icv1do”)
set_input_delay { REGDELAY + ( clk_logic_period * 0.50 ) } -clock
find(port,”ictag0do[*]”)
set_input_delay { REGDELAY + ( clk_logic_period * 0.50 ) } -clock
find(port,”icw0do”)
set_input_delay { REGDELAY + ( clk_logic_period * 0.50 ) } -clock
find(port,”icv0do”)
set_input_delay { REGDELAY + ( clk_logic_period * 0.66 ) } -clock
find(port,”iclvl3do[*]”)
set_input_delay { REGDELAY + ( clk_logic_period * 0.66 ) } -clock
find(port,”iclvl2do[*]”)
set_input_delay { REGDELAY + ( clk_logic_period * 0.66 ) } -clock
find(port,”iclvl1do[*]”)
set_input_delay { REGDELAY + ( clk_logic_period * 0.66 ) } -clock
find(port,”iclvl0do[*]”)
set_input_delay { REGDELAY + ( clk_logic_period * 0.50 ) } -clock
find(port,”octag3do[*]”)
set_input_delay { REGDELAY + ( clk_logic_period * 0.50 ) } -clock
find(port,”ocw3do”)
set_input_delay { REGDELAY + ( clk_logic_period * 0.50 ) } -clock
find(port,”ocv3do”)
set_input_delay { REGDELAY + ( clk_logic_period * 0.50 ) } -clock
find(port,”octag2do[*]”)
set_input_delay { REGDELAY + ( clk_logic_period * 0.50 ) } -clock
find(port,”ocw2do”)
set_input_delay { REGDELAY + ( clk_logic_period * 0.50 ) } -clock
find(port,”ocv2do”)
set_input_delay { REGDELAY + ( clk_logic_period * 0.50 ) } -clock
find(port,”octag1do[*]”)
set_input_delay { REGDELAY + ( clk_logic_period * 0.50 ) } -clock
find(port,”ocw1do”)
set_input_delay { REGDELAY + ( clk_logic_period * 0.50 ) } -clock
find(port,”ocv1do”)
set_input_delay { REGDELAY + ( clk_logic_period * 0.50 ) } -clock
find(port,”octag0do[*]”)
set_input_delay { REGDELAY + ( clk_logic_period * 0.50 ) } -clock
find(port,”ocw0do”)
set_input_delay { REGDELAY + ( clk_logic_period * 0.50 ) } -clock
find(port,”ocv0do”)
set_input_delay { REGDELAY + ( clk_logic_period * 0.66 ) } -clock
A-4
“VCLK”
“VCLK”
“VCLK”
“VCLK”
“VCLK”
“VCLK”
“VCLK”
“VCLK”
“VCLK”
“VCLK”
“VCLK”
“VCLK”
“VCLK”
“VCLK”
“VCLK”
“VCLK”
“VCLK”
“VCLK”
“VCLK”
“VCLK”
“VCLK”
“VCLK”
“VCLK”
“VCLK”
“VCLK”
“VCLK”
“VCLK”
“VCLK”
“VCLK”
ColdFire CF4e Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc...
Freescale Semiconductor, Inc.
find(port,”oclvl3do[*]”)
set_input_delay { REGDELAY + ( clk_logic_period * 0.66 ) } -clock “VCLK”
find(port,”oclvl2do[*]”)
set_input_delay { REGDELAY + ( clk_logic_period * 0.66 ) } -clock “VCLK”
find(port,”oclvl1do[*]”)
set_input_delay { REGDELAY + ( clk_logic_period * 0.66 ) } -clock “VCLK”
find(port,”oclvl0do[*]”)
set_input_delay {
0.00 } -clock “VCLK” find(port,”enspecialkram”)
set_input_delay {
0.00 } -clock “VCLK” find(port,”kram0size[*]”)
set_input_delay { REGDELAY + ( clk_logic_period * 0.66 ) } -clock “VCLK”
find(port,”kram0do[*]”)
set_input_delay {
0.00 } -clock “VCLK” find(port,”kram1size[*]”)
set_input_delay { REGDELAY + ( clk_logic_period * 0.66 ) } -clock “VCLK”
find(port,”kram1do[*]”)
set_input_delay {
0.00 } -clock “VCLK” find(port,”krom0size[*]”)
set_input_delay { REGDELAY + ( clk_logic_period * 0.66 ) } -clock “VCLK”
find(port,”krom0do[*]”)
set_input_delay {
0.00 } -clock “VCLK” find(port,”krom0vldrst”)
set_input_delay {
0.00 } -clock “VCLK” find(port,”krom1size[*]”)
set_input_delay { REGDELAY + ( clk_logic_period * 0.66 ) } -clock “VCLK”
find(port,”krom1do[*]”)
set_input_delay {
0.00 } -clock “VCLK” find(port,”krom1vldrst”)
set_input_delay { REGDELAY + ( clk_logic_period * 0.75 ) } -clock “VCLK”
find(port,”mrstib”)
set_input_delay { REGDELAY + ( clk_logic_period * 0.75 ) } -clock “VCLK”
find(port,”dsclk”)
set_input_delay { REGDELAY + ( clk_logic_period * 0.75 ) } -clock “VCLK”
find(port,”dsdi”)
set_input_delay { REGDELAY + ( clk_logic_period * 0.75 ) } -clock “VCLK”
find(port,”mbkptb”)
set_load -pin_load
0.5 all_outputs()
/* Set max_fanout so timing can be characterized for toolkit */
set_max_fanout 1 all_inputs() - find(clock, “*”)
/* Ensure all outputs are buffered */
set_max_fanout default_max_fanout current_design
set_fanout_load default_max_fanout all_outputs()
/* to improve timing on illegalInst_ied, flatten CF4CpuIfpEarlyDecode: */
set_flatten true -design find(design,”CF4CpuIfpEarlyDecode*”) -effort medium
Appendix A. Core Interface Timing Characteristics
For More Information On This Product,
Go to: www.freescale.com
A-5
Freescale Semiconductor, Inc...
Freescale Semiconductor, Inc.
A-6
ColdFire CF4e Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
INDEX
Freescale Semiconductor, Inc...
A
AATR, 11-26
ABLR/ABHR, 11-26
Address map
CPU, 1-7
Addressing
modes, 1-12
variant, 11-8
Architectural summary, 1-3
Architecture
virtual memory management, 10-1
B
BDM
address attribute register, 11-16
command
format, 11-33
sequence diagrams, 11-34
set descriptions, 11-35
command set summary, 11-32
extension words as required, 11-33
Motorola-recommended pinout, 11-68
receive packet format, 11-30
serial interface, 11-29
transmit packet format, 11-31
BIST
core ports, 12-19
general, 12-17
memory
clock determination, 12-27
controllers, 12-18
data retention, 12-26
modify ROM signature script, 12-23
power analysis, 12-20
ROM algorithm, 12-22
staging of memories, 12-20
test modes, 12-25
testing algorithms, 12-21
timing
cycles, 12-26
diagrams, 12-28
Branch instruction execution timing, 6-28
Buffers
cache push and store, 8-43
Bus
arbitration, 9-16
basic cycles, 9-8
controller system, 8-18
pipelined cycles, 9-9
C
Cache
accesses, 8-39
buffer bus operation, 8-43
caching modes, 8-38
coherency, 8-42
control register, 8-46
copyback mode, 8-39
data state transitions, 8-53
filling, 8-42
inhibited accesses, 8-39
instruction state transitions, 8-52
locking, 8-44
management, 8-50
memory accesses for maintenance, 8-42
operation
general, 8-35
summary, 8-52
optimizing recommendation, 8-33
organization, 8-33
overview, 8-32
protocol, 8-40
push and store buffers, 8-43
pushes, 8-42
read hit, 8-41
read miss, 8-41
registers, 8-45
registers, access control, 8-48
start-up, 8-34
write hit, 8-42
write miss, 8-41
write-through mode, 8-39
CCR, 2-5
CF4eTW architecture example, 12-7
ColdFire core status register, 2-8
ColdFire debug
Revision B, 11-66
Revision C, 11-67
Core
features, 1-1
implementation block diagram, 1-2
overview, 1-1
supervisor status register, 5-7
Index
For More Information On This Product,
Go to: www.freescale.com
Index-1
Freescale Semiconductor, Inc.
Freescale Semiconductor, Inc...
INDEX
Core interface
address and data phase interactions, 9-10
basic bus cycles, 9-8
bus arbitration, 9-16
CF4e pin characteristics, 9-2
ColdFire master bus, 9-6
data size operations, 9-12
line transfers, 9-13
M-Bus operation, 9-8
M-Bus signals, 9-6
pipelined bus cycles, 9-9
signals, 9-1
timing characteristics, 13-1
CPU address map, 1-7
D
Data cache state transitions, 8-53
Data organization
registers, 1-9
Data representation
EMAC, 1-11
Debug
attribute trigger register, 11-13
background mode (BDM), 11-27
breakpoint operation theory, 11-55
C definition of PSTDDATA outputs, 11-58
ColdFire
history, 11-65
Revision C, 11-67
Coldfire
Revision B, 11-66
concurrent BDM and processor operation, 11-58
configuration/status register, 11-17
CPU halt, 11-28
data breakpoint/mask registers, 11-19
dump memory block, 11-41
emulator mode, 11-57
fill memory block, 11-43
force transfer acknowledge, 11-47
interrupts and requests (emulator mode), 11-67
no operation, 11-46
overview, 11-1
PC breakpoint ASID register, 11-26
program counter breakpoint/mask registers, 11-20
programming model
address attribute trigger register (AATR), 11-26
address
breakpoint
registers
(ABLR,
ABHR), 11-26
general, 11-10
trigger definition register (TDR), 11-23
read
A/D register, 11-36
control register, 11-49
memory location, 11-38
Index-2
register, 11-53
real-time trace support, 11-55
general, 11-5
processor halted, 11-9
processor stopped, 11-9
registers
address attribute (AATR), 11-26
address breakpoint (ABLR, ABHR), 11-26
trigger definition (TDR), 11-24
resume execution, 11-45
Revision A shared resources, 11-13
signal descriptions
general, 11-3
processor status/debug data, 11-4
supervisor instruction set, 11-64
taken branch, 11-8
trigger definition register, 11-21
user instruction set, 11-59
write
control register, 11-52
memory location, 11-39
register, 11-54
writeA/D register, 11-37
Debugging in a virtual environment, 10-7
DSCLK, 11-3
E
EMAC
data representation, 1-11, 5-13
instruction
execution times, 6-28
set summary, 5-12
memory map/register set, 5-6
multiply-accumulate unit, 5-1
OEP sequence stalls, 6-13
programming model, 2-6
Exception processing
overview, 7-1
precise faults, 7-8
processor exceptions, 7-5
sack frame definition, 7-4
supervisor/user stack pointers, 7-3
Exceptions, processor, 7-5
Execution locations, instruction, 6-18
Execution times
branch instruction, 6-28
EMAC instruction, 6-28
FPU instruction, 6-30
instruction, 6-21
miscellaneous, 6-27
miscellaneous instruction, 6-27
MOVE instruction, 6-23
one-operand, 6-24
two-operand, 6-25
ColdFire CF4e Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
INDEX
Freescale Semiconductor, Inc...
F
Fault-on-fault halt, 11-28
FPU
computational accuracy, 4-11
conditional testing, 4-16
control register, 4-8
data registers, 4-8
data types, 4-4
denormalized numbers, 4-5
infinities, 4-5
normalized numbers, 4-4
not-a-number, 4-5
zeros, 4-4
exceptions
arithmetic, 4-20
branch/set on unordered (BSUN), 4-21
divide-by-zero (DZ), 4-25
general, 4-19
inexact result (INEX), 4-25
input
denormalized number (IDE), 4-22
not-a-number (INAN), 4-22
operand error (OPERR), 4-23
overflow (OVFL), 4-23
state frames, 4-26
underflow (UNFL), 4-24
floating-point data formats, 4-3
instruction
address register, 4-11
execution times, 4-30, 6-30
instructions
overview, 4-28
notational conventions, 4-2
operand data formats and types, 4-3
overview, 4-1
post processing, 4-15
programmer’s model, 4-7
programming model, 2-6
differences, 4-31
results
intermediate, 4-11
rounding, 4-12
signed-integer data formats, 4-3
status register, 4-9
underflow, round, overflow, 4-16
FPU-specific OEP stalls, 6-14
H
Halt, fault-on-fault, 11-28
I
Instruction
execution locations, 6-18
execution times, 6-21, 6-27
Instruction cache state transitions, 8-52
Instruction set
fetch pipeline, 6-4
overview, 1-13
summary, 1-16
Integer data formats, memory, 1-10
K
K-Bus signal connections, 8-8
L
Limited superscalar OEP, 6-9
Local memory
connection specification, 8-8
interactions between modules, 8-7
K-Bus array signal connections, 8-8
overview, 8-1
SRAM overview, 8-22
two-stage pipelined local bus (K-Bus), 8-5
M
MAC
fractional operation mode, 5-9
general operation, 5-3
introduction, 5-2
mask register, 5-11
opcodes, 5-13
overview, 5-1
status register, 5-6
MBAR, 2-10
M-Bus
interrupt support, 9-18
reset operation, 9-18
MC680x0 differences, 4-31
Memory
accesses for cache maintenance, 8-42
integer data formats, 1-10
MMU
architecture
features, 10-2
location, 10-2
architecture implementation
access, 10-4
access error stack frame, 10-5
ACR address improvements, 10-6
changes to ACRs and CACR, 10-6
expanded control register space, 10-6
general, 10-3
instruction and data cache addresses, 10-4
precise faults, 10-4
supervisor protection, 10-7
Index
For More Information On This Product,
Go to: www.freescale.com
Index-3
Freescale Semiconductor, Inc.
Freescale Semiconductor, Inc...
INDEX
supervisor/user stack pointers, 10-5
virtual memory references, 10-4
virtual mode, 10-4
features, 10-1
instructions, 10-22
MMU definition
base address register, 10-11
control register, 10-12
effective address attribute determination, 10-9
fault, test, or TLB address register, 10-15
functionality, 10-10
general, 10-9
memory map, 10-11
operation, 10-18
operation register, 10-13
organization, 10-11
read/write tag and data entry registers, 10-15
status register, 10-14
TLB, 10-17
MMU implementation
general, 10-19
TLB address fields, 10-20
TLB locked entries, 10-21
TLB replacement algorithm, 10-20
MOVE
instruction
timing, 6-23
MOVEC instruction, 11-49
O
OEP
EMAC-specific sequence stalls, 6-13
general, 6-6
instruction folding, 6-9
sequence-related stalls, 6-11, 6-14
V4 conceptual model, 6-6
Operand memory sequence-related stalls, 6-16
P
Pinout, Motorola-recommended BDM, 11-68
Pipelines
instruction fetch, 6-4
operand execution, 6-6
PULSE instruction, 11-7
R
Registers
AATR, 11-13, 11-26
ABLR/ABHR, 11-15
access control (ACR0–ACR3), 2-10
address (A6–A0), 2-4
BAAR, 11-16
Index-4
BDM address attribute, 11-16
cache, 8-45
cache access control, 8-48
cache control (CACR), 2-10, 8-46
condition code (CCR), 2-5
CSR, 11-17
data (D7–D0), 2-4
data breakpoint/mask, 11-19
data organization, 1-9
debug
ABLR/ABHR, 11-26
attribute trigger, 11-13
configuration/status, 11-17
PC breakpoint AISD, 11-26
TDR module, 11-24
F-P
control, 4-8
instruction address, 4-11
status, 4-9
MAC
mask, 5-11
status, 5-6
MASK, 2-5
MBAR, 2-10
MMU
base address, 10-11
control, 10-12
fault, test, or TLB address, 10-15
operation, 10-13
read/write tag and data entry, 10-15
PBR, 11-20
program counter, 2-5
programming model table, 2-12
RAM base address (RAMBAR0/RAMBAR1), 2-10
RAREG/RDREG, 11-36
RCREG, 11-49
RDMREG, 11-53
read
A/D, 11-36
control, 11-49
read debug module, 11-53
ROM base address, 8-29
ROM base address (ROMBAR0/ROMBAR1), 2-10
SIM, base address, 2-10
SR, 2-8
status, 2-8, 2-8
TDR, 11-21
trigger definition, 11-21
user programming model, 2-3
user stack pointer, 2-5
VB, 2-9
vector base, 2-9, 2-9, 7-2
WAREG/WDREG, 11-37
WCREG, 11-52
ColdFire CF4e Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc.
INDEX
WDMREG, 11-54
write control, 11-52
write debug module, 11-54
XTDR, 11-23
ROM
base address registers, 8-29
initialization, 8-31
operation, 8-28
overview, 8-28
programming model, 8-29
ROMBAR power management programming, 8-32
Freescale Semiconductor, Inc...
S
SRAM
initialization, 8-25
initialization code, 8-26
operation, 8-23
overview, 8-22
programming model, 8-23
Stalls
EMAC-specifis, 6-13
OEP, 6-11, 6-16
Status register, 2-8
STOP instruction, 11-9, 11-28
Supervisor programming model, 2-7
Supervisor/user stack pointers, 2-9
System bus controller, 8-18
two-operand, 6-25
Transfers, line, 9-13
V
V4
basic pipeline strategy, 6-1
OEP
conceptual model, 6-6
summary, 6-16
programming model, 1-5
Variant address, 11-8
Vector base register, 2-9, 2-9, 7-2
Virtual memory
access error stack frame additions, 10-8
architecture processor support, 10-7
management architecture, 10-1
precise faults, 10-8
supervisor/user stack pointers, 10-8
W
WDDATA execution, 11-7
T
Test controller
MTMOD encodings, 12-32
Test features
CF4e core
inputs, 12-8
outputs, 12-11
noncore
inputs, 12-13
outputs, 12-15
scan chains
block diagram, 12-3
core, 12-2
general, 12-2
wrapper, 12-2
timing, 12-8
wrapper
block diagram, 12-7
cells, 12-5
general, 12-3
Timing
branch instruction execution, 6-28
core interface characteristics, 13-1
MOVE instructions, 6-23
one-operand, 6-24
Index
For More Information On This Product,
Go to: www.freescale.com
Index-5
Freescale Semiconductor, Inc.
Freescale Semiconductor, Inc...
INDEX
Index-6
ColdFire CF4e Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Freescale Semiconductor, Inc...
Freescale Semiconductor, Inc.
Introduction
1
Registers
2
Instructions
3
FPU
4
EMAC
5
Execution Timing
6
Exceptions
7
Local Memory
8
Core Interface
9
MMU
10
Debug Module
11
Test
12
Timing Constraints
A
Index
For More Information On This Product,
Go to: www.freescale.com
IND
Freescale Semiconductor, Inc...
Freescale Semiconductor, Inc.
1
Introduction
2
Registers
3
Instructions
4
FPU
5
EMAC
6
Execution Timing
7
Exceptions
8
Local Memory
9
Core Interface
10
MMU
11
Debug Module
12
Test
A
Timing Constraints
IND
Index
For More Information On This Product,
Go to: www.freescale.com