View detail for Migrating to a SAM3A-based system from a SAM7A3-based system

Migrating to a SAM3A-based System from a
SAM7A3-based System
1. Introduction
The purpose of this application note is to describe differences, from a software and
hardware point of view, between a SAM7A3 based on ARM7TDMI ® Core and a
SAM3A based on the Cortex™-M3 Core.
2. Associated Documentation and Software
AT91SAM
ARM-based
Flash MCU
Application Note
Before going further into this document, refer to the latest documentation for the corresponding SAM device on the Atmel® web site.
2.1
SAM3A Series
• Device Overview, Device Datasheet/Manual and Application Notes
– http://www.atmel.com/products/microcontrollers/arm/sam3a.aspx
• Evaluation Kit and Software
– http://www.atmel.com/products/microcontrollers/arm/sam3a.aspx?tab=tools
2.2
SAM7A3
• Device Datasheet
– http://www.atmel.com/devices/SAM7A3.aspx?tab=documents
11179A–ATARM–09-Oct-12
3. Device Overview
Table 3-1.
Main Peripheral Differences
Feature
SAM7A3
SAM3A8C
SAM3A4C
Flash
2 x 256 Kbytes
2 x 256 Kbytes
2 x 128 Kbytes
SRAM
32 Kbytes
64 + 32 Kbytes
32 + 32 Kbytes
Package
LQFP100
LQFP100
LFBGA100
Number of PIOs
62
63
Central DMA
N/A
4 channels
12-bit ADC
16 ch. (10-bit)
16 ch.(1)
12-bit DAC
N/A
Timer Counter
9 x 16-bit
9
x 32-bit
PDC
Channels
17
15
USART/
UART
3 / 1 (DBGU)
3/1
2/4
1/4 + 3
TWI
1
2
MCI
1 slot
4 bits
1 slot - 4 bits
High Speed
SPI (3)
CAN
SSC
Notes:
2 ch.
(2)
2
2
1
1
1. One channel is reserved for internal temperature sensor.
2. 6 outputs available
3. 2 / 8 + 4 = Numbers of SPI Controllers / Number of Chip Selects + Number of USARTs with SPI Mode
2
Application Note
11179A–ATARM–09-Oct-12
Application Note
4. ARM7TDMI and Cortex-M3 Comparison
The table below gives the main differences between both cores. For a more detailed overview of
the Cortex-M3, refer to [1].
Table 4-1.
Core Overview
ARM7TDMI
Architecture
Cortex-M3
ARMv4T (Von Neumann)
®
ISA support
Thumb / ARM
Pipeline
ARMv7-M (Harvard)
®
Thumb / Thumb-2
3-Stage
3-Stage + branch speculation
FIQ / IRQ
1 to 240 physical interrupts
24-42 cycles
12 cycles
0.95 DMIPS/MHz (ARM mode)
1.25 DMIPS/MHz
Memory protection
None
8-region Memory Protection Unit
Sleep modes
None
Integrated
Debug
JTAG
JTAG & serial-wire debug ports
Interrupts
Interrupt latency
Dhrystone
5. Package and Pinout
Both devices are available in LQFP100 package but are not pin-to-pin compatible. Printed circuit
board redesign is needed. SAM3A is also available in LFBGA100 package.
6. Power Considerations
6.1
Power Supply
The table below gives the differences between power supply pins and voltage ranges.
Table 6-1.
Power Supply Comparison
SAM3A
VDDCORE
1.62 to 1.95V
VDDIO
1.62 to 3.6V
VDDIN
1.8 to 3.6V
VDDOUT
1.8V
VDDBU
1.8 to 3.6V
VDDPLL
1.62 to 1.95V
VDDUTMI
3 to 3.6V
VDDANA
2.4 to 3.6V
SAM7A3
VDD1V8
1.65 to 1.95V
VDD3V3
3 to 3.6V
VDD3V3
3 to 3.6V
VDD1V8
1.65 to 1.95V
VDDBU
3 to 3.6V
VDDPLL
1.65 to 1.95V
N/A
VDDANA
3 to 3.6V
Note:
VDDUTMI pin: Powers the UTMI+ interface.
Compared to the SAM7A3, the SAM3A device has no 5V-tolerant I/Os.
3
11179A–ATARM–09-Oct-12
6.2
Typical Power Consumption
Thereafter is the comparison of current consumption of both devices. In the table below, the
Sleep mode of the Cortex-M3 is equivalent to the Idle mode of the ARM7TDMI. The Wait mode
and the Backup mode found in the SAM3A are based on Low-power modes of the Cortex core
(SLEEPING/SLEEPDEEP).
Table 6-2.
Current Consumption Comparison
Mode versus Device
Unit
70
SAM3A
VDDCORE @ 1.8V running from Flash
Memory with CoreMark. Core clock is 60 MHz.
Measurement onto VDDIN.
65
SAM7A3
Core in idle mode. Master @ 500 Hz.
Analog-to-Digital Converter deactivated.
All peripheral clocks deactivated.
USB transceiver disabled.
Measurement onto VDD3V3.
175
SAM3A
Core in sleep mode. Master @ 500 Hz.
Analog-to-Digital Converter deactivated.
All peripheral clocks deactivated.
USB transceiver disabled.
Measurement onto VDDIN.
780
SAM7A3
N/A
SAM3A
Core clock and master clock stopped.
Total current measurement.
VDDBU = 3.3V @25°C.
26.6
µA
SAM7A3
Device only VDDBU powered.
Measurement onto VDDBU.
VDDBU = 3.3V @25°C.
8.2
µA
SAM3A
Supply monitor on VDDUTMI is disabled.
RTT and RTC not used.
Embedded RC oscillator used.
Wake-up pin FWUP = VDDBU.
Current measurement on VDDBU.
VDDBU = 3.3V @25°C.
VDDBU = 3.0V @25°C.
VDDBU = 2.5V @25°C.
VDDBU = 1.8V @25°C.
3.1
2.8
2.3
1.7
µA
Active
Ultra low
power/
sleep
Backup
4
Current
Flash is read. Core clock is 60 MHz.
Analog-to-Digital Converter activated.
All peripheral clocks activated.
USB transceiver enabled.
Measurement onto VDD3V3.
SAM7A3
Wait
Conditions
mA
µA
Application Note
11179A–ATARM–09-Oct-12
Application Note
7. Memories
The table below shows the memory size differences.
Table 7-1.
Memory Size
Memory
SAM7A3
SAM3A8C
SAM3A4C
Flash
256 Kbytes
2 x 256 Kbytes
2 x 128 Kbytes
SRAM
32 Kbytes
64 + 32 Kbytes
32 + 32 Kbytes
8. Low Power Modes
The SAM3A Series feature enhanced low power modes: Backup mode, wait mode and the sleep
mode. Please refer to SAM3A datasheet for more detailed description and current consumption
according to the mode.
9. Peripherals Compatibility
The table below gives peripherals compatibility between a SAM3A and a SAM7A3.
Table 9-1.
Peripherals
Analog to
Digital
Converter
DBGU / UART
Peripheral
DMA
Controller
HSMCI
PIO
Peripherals Compatibility
Features(1)
Enhancement
(See Comments
column)
No
Code rewrite
needed
Same
Full
Same
Full
Enhancement
(See Comments
column)
No
Code rewrite
needed
Enhancement
(See Comments
column)
Yes
Minor code
rewrite needed
Enhancement
(See Comments
column)
PWM
Software
Compatibility
Yes
With common
SAM7A3 features
Comments
- 1 x ADC - 16 ch
- 12-bit resolution at 1 MSPS
- Single ended or differential inputs
- Programmable gain amplifier
- Fault input and input event line to/from
PWM (See “Analog to Digital Converter
(ADC)” )
- MultiMedia card specification version 4.3
- SD memory card specification version 2.0
- SDIO specification version 2.0
- DMA channel
- Programmable pull down
- Additional interrupt modes (rising edge,
falling edge, low level or high level detection)
- Debouncing filter
- Programmable fault inputs
- Write-protect registers
- 2-bit Gray up/down channels for stepper
motor control
- Events lines intended to synchronize ADC
conversions
- Independent complementary outputs with
12-bit dead-time generator.
5
11179A–ATARM–09-Oct-12
Table 9-1.
Peripherals Compatibility
Peripherals
Features
(1)
Enhancement
(See Comments
column)
SPI
USB
Enhancement
(See Comments
column)
Yes
With common
SAM7A3 features
Code rewrite
needed for DMA
support
No
Enhancement
(See Comments
column)
Yes
With common
SAM7A3 features
Code rewrite
needed for DMA
support
Timer Counter
Same
Full
CAN
Same
Full
TWI
Enhancement
(See Comments
column)
Yes
With common
SAM7A3 features
USART
Enhancement
(See Comments
column)
Yes
With common
SAM7A3 features
SSC
Enhancement
(See Comments
column)
Yes
Minor code
rewrite needed
for dual bank
support
Same
Full
Real Time
Timer
Same
Full
Watchdog
Timer
Same
Full
EFC
Real Time
Clock
6
Software
Compatibility
Comments
- 4 x ports - 7 chip selects
- Enhanced chip select
- Underrun management in slave mode
- DMA channel
High speed USB host and device port with
embedded transceiver.
Frame synchro length up to 256bit, I2S 32bit
supported.
2 x CAN port with 8 mailboxes each
Master, multi-master and slave mode
RS485 / ISO7816 / IRDA / SPI / LIN 1.3 and
2.0 master slave / PDC & DMA
Dual bank support
Application Note
11179A–ATARM–09-Oct-12
Application Note
Table 9-1.
Peripherals
Peripherals Compatibility
Features
(1)
Software
Compatibility
Enhancement
(See Comments
column)
Full with SAM7A3
features
Reset
Controller
Same
Full
Flash
Controller
Same
Full
Power
Management
Controller
Note:
Comments
- Trimming of the fast RC oscillator “on-thefly”
- Fast RC oscillator measurement
1. Features of digital IPs, not electrical characteristics.
10. Analog to Digital Converter (ADC)
Compared to the SAM7A3, the ADC of the SAM3A Series is a 12-bit resolution ADC with programmable gain amplifier, single ended or differential inputs among other the 1 MSPS.
For a design guide, please refer to the following document reference [2].
11. General Software Consideration
Atmel has made in-depth article explaining all necessary changes about migrating code from an
ARM7TDMI to a Cortex-M3.
This article in inserted in the Appendix on pages that follow.
12. References
• [1] “An Introduction to the ARM Cortex-M3 Processor” - Shyam Sadasivan October 2006
www.arm.com/files/pdf/IntroToCortex-M3.pdf
• [2] Analog-to-Digital Converter in the SAM3S4
http://www.atmel.com/Images/doc11106.pdf
13. Appendix
7
11179A–ATARM–09-Oct-12
Migrating ARM7 Code to a Cortex-M3 MCU
By Todd Hixon, Atmel
The ARM Cortex-M3 core has enhancements to its architecture that result in increased
code execution speed, lower power consumption, and easier software development. The
result is a true real-time core that overcomes real-time processing limitations of the
ARM7TMI core. Over time, most ARM7-based designs will be migrated to the CortexM3.
Although ARM has done a lot to make it easy to port legacy code from the ARM7 to the
Cortex-M3 core, more remains to be done. The purpose of this two-part article is to take
you step-by-step through the porting process, so you will have no excuses when your
boss asks to you to port some legacy code and have it ready by last Wednesday.
One of the most helpful things ARM has done is to make sure that Cortex-M3 support
has been added to every ARM tool chain, which makes code compilation a
straightforward process that can be done in just a few days in most situations. In fact, the
most important consideration when migrating a legacy ARM7 design to the Cortex-M3 is
selecting a device with peripheral hardware that is identical to that on the ARM7 in the
current design. If the programming interface is different, new peripheral drivers will be
required. This effort could add days or weeks to the schedule (never a good thing).
Using an M3 with identical peripheral hardware enables the software engineer to reuse
most (if not all) of his C language driver code, saving days or even weeks of learning the
nuances of new peripherals often associated with developing a robust driver from point
zero. Vendor-supplied header files handle any relocation of peripheral register addresses the developer simply includes the file for the Cortex-M3 device and recompiles the code.
There are, however, several differences between the Cortex-M3 and the ARM7TDMI
that engineers must address in their designs. Initially in this article, I'll explore the issues
that arise in dealing with exception vector table formatting, startup code/stack
configuration, RAM functions remapping, and hardware interrupt configuration. Then I
will address software interrupts, fault handling, the SWP (setwatchprops) command,
instruction time, assembly language, and optimizations.
The following items are a checklist of what needs to be addressed when porting code
from ARM7TDMI to Cortex-M3:
1. New exception vector table format.
2. New startup code/stack configuration.
3. Remapping RAM functions
4. New hardware interrupt configuration.
5. Software interrupts
6. Fault handlers
7. The SWP command
8. Instruction timing
9. Dealing with any hand-coded assembly language.
10. Optimizations
The Exception Vector Table
The exception vector table is where the application code tells the processor core the
location of software routines to handle various asynchronous events. For ARM cores,
these events include Reset (triggered by a power-up or hard reset), faults and aborts due
to bus errors or undefined instructions and finally interrupts triggered by either software
requests or external sources such as on-chip peripherals.
For an ARM7TDMI, the exception vector table typically consists of at least six1 branch2
instructions in hand-coded assembly:
b
b
b
b
b
b
b
b
Reset_Handler
UndefInstr_Handler
SWI_Handler
PrefetchAbort_Handler
DataAbort_Handler
.
; Reserved vector
IRQ_Handler
FIQ_Handler
1. The FIQ handler can simply begin at offset 0x1c instead of a branch
2. PC-relative LDR instructions can be used instead of branches for long jumps
The exception vector table on the Cortex-M3 can be defined in C as an array of pointers
(see below). The first entry is the address of the stack and the remaining entries are
pointers to various exception handler functions:
#define TOP_STACK
0x20001000
void *vector_table[] = {
TOP_STACK,
System_Init,
NMI_Handler,
HardFault_Handler,
MemManage_Handler,
BusFault_Handler,
UsageFault_Handler,
/*
* remaining handlers (including peripheral interrupts)
*/
};
Processor Modes
The ARM7TDMI has seven processor modes, six of which have their own stack pointer.
One of the seven modes, “User”, operates at a lower privilege level than the others. The
Cortex-M3, on the other hand, has only two modes: “Thread” and “Handler”. Thread
mode can operate at either an elevated privilege level or a user level and can use either
the Main Stack or the Process Stack. Handler mode always operates at privilege level
with the Main Stack.
Elevated privilege levels allow access to the Processor Status Registers (CPSR and SPSR
on the ARM7TDMI; APSR on the Cortex-M3) and possibly restricted memory regions as
dictated by an optional Memory Protection Unit (MPU).
The following table show equivalent levels between the ARM7TDMI and the CortexM3:
ARM7TDMI
Cortex-M3 Equivalent
Exception
Mode
Stack
Privilege
Exception
Mode
Stack
Privilege
-
User
usr
Normal
-
Thread
Process
Normal
-
System
usr
High
-
Thread
Process
High
FIQ
FIQ
fiq
High
-
-
-
-
IRQ
IRQ
irq
High
IRQ
Handler
Main
High
Reset
Supervisor
svc
High
Reset
Thread
Main
High
Software
Interrupt
Supervisor
svc
High
Software
Interrupt
Handler
Main
High
Undefined
Instruction
Undefined
undef
High
MemManage
or Bus Fault
Handler
Main
High
Prefetch or
Data Abort
Abort
abt
High
Usage Fault
Handler
Main
High
Configuring the Processor Mode Stacks
All but the simplest ARM7TDMI systems use at least two processor modes: SVC for
initialization and possibly main loop code and IRQ for interrupts. Each of the used
modes must have their corresponding stack pointers initialized at reset which requires
assembly code:
Reset_Handler:
msr
ldr
msr
ldr
ldr
blx
CPSR_c, #ARM_MODE_IRQ | IRQ_DISABLE | FIQ_DISABLE
sp, =IRQ_STACK_START
; Set IRQ stack pointer
CPSR_c, #ARM_MODE_SVC | IRQ_DISABLE | FIQ_DISABLE
sp, =SVC_STACK_START
; Set SVC stack pointer
r0, =System_Init
r0
; Jump to C routine System_Init
At reset, the Cortex-M3 automatically assigns its Main Stack Pointer (MSP) to the first
entry in the exception vector table (TOP_STACK in the above example) and then jumps
to the routine pointed to by the second entry, System_Init. Since the MSP used by the
IRQ handlers can also be used by the main code, the Cortex-M3 can run many types of
applications without any assemble code to initialize stack pointers.
Nested Interrupts
The management of interrupts from on-chip and off-chip peripherals is an important
feature of microcontrollers with the most important performance metrics being latency
(the time between when the event occurs and when software handles it) and jitter (how
much the latency varies from event to event). Several features of the Cortex-M3 improve
both latency and jitter compared to the ARM7TDMI as well as greatly simplifying the
software needed to handle the interrupts.
Since the ARM7TDMI only has only two general purpose exception inputs, the IRQ and
the FIQ, most SoC vendors include an interrupt controller to multiplex the multitude of
interrupt sources down to a single IRQ or FIQ assertion. The IRQ/FIQ exception handler
then must determine which interrupt source to process and call the appropriate software
routine. Another interrupt cannot be serviced until the routine completes and returns to
the interrupted code, often making latency and jitter unacceptable for certain events. The
obvious solution is to prioritize the events and allow those of higher priority to preempt
those of lower priority. Implementing this on the ARM7TDMI involves an ISR
“wrapper” in assembly code (see example below) that saves the processor status on the
stack, changes the processor mode from IRQ back to SVC, then re-enables the IRQ and
finally calls the event handler. When the handler returns, the saved mode is restored.
Because the handler is called when the processor is in SVC mode, the IRQ can be
asserted again by a higher priority interrupt event.
IRQ_Handler:
sub
stmfd
mrs
stmfd
ldr
ldr
msr
stmfd
lr, lr, #4
; Adjust and save LR_irq on IRQ stack
sp!, {lr}
r14, SPSR
; Save the old processor status and R0 on IRQ stack
sp!, {r0, r14}
r14, =INTERRUPT_CONTROLLER_VECTOR_REGISTER
r0, [r14]
; Get vector to interrupt event handler
CPSR_c, #ARM_MODE_SVC ; Switch to SVC Mode and enable interrupts
sp!, {r1-r3, r12, lr} ; Save registers on SVC stack
mov
bx
lr, pc
r0
ldmia
sp!, {r1-r3, r12, lr} ; Restore registers from SVC stack
; Switch to back IRQ mode and disable interrupts
CPSR_c, #ARM_MODE_IRQ | IRQ_DISABLE
sp!, {r0, r14}
; Restore R0 and SPSR_irq from IRQ stack
SPSR_cxsf, r14
sp!, {pc}^
; Return from interrupt
msr
ldmia
msr
ldmia
; Branch to the source handler
With the Cortex-M3, this wrapper code is no longer required because of the inclusion of
the Nested Vectored Interrupt Controller (NVIC). The NVIC is similar to the interrupt
controllers that SoC vendors typically include with an ARM7TDMI device however
since it is integrated with the processor core, the NVIC can perform more sophisticated
actions. The ARM7TDMI ISR wrapper code is now essentially done in hardware!
When an interrupt occurs that is a higher priority than is currently executing, the NVIC
will automatically save the registers required for a call to a function compliant with the
ARM Architecture Procedure Call Standard (AAPCS) and restore the registers when the
function completes. Chances are the C compiler uses AAPCS which means that the
NVIC can call C functions directly. Therefore, the vector_table[] array previously
listed can contain pointers to C functions.
Configuring Interrupts
The NVIC determines which exception to handle based on the source‟s priority
designation. The Reset, NMI and Hard Fault exceptions are fixed at the first, second and
third highest priorities. The remaining exception sources have user configurable priority
levels specified by their Preempt Priority and Subpriority. If an exception occurs with a
higher Preempt Priority than what is currently executing, the handler for the new
exception will be called. Otherwise, the new exception will be pended until all higher
priority exceptions have completed. If multiple exceptions are pended within a particular
Preempt Priority level, the NVIC will handle them in order of their Subpriority. Sources
with the same Subpriority level will be handled in the order of their NVIC source
number.
Note that with the Preempt Priority and Subpriority fields, smaller numbers represent
higher priority with „0‟ being the highest.
This example IRQ priority configuration will have the effects listed below:
Preempt
Priority
Subpriority
1
2
2
0
0
1
IRQ0
IRQ1
IRQ2
* IRQ0 can preempt IRQ1 or IRQ2.
* IRQ1 and IRQ2 will be pended if they occur while IRQ0 is executing.
* If both IRQ1 and IRQ2 were pended, IRQ1 will execute before IRQ2 when IRQ0
returns because IRQ1 has a higher Subpriority than IRQ2.
* If IRQ1 occurs while IRQ2 is executing, IRQ1 will not preempt IRQ2. Instead, IRQ1
will be pended and execute when IRQ2 completes.
The table below shows how the bits of the Priority Level is split between the Preempt
Priority (“Pre” column) and the Subpriority (“Sub” column) based on the particular SoC
Priority Level Register size (columns ranging 3 to 8) and the Priority Group setting (rows
ranging 0 to 7).
Priority
Level
Register
Size:
3
4
5
6
7
8
Priority
Group
Setting
Pre
Sub
Pre
Sub
Pre
Sub
Pre
Sub
Pre
Sub
Pre
Sub
0
1
2
3
4
5
6
7
[7:5]
[7:5]
[7:5]
[7:5]
[7:5]
[7:6]
[7]
-
[5]
[6:5]
[7:5]
[7:4]
[7:4]
[7:4]
[7:4]
[7:5]
[7:6]
[7]
-
[4]
[5:4]
[6:4]
[7:4]
[7:3]
[7:3]
[7:3]
[7:4]
[7:5]
[7:6]
[7]
-
[3]
[4:3]
[5:3]
[6:3]
[7:3]
[7:2]
[7:2]
[7:3]
[7:4]
[7:5]
[7:6]
[7]
-
[2]
[3:2]
[4:2]
[5:2]
[6:2]
[7:2]
[7:1]
[7:2]
[7:3]
[7:4]
[7:5]
[7:6]
[7]
-
[1]
[2:1]
[3:1]
[4:1]
[5:1]
[6:1]
[7:1]
[7:1]
[7:2]
[7:3]
[7:4]
[7:5]
[7:6]
[7]
-
[0]
[1:0]
[2:0]
[3:0]
[4:0]
[5:0]
[6:0]
[7:0]
For example, if the Priority Level Register size is four bits, a Priority Group setting of 4
will cause bits [7:5] to be used as the Preempt Priority level and bit [4] to be used as the
Subpriority. The remaining bits [3:0] are unused. In this case, an exception with priority
0x20 will preempt one with 0x40 (lower value is higher priority). If exceptions with
priorities 0x40 and 0x50 occur while exception 0x20 is being serviced, they will be
pended (as described earlier) and the 0x40 exception will run before the 0x50 exception
because the former has a higher Subpriority (bit 4 is 0 in 0x40 and is 1 in 0x50; 0 is
higher priority than 1).
If it isn‟t exactly clear how the levels should be partitioned for a particular application, a
reasonable starting point is to select Priority Group „0‟ which will make all of the levels
preemptive (an 8-bit group register will only have 128 preemptive levels).
The following code sets the Priority Group in the NVIC:
#define NVIC_AIRCR
(*((unsigned int *)0xE000ED0C))
NVIC_AIRCR = (0x05fa << 16) |
(0 << 8);
/* Access key */
/* Priority Group 0 */
All of the system exceptions except for the first three (Reset, NMI and Hard Fault) have
user configurable priority levels.
#define CM3_SHPR
((unsigned char *)0xE000ED18)
NVIC_ICPR[num - 3] = priority;
Peripheral interrupts are configured based on their IRQ number (IRQn):
#define
#define
#define
#define
NVIC_IPR
NVIC_ISER
NVIC_ICER
NVIC_ICPR
((unsigned
((unsigned
((unsigned
((unsigned
char *)0xE000E400)
int *)0xE000E100)
int *)0xE000E180)
int *)0xE000E280)
/* Disable IRQn */
NVIC_ICER[IRQn >> 5] = 1 << (IRQn & 0x1f);
/* Set IRQn priority */
NVIC_IPR[IRQn] = priority;
/* Clear IRQn pending */
NVIC_ICPR[IRQn >> 5] = 1 << (IRQn & 0x1f);
/* Enable IRQn */
NVIC_ISER[IRQn >> 5] = 1 << (IRQn & 0x1f);
RAM Remap Function
ARM7TDMI SoC vendors commonly provide a mechanism in their memory controllers
to select whether a non-volatile memory or a volatile memory appears at the reset vector
(typically address 0x0). This allows an application to start with a fixed set of vectors in
non-volatile memory and then switch to a different set of vectors later by setting up a new
vector table in RAM and “remapping” the RAM to address 0x0.
With the Cortex-M3, this memory controller sleight-of-hand is no longer necessary as the
NVIC allows the exception vector table to be located practically anywhere in memory.
Using a new vector table is as simple as writing its offset from the beginning of its
address space (either “code” or “SRAM”) into the NVIC Vector Table Offset register.
void *new_vector_table[] = {
0,
new_System_Init,
new_NMI_Handler,
new_HardFault_Handler,
new_MemManage_Handler,
new_BusFault_Handler,
new_UsageFault_Handler,
/*
* remaining handlers (including peripheral interrupts)
*/
};
#define SCB_VTOR
(*((unsigned int *)0xE000ED08))
unsigned int pv = (unsigned int)new_vector_table;
if (pv < 0x20000000) {
/* vector table is in ‘code’ region */
SCB_VTOR = pv;
}
else {
/* vector table is in ‘SRAM’ region –
use the offset from the beginning of SRAM
and set bit 29 indicating that the table is in SRAM */
SCB_VTOR = (pv – 0x20000000) | (1 << 29);
}
One requirement for the vector table is that it be aligned on the total number of entries
rounded up to the next power of two words. For example, if the SoC vendor used 30 IRQ
sources, the total number of entries including the 16 system entries would be 46. The
next power of two up from 46 is 64 and the alignment for 64 4-byte words would be on a
256 byte boundary. The alignment of the vector table array can usually be specified with
a toolchain-specific directive.
The FIQ
The Cortex-M3 has no direct equivalent to the ARM7TDMI FIQ interrupt. While the
Cortex-M3‟s Non-Maskable Interrupt (NMI) may seem like a tempting substitute for the
FIQ, the NMI is missing the key feature of the FIQ: the ability to preload data into
shadowed registers. As the Cortex-M3 doesn‟t shadow general purpose registers, the
most suitable FIQ replacement is just a normal IRQ with a higher priority assignment, if
needed.
Executing From RAM
A common method of increasing the execution speed of critical code on ARM7TDMI
devices is to execute it from internal SRAM (typically copied automatically there when
tagged with a toolchain-specific “ramfunc” directive), taking advantage of the SRAM‟s
faster access time (often zero wait-states vs. one or more wait-states with flash memory).
The Cortex-M3 was designed for highest performance when executing from “code”
memory (commonly internal flash) and execution from internal SRAM will be much
slower since a single bus (the System bus) will be used for both instructions and data.
The highest performance is realized when instructions are in the “code” memory region
allowing the Cortex-M3 to perform simultaneous code and data accesses with use
separate busses.
Similarly, the exception vector table should also be located in the “code” memory area.
This allows the registers to be stacked to RAM at the same time that the exception vector
is read.
Dealing with Hand-coded Assembly
In the 32-bit processor world, hand-coded assembly is often only used for operations that
a high-level language either cannot perform directly (e.g., manipulating processorspecific registers) or is too slow. The Cortex-M3 has eliminated the need for much of the
former and what remains (discussed below) can usually be encoded as inline assembly.
Code written in ARM-mode assembly for performance reasons will require either a
rewrite into a high-level language or a manual translation into the Cortex-M3
Thumb/Thumb-2 instruction set. Using a high-level language is obviously easier to
maintain and in many instances the compiler generates code as good as that generated by
hand. However, if hand-coded assembly happens to be preferred, many ARM-mode
instructions have Thumb-2 equivalents.
One ARM-mode feature often utilized is the conditional execution of an instruction to
avoid the penalty of branching around it. The Cortex-M3 provides a similar capability
with the “IT” (if-then) instruction which will conditionally execute the following one to
four instructions based on whether a comparison is true or false. (In situations with more
than four instructions conditional on a single comparison, an actual branch instruction
must be used.) As an example, the following ARM assembly will clear eight bytes to the
address in either register R2 or R3 depending on R1 being equal to 0 or not:
cmp
streq
streq
strne
strne
r1,
r0,
r0,
r0,
r0,
#0
[r2,
[r2,
[r3,
[r3,
#0]
#4]
#0]
#4]
;
;
;
;
if
if
if
if
r1
r1
r1
r1
is
is
is
is
0, write to first 4 bytes in r2
0, write to second 4 bytes in r2
not 0, write to first 4 bytes in r3
not 0, write to second 4 bytes in r3
;
;
;
;
if
if
if
if
r1
r1
r1
r1
is
is
is
is
0, write to first 4 bytes in r2
0, write to second 4 bytes in r2
not 0, write to first 4 bytes in r3
not 0, write to second 4 bytes in r3
The equivalent code on the Cortex-M3:
cmp
ittee
streq
streq
strne
strne
r1,
eq
r0,
r0,
r0,
r0,
#0
[r2,
[r2,
[r3,
[r3,
#0]
#4]
#0]
#4]
This form of the IT instruction, ittee eq, causes the two instructions following it to
execute only if the „eq‟ condition is true and the two instructions after that to execute
only if the „eq‟ condition is false. The Cortex-M3 Technical Reference Manual has more
details on the usage of the IT instruction.
Some instructions can be encoded as either 16-bit Thumb or 32-bit Thumb-2. The
encoding used can be selected by adding a suffix to the instruction: “.n” for 16-bit
Thumb (narrow) or “.w” for 32-bit Thumb-2 (wide). If unspecified, the assembler will
typically encode for 16-bit Thumb.
Disabling Interrupts
Occasionally, an application may need to temporarily disable all processor interrupts. On
the ARM7TDMI, the vendor-supplied interrupt controller may provide a global disable
register or the application may set the Current Processor Status Register „I‟ bit (and
perhaps the „F‟ bit) with the following assembly code:
disable_irq:
mrs
r0, CPSR_c
orr
r0, r0, #0x80
msr
CPSR_c, r0
; read the CPSR
; set the I bit
; write the modified CPSR
enable_irq:
mrs
r0, CPSR_c
bic
r0, r0, #0x80
msr
CPSR_c, r0
; read the CPSR
; clear the I bit
; write the modified CPSR
On the Cortex-M3, a special PRIMASK register disables all interrupts except the NMI
and fault exceptions:
disable_irq:
mov
r0, #1
msr
PRIMASK, r0
enable_irq:
mov
r0, #0
msr
PRIMASK, r0
Software Interrupts
The ARM7TDMI allows software to generate an exception via the SWI instruction. This
exception is typically used as an interface to system drivers or other privileged code that
cannot be called directly. An example usage of the ARM-mode version of the SWI
instruction is shown below:
swi
0x123456
The ARM7TDMI responds by setting R14 to the instruction after the SWI, disabling
IRQ, changing to SVC mode and jumping to the SWI exception table vector. ARMmode SWI requests can be processed with this basic exception handler:
swi_exception_handler:
stmfd sp!, {r10}
ldr
r10, {r14, #-4}
; get the SWI instruction
bic
r10, r10, #ff000000 ; get the SWI operand from the instruction
; Code to handle event based on operand in r10
ldmia sp!, {r10, pc}^
; return from handler
The Cortex-M3 has a similar mechanism using the SVC instruction however the handler
is different because the NVIC automatically stacks R0-R3, R12, LR, PC and PSR. The
handler must first determine whether the Main or Process stack was used in order to
access the SVC operand and any other parameters that might be passed.
svc_exception_handler:
tst
lr, #4
ite
eq
mrseq r0, MSP
mrsne r0, PSP
ldr
r1, [r0, #24]
; stacked PC
ldrb r1, [r1, #-2]
; get the operand from the SVC instruction
; Code to handle SVC based on operand in r1
bx
lr
; return from handler
Another difference in the two software interrupt implementations is that while an SWI
exception handler is allowed to invoke another SWI exception, the Cortex-M3 NVIC
cannot respond to an exception with the same priority as what is currently executing
(attempting to do so will trigger a usage fault).
Fault Handlers
Both the ARM7TDMI and Cortex-M3 have special fault exceptions that it will trigger if a
problem is encountered during a memory access or while processing an instruction.
These faults usually indicate that either hardware or software has failed and since
recovery is unlikely, a typical fault handler will simply halt after logging the state of the
processor so that the problem can be addressed later.
An important item to log is the address of the instruction that was being executed when
the fault occurred. This along with a disassembly of the code, the contents of the
processor registers and possibly a portion of the stack frame is often enough information
to pinpoint what went wrong. On the ARM7TDMI, the executing instruction can be
found by subtracting four from the link register (LR) for Undefined Instruction and
Prefetch Abort exceptions and by subtracting eight from the LR for Data Aborts. With
the Cortex-M3, the program counter at the time of the fault is pushed onto the stack as
with most exception events and can be extracted with the following code:
tst
ite
mrseq
mrsne
lr, #4
eq
r0, MSP
r0, PSP
; check that r0 is a valid stack
; pointer to avoid a second fault
tst
bne
ldr
cmp
bmi
ldr
cmp
bpl
ldr
skip:
r0, #3
skip
r1, =STACK_ADDRESS_MIN
r0, r1
skip
r1, =STACK_ADDRESS_MAX - 32
r0, r1
skip
r1, [r0, #24]
; r1 <= stacked PC
The Cortex-M3 has the following fault exceptions:
1. Usage fault - for undefined instructions or certain unaligned accesses.
2. Memory management fault - for attempts to access unprivileged memory.
3. Bus fault - for accessing invalid or offline memory regions.
4. Hard fault - for when the above fault exceptions cannot run.
The hard fault exception has a fixed priority level (higher than any user configurable
level) and is always enabled. The other fault exceptions have a user configurable priority
level and must be enabled before being used. If a fault event occurs for a disabled fault
handler or if the handler has too low of a priority to run, a hard fault will be triggered.
For many basic systems, only a hard fault handler is necessary to catch software errors.
The Cortex-M3 has several registers to help diagnose fault conditions:
1. Usage Fault Status Register (UFSR)
2. MemManage Fault Status Register (MMSR)
3. Bus Fault Status Register (BFSR)
4. MemManage Fault Address Register (MMAR)
5. Bus Fault Address Register (BFAR)
The three status registers (UFSR, MMSR and BFSR) can all be read as a single 32-bit
word called the Combined Fault Status Register (CFSR). The MMAR and BFAR
registers contain the address that caused their respective faults if the MMARVALID or
BFARVALID bit is set in MMSR or BFSR. The following code reads the CFSR and the
appropriate Fault Address Register:
ldr
ldr
tst
it
ldrne
tst
it
ldrne
r0,
r1,
r1,
ne
r2,
r1,
ne
r2,
=0xE000ED28
[r0, #0]
; r1 <= CFSR
#0x80
; MMARVALID set?
[r0, #12]
#0x8000
; r2 <= MMAR
; BFARVALID set?
[r0, #16]
; r2 <= BFAR
Some bus faults may not occur until several instructions have executed after the faulting
instruction (for example, an STR instruction that uses the write buffer). This case will be
indicated by the IMPRECISERR bit in BFSR.
The SVC instruction cannot be used in a hard fault handler. Since the SVC exception is
always a lower priority than the hard fault handler, attempts to trigger it will result in a
second hard fault. The Cortex-M3 responds to a double hard fault by entering a “locked”
state where only a Reset, NMI or intervention with a debugger can resume execution.
The SWP Instruction
The ARM7TDMI included a SWP instruction that provided an atomic read-then-write to
a memory location. A common use of SWP is in the implementation of operating system
semaphores to provide mutual exclusion between tasks.
take_semaphore:
ldr
r0, =semaphore_addr
mov
r1, #1
swp
r2, r1, [r0]
; Set the semaphore to 1
cmp
r2, #1
; Was it already set by another task?
beq
take_semaphore
; Yes, try again
give_semaphore:
ldr
r0, =semaphore_addr
mov
r1, #0
str
r1, [r0]
; Write a 0 to semaphore to give it back
The Cortex-M3 does not have the SWP instruction although the semaphore functionality
can be implemented with the load exclusive (LDREX) and store exclusive (STREX)
instructions.
take_semaphore:
ldr
r0, =semaphore_addr
ldrex r1, [r0]
cbnz r1, take_semaphore
mov
r1, #1
strex r2, r1, [r0]
cbnz r2, take_semaphore
give_semaphore:
ldr
r0, =semaphore_addr
mov
r1, #0
str
r1, [r0]
; Another task has the semaphore
; Try again
; Try setting the semaphore to 1
; Another task set the semaphore
; Try again
; Clear the semaphore to 1
Instruction Timing
The Cortex-M3 will pipeline LDR and STR instructions when possible allowing
subsequent instructions to begin executing before the previous one completes. This
behavior is normally desirable as it increases overall execution speed, however it can also
potentially cause any assembly code tuned for a precise timing to be off.
For example, take the case of creating a pulse on a PIO pin by writing to the SoC
peripheral registers associated with setting a pin high and low:
ldr
ldr
mov
str
str
r0,
r1,
r2,
r2,
r2,
=pio_set_reg
=pio_clear_reg
#1
[r0]
; set pin high
[r1]
; set pin low
On the Cortex-M3, this code creates a pulse whose width is two system clocks long (the
execution time of the second STR instruction). A reasonable assumption would be that
the addition of a NOP instruction between the two STR instructions would make the
pulse one clock period longer but the pulse remains only two clocks wide because the
NOP is actually executed during the second cycle of the STR instruction. Adding a
second NOP instruction will lengthen the pin pulse by one clock period.
Optimizations
After the initial port is complete and the application is functioning, it makes sense to
investigate the new features that the Cortex-M3 has to offer and how they might help
increase the performance of the application.
The Bit Band
Past ARM instruction sets only provide accesses to memory in units of bytes (8-bit), halfwords (16-bit) or words (32-bit). Modifying individual bits in memory requires three
steps:
1. Reading unit of memory into a general register,
2. Perform logical operations on the register to manipulate the desired bits, and
3. Writing the unit of memory back out.
One major drawback to this method is that it is not atomic. If the thread is interrupted by
an ISR that writes to the same memory unit, the memory will be corrupted by the
resumed thread. To make this operation atomic, interrupts would need to be disabled
before the memory read and then re-enabled after the memory write. That‟s at least five
operations to atomically set or clear a single bit.
The Cortex-M3 provides a mechanism to modify individual bits in memory in a simple
and atomic way. Basically, a single word access within a special 32MB portion of the
SRAM and Peripheral address regions is handled as an individual bit access to a word in
the first 1MB of the region. For example, writing a „1‟ to address 0x22000000 will set
bit 0 of the word at 0x20000000.
static unsigned int x;
unsigned int *p = (unsigned int *)(((unsigned int)&x & 0xf0000000) +
0x02000000 + (((unsigned int)&x & 0x000ffffc) * 32));
x = 0;
p[0] = 1;
/* set bit 0 in x */
p[1] = 1;
/* set bit 1 in x */
p[31] = 1;
/* set bit 31 in x */
if (p[0]) p[30] = 1;
/* set bit 30 in x because bit 0 is set */
/* x now equals 0xc0000003 */
About the author
After receiving a BSEE from the University of Texas in 1992, Todd Hixon has spent most of his
career developing hardware and software for various microcontroller-based products, eventually
specializing in network device drivers for a major DSL modem manufacturer. He now works for
Atmel where he provides specialized software solutions for Atmel's AT91 family of ARM
microcontrollers.
Revision History
Doc. Rev
Comments
11179A
First issue
8
Change
Request Ref.
Application Note
11179A–ATARM–09-Oct-12
Headquarters
International
Atmel Corporation
2325 Orchard Parkway
San Jose, CA 95131
USA
Tel: 1(408) 441-0311
Fax: 1(408) 487-2600
Atmel Asia
Unit 1-5 & 16, 19/F
BEA Tower, Millennium City 5
418 Kwun Tong Road
Kwun Tong, Kowloon
Hong Kong
Tel: (852) 2245-6100
Fax: (852) 2722-1369
Atmel Munich GmbH
Business Campus
Parkring 4
D-85748 Garching b. Munich
GERMANY
Tel: (+49) 89-31970-0
Fax: (+49) 89-3194621
Atmel Japan
9F, Tonetsu Shinkawa Bldg.
1-24-8 Shinkawa
Chuo-ku, Tokyo 104-0033
Japan
Tel: (81) 3-3523-3551
Fax: (81) 3-3523-7581
Technical Support
Sales Contacts
www.atmel.com/contacts/
Product Contact
Web Site
Literature Requests
www.atmel.com/literature
Disclaimer: The information in this document is provided in connection with Atmel products. No license, express or implied, by estoppel or otherwise, to any
intellectual property right is granted by this document or in connection with the sale of Atmel products. EXCEPT AS SET FORTH IN ATMEL’S TERMS AND CONDITIONS OF SALE LOCATED ON ATMEL’S WEB SITE, ATMEL ASSUMES NO LIABILITY WHATSOEVER AND DISCLAIMS ANY EXPRESS, IMPLIED OR STATUTORY
WARRANTY RELATING TO ITS PRODUCTS INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTY OF MERCHANTABILITY, FITNESS FOR A PARTICULAR
PURPOSE, OR NON-INFRINGEMENT. IN NO EVENT SHALL ATMEL BE LIABLE FOR ANY DIRECT, INDIRECT, CONSEQUENTIAL, PUNITIVE, SPECIAL OR INCIDENTAL DAMAGES (INCLUDING, WITHOUT LIMITATION, DAMAGES FOR LOSS OF PROFITS, BUSINESS INTERRUPTION, OR LOSS OF INFORMATION) ARISING OUT
OF THE USE OR INABILITY TO USE THIS DOCUMENT, EVEN IF ATMEL HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. Atmel makes no
representations or warranties with respect to the accuracy or completeness of the contents of this document and reserves the right to make changes to specifications
and product descriptions at any time without notice. Atmel does not make any commitment to update the information contained herein. Unless specifically provided
otherwise, Atmel products are not suitable for, and shall not be used in, automotive applications. Atmel’s products are not intended, authorized, or warranted for use
as components in applications intended to support or sustain life.
© 2012 Atmel Corporation. All rights reserved. Atmel®, Atmel logo and combinations thereof, and others, are registered trademarks or trademarks of Atmel Corporation or its subsidiaries. ARM ®, Thumb ®, ARM7TDMI ®, Cortex™ are registered trademarks or trademarks of ARM Limited.
Other terms and product names may be the trademarks of others.
11179A–ATARM–09-Oct-12