SILABS AN720

AN720
PRECISION32™ OPTIMIZATION CONSIDERATIONS FOR CODE SIZE AND SPEED
1. Introduction
The code size and execution speed of a 32-bit MCU project can vary greatly depending on the way the code is
written, the toolchain libraries used, and the compiler and linker options. This document addresses how to
determine what portions of code are taking extra space or time and ways to optimize for space or speed for
different toolchains, including GCC with redlib and newlib (Precision32 IDE) and Keil.
2. Key Points
The key topics of this document are:
How to determine what portions of the project are taking the most space
Ways to benchmark code execution speed
Common strategies to reduce code size or improve execution speed
Code startup time and ways to reduce it
3. Using CoreMark™ as a Speed Benchmark
CoreMark is a standard code base that can be ported to various processors to provide a speed benchmark. The
CoreMark software provides a score that rates how fast the core and code are, providing a relative comparison
between various toolchain options and settings. The CoreMark software package cannot be modified except for
device-specific information in the portme files. For modes that do not support printf (nohosting libraries), the
results were calculated using the value of the variable in code. See the CoreMark website for more information on
the test and score reporting requirements (www.coremark.org).
4. Non-Toolchain Considerations
The coding style and technique can have a great effect on the overall size of the project.
4.1. Coding Techniques
There are many ways coding technique can affect code size, including library calls, inline code or data, or code
optimizations made for global variables or pointers.
For more information on writing C code for ARM architectures, see the following resources:
EETimes - Energy efficient C code for ARM devices by Chris Shore: http://www.eetimes.com/design/embedded/4210470/Efficient-C-Code-for-ARM-Devices
Compiler Coding Practices - ARM: http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0472c/CJAFJCFG.html
These guidelines will largely apply regardless of the compiler used for the project.
4.2. Number of Function Parameters
Functions with either Keil or GCC can have as many parameters as desired. In general, the first four parameters
are passed to the function efficiently using registers. Any additional parameters beyond four must be moved on or
off the stack, which results in extra code size for each additional parameter and extra time to execute those
instructions. If possible, keeping functions to no more than four parameters can help reduce code size and
execution time.
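As a hedged illustration of the four-parameter guideline, the sketch below contrasts a six-parameter function with an equivalent that takes a single pointer to a settings structure. The names timer_config_t, timer_setup_wide, and timer_setup_packed are hypothetical and are not part of the si32HAL; the returned sum is only a stand-in for real register writes.

```c
#include <stdint.h>

/* Hypothetical settings bundle; not an si32HAL type. */
typedef struct {
    uint32_t prescaler;
    uint32_t reload;
    uint32_t mode;
    uint32_t irq_enable;
    uint32_t capture_pin;
    uint32_t output_pin;
} timer_config_t;

/* Six parameters: the first four travel in registers, and the last two
 * must be placed on the stack by every caller. */
uint32_t timer_setup_wide(uint32_t prescaler, uint32_t reload, uint32_t mode,
                          uint32_t irq_enable, uint32_t capture_pin,
                          uint32_t output_pin)
{
    return prescaler + reload + mode + irq_enable + capture_pin + output_pin;
}

/* One parameter: only the address of the structure is passed. */
uint32_t timer_setup_packed(const timer_config_t *cfg)
{
    return cfg->prescaler + cfg->reload + cfg->mode + cfg->irq_enable +
           cfg->capture_pin + cfg->output_pin;
}
```

The structure version trades stack traffic at each call site for memory reads inside the function, so whether it is a net gain depends on how often the function is called. Comparing the map file before and after the change is the safest way to confirm the effect.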
Rev. 0.1 9/12
Copyright © 2012 by Silicon Laboratories
4.3. Alignment
In most cases, Cortex-M3 linkers place code in memory efficiently. In some projects, however, the alignment of
functions and code can be carefully managed manually to reduce code size or change code execution speed. For
example, if two functions in the same file call each other, but one ends up in flash and one ends up in RAM, the
compiler may need to place extra code to perform a long jump and take longer to execute that jump. If needed,
functions and variables can be explicitly located using scatterfiles and linker flags. More information on linker
scripts and scatterfiles can be found on the Code Red website (http://support.code-red-tech.com/CodeRedWiki/OwnLinkScripts)
and the ARM website (http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.kui0101a/armlink_babddhbf.htm).
4.4. RAM Size
The RAM size of a project can be just as important as the code size. In particular, the default configurations for
SiM3xxxx projects place the stack at the top of memory growing down and the heap at the end of program data
growing up. If too much of the RAM is used by program data, then the stack and heap may collide, leading to
issues that are difficult to debug at run time. Projects should always leave enough RAM space to accommodate the
function-calling depth of the code.
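One common way to check stack headroom, which is not part of the measurements in this document, is stack painting: fill the stack region with a known pattern at startup and later count how many words were never overwritten. The sketch below runs on an ordinary array standing in for the stack region; the function names and the fill pattern are assumptions, and a real project would take the region bounds from its linker script.

```c
#include <stddef.h>
#include <stdint.h>

#define STACK_FILL_PATTERN 0xDEADBEEFu  /* arbitrary marker value */

/* Fill the whole region with the marker before the stack is used. */
void stack_paint(uint32_t *region, size_t words)
{
    for (size_t i = 0; i < words; i++) {
        region[i] = STACK_FILL_PATTERN;
    }
}

/* The stack grows down from the highest address, so untouched words
 * remain at the low end of the region. Count them from index 0. */
size_t stack_unused_words(const uint32_t *region, size_t words)
{
    size_t unused = 0;
    while (unused < words && region[unused] == STACK_FILL_PATTERN) {
        unused++;
    }
    return unused;
}
```

A result near zero after exercising the application suggests that the stack is close to colliding with the heap or program data.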
4.5. SiM3xxxx Core and Flash Access Speed
At the maximum device AHB speed, an SiM3xxxx device reading flash every pipeline cycle may violate the
maximum flash access speed. To compensate for this, the FLASHCTRL module has controls to reduce the flash
access speed (SPMD and RDSEN). Depending on the code density and make-up (i.e., 16-bit or 32-bit
instructions), this may lead to stalls in the core before the next instructions can be fetched from flash. Executing at
high speeds with strings of 16-bit instructions may yield the fastest core operation.
4.6. SiM3xxxx Core and the Direct Memory Access (DMA) Module
On SiM3xxxx devices, the core and the DMA can access multiple AHB slaves at the same time without any
performance degradation. If the core and DMA access the same AHB slave at the same time (i.e., RAM), then the
AHB has priority-based arbitration in the following precedence:
1. Core data fetch
2. DMA
3. Core instruction fetch
If multiple DMA channels are active at the same time and accessing the same memory areas as the core, this
could lead to a reduction in core execution speed.
5. Precision32 IDE (redlib and newlib)
This section discusses ways to optimize projects using the Precision32 IDE and both redlib and newlib libraries.
The Precision32 GCC tools used for the code size and execution speed testing discussed in this document are
ARM/embedded-4_6-branch revision 182083 (http://gcc.gnu.org/svn/gcc/branches/ARM/embedded-4_6-branch/)
with newlib v1.19 and Redlib v2 (Precision32 IDE v4.2.1 [Build 73]).
5.1. Reading the Map File
The first step in the code size optimization process is to analyze the project map file and determine what portions of
code take the most space.
The map file is an output of the linker that shows the size of each function and variable and their positions in
memory. This map file is located in the build files for a project.
In addition to the functions, the map file includes information on variables and other symbols, including unused
functions that are removed.
For a Precision32 IDE Debug build, the map file is located in the project’s Debug directory. Figure 1 shows an
excerpt of the sim3u1xx_Blinky redlib Debug example map file.
For each function in the project, the map file lists the starting address and the length. For example, the
my_rtc_alarm0_handler function starts at address 0x0000_04D4 and occupies 0x70 bytes of memory.
Figure 1. sim3u1xx_Blinky Precision32 Debug Map File Example
5.2. Determining a Project’s Code Size
Each project’s library and function usage is different. Analyzing the project’s makeup can help determine the most
effective way to reduce code space.
All Precision32 SDK projects automatically output the code and RAM size after a build. To modify this output in the
Precision32 IDE:
1. Right-click on the project_name in the Project Explorer view.
2. Select Properties.
3. In the C/C++ Build→Settings→Build Steps tab, remove or add the following in the Post-build
steps→Command box: arm-none-eabi-size "${BuildArtifactFileName}"
After building the si32HAL 1.0.1 sim3u1xx_Blinky example, the IDE outputs:
text     data     bss      dec      hex
13312    4        344      13660    355c
The areas of memory are:
text: code and read-only memory in decimal
data: read-write data in decimal
bss: zero-initialized data in decimal
dec: total of text, data, and bss in decimal
hex: total of text, data, and bss in hex
More information about the size tool can be found on the Code Red website (http://support.code-red-tech.com/CodeRedWiki/FlashRamSize).
Figure 2. Automatically Reporting Project Size on Project Build in Precision32
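The size report can be turned into flash and RAM totals with a little arithmetic. The sketch below assumes the usual GCC layout in which the initial values of the data segment are stored in flash and copied to RAM at startup; size_report_t and the two helper functions are hypothetical names used only for illustration.

```c
#include <stdint.h>

/* Values as reported by arm-none-eabi-size. */
typedef struct {
    uint32_t text;  /* code and read-only data */
    uint32_t data;  /* read-write data */
    uint32_t bss;   /* zero-initialized data */
} size_report_t;

/* Flash holds the code, the read-only data, and the initial values of
 * the read-write data. */
uint32_t flash_bytes(const size_report_t *report)
{
    return report->text + report->data;
}

/* RAM holds the read-write and zero-initialized data. Stack and heap
 * usage is not included because it is only known at run time. */
uint32_t ram_bytes(const size_report_t *report)
{
    return report->data + report->bss;
}
```

For the sim3u1xx_Blinky output shown above (text 13312, data 4, bss 344), these helpers give 13316 bytes of flash and 348 bytes of statically allocated RAM.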
5.3. Toolchain Library Usage
Some toolchains have multiple libraries or settings that can change the size or execution speed of code. The
Precision32 tools have six options:
newlib (standard GCC) with no standard I/O
newlib (standard GCC) with nohosting standard I/O
newlib (standard GCC) with semihosting standard I/O
redlib (GCC) with no standard I/O
redlib (GCC) with nohosting standard I/O
redlib (GCC) with semihosting standard I/O
The semihosting libraries have additional hooks to enable a project to send debugging information to an IDE
running on a PC. The nohosting libraries have this additional capability removed. The none versions of the
toolchains have no standard I/O capability (i.e., no printf()).
For some example projects (like si32HAL 1.0.1 sim3u1xx_Blinky), the compile-time library can be modified by
opening the myLinkerOptions_p32.ld file in the project directory and changing the uncommented line.
Figure 3. Using the myLinkerOptions_p32.ld File to Select the Project Library
Each of the four lines in the file corresponds to a library:
GROUP(libgcc.a libc.a libm.a libcr_newlib_nohost.a) (line 4): newlib nohosting
GROUP(libgcc.a libc.a libm.a libcr_newlib_semihost.a) (line 5): newlib semihosting
GROUP(libcr_semihost.a libcr_c.a libcr_eabihelpers.a) (line 6): redlib semihosting
GROUP(libcr_nohost.a libcr_c.a libcr_eabihelpers.a) (line 7): redlib nohosting
The none libraries do not have corresponding entries in this file. Add these lines to add support for none:
GROUP(libgcc.a libc.a libm.a): newlib none
GROUP(libcr_c.a libcr_eabihelpers.a): redlib none
After setting the myLinkerOptions_p32.ld file to the correct setting, set the IDE to the same library using these
steps:
1. Right-click on the project_name in the Project Explorer view.
2. Select Properties.
3. Click on C/C++ Build→Settings→Tool Settings tab→MCU Linker→Target and select the desired
library from the Use C library drop-down menu. Figure 4 shows this dialog in the Precision32 IDE.
4. Clean and Build the project.
AppBuilder projects do not have a myLinkerOptions_p32.ld file and can use the Quickstart view setting only.
Figure 4. Using the Precision32 IDE to Select the Project Library
Using the sim3u1xx_Blinky and demo_si32UsbAudio default examples in the si32HAL 1.0.1 software package,
Table 1 and Table 2 show the relative Debug build sizes with the different toolchain library options. Table 3 shows
the Debug build sizes for CoreMark, and Table 4 shows the relative CoreMark speed scores for each of these
library options.
For the newlib and redlib none libraries, see “5.4. Function Library Usage”.
Table 1. Precision32 Toolchain Library Usage Comparison—sim3u1xx_Blinky Debug
Library               Code and Read-Only Data (bytes)   Read-Write Data (bytes)   Zero-Initialized Data (bytes)
newlib semihosting    35564                             2248                      124
newlib nohosting      34864                             2248                      68
newlib none           N/A (requires printf() removal)
redlib semihosting    13080                             4                         344
redlib nohosting      13136                             4                         344
redlib none           N/A (requires printf() removal)
Table 2. Precision32 Toolchain Library Usage Comparison—demo_si32UsbAudio Debug
Library               Code and Read-Only Data (bytes)   Read-Write Data (bytes)   Zero-Initialized Data (bytes)
newlib semihosting    108844                            6944                      11904
newlib nohosting      108144                            6944                      11848
newlib none           N/A (requires printf() removal)
redlib semihosting    76176                             4704                      12124
redlib nohosting      76120                             4704                      12124
redlib none           N/A (requires printf() removal)
Table 3. Precision32 Toolchain Library Usage Comparison—CoreMark Debug Size
Library               Code and Read-Only Data (bytes)   Read-Write Data (bytes)   Zero-Initialized Data (bytes)
newlib semihosting    46900                             2352                      2140
newlib nohosting      46208                             2352                      2084
newlib none           N/A (requires printf() removal)
redlib semihosting    24400                             112                       2360
redlib nohosting      24344                             112                       2360
redlib none           N/A (requires printf() removal)
Table 4. Precision32 Toolchain Library Usage Comparison—CoreMark Debug Speed
Library               CoreMark Score
newlib semihosting    CoreMark 1.0 : 37.571643 / GCC4.6.2 20110921 (release) [ARM/embedded-4_6-branch revision 182083] Iterations=3000 / STACK
newlib nohosting      CoreMark 1.0 : 37.571643 / GCC4.6.2 20110921 (release) [ARM/embedded-4_6-branch revision 182083] Iterations=3000 / STACK
newlib none           N/A (requires printf() removal)
redlib semihosting    CoreMark 1.0 : 37.571643 / GCC4.6.2 20110921 (release) [ARM/embedded-4_6-branch revision 182083] Iterations=3000 / STACK
redlib nohosting      CoreMark 1.0 : 37.571643 / GCC4.6.2 20110921 (release) [ARM/embedded-4_6-branch revision 182083] Iterations=3000 / STACK
redlib none           N/A (requires printf() removal)
5.4. Function Library Usage
Function libraries such as floating point math and printf() can significantly increase the size of a project. If a project
is constrained by size, a careful analysis of the usage of these large libraries may be required. For example,
floating point can often be approximated well by fixed point math, eliminating the need for the floating point
libraries.
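As a rough sketch of the fixed point approach, the code below stores values in a signed Q16.16 format (16 integer bits and 16 fractional bits) and multiplies them using only integer instructions. The type and function names are hypothetical, and the format itself is an assumption; a project should choose the split between integer and fractional bits based on the range and precision it needs.

```c
#include <stdint.h>

/* Signed Q16.16 value: the real number is the stored integer / 65536. */
typedef int32_t q16_16_t;

#define Q16_16_ONE 65536

/* Convert a whole number; valid for values from -32768 to 32767. */
q16_16_t q16_16_from_int(int32_t value)
{
    return (q16_16_t)(value * Q16_16_ONE);
}

/* Multiply two Q16.16 values. The 64-bit intermediate keeps the full
 * product before it is scaled back down to Q16.16. */
q16_16_t q16_16_mul(q16_16_t a, q16_16_t b)
{
    return (q16_16_t)(((int64_t)a * (int64_t)b) / Q16_16_ONE);
}
```

For example, multiplying 3.0 by 0.5 (stored as Q16_16_ONE / 2) yields 98304, which is 1.5 in Q16.16, without linking any floating point library routines.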
The printf() library is often needed by projects for debugging or release code. If printf() is used for debugging
purposes, using a defined symbol in the project to remove printf() when compiling a release build can dramatically
reduce the size of a project. To define a symbol to differentiate between a Debug project and a Release project,
see the symbol definition steps in “5.7. Reset Sequence”. The code can then use #ifdef...#endif preprocessor
statements to remove debugging code or printf() calls.
The removal of debugging printf() statements can dramatically reduce the code size of a project. A simple way to
do this is to redefine the printf function at the top of the file containing the printf() calls using the following
statement:
#define printf(args...)
For si32Library examples such as demo_si32UsbAudio, define the statement at the top of myBuildOptions.h to
remove all calls to printf() with higher optimization settings. Additionally, reduce the code size footprint by disabling
logging in myBuildOptions.h:
#define si32BuildOption_enable_logging 0
This method preserves the printf() statements for later use, if needed. The printf() define can also be
encapsulated with preprocessor #if statements to automatically include this define when building with a Release
configuration.
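A minimal sketch of wrapping the define in a preprocessor conditional is shown below. RELEASE_BUILD is a hypothetical symbol that would be added to the Release configuration's defined symbols, and report_status is a placeholder function. The sketch uses the standard C99 form of a variadic macro, which behaves the same way as the args... form shown above.

```c
#include <stdio.h>

/* RELEASE_BUILD is a hypothetical symbol defined only in the Release
 * build configuration. When it is present, every printf() call after
 * this point expands to an empty expression. */
#ifdef RELEASE_BUILD
#define printf(...) ((void)0)
#endif

/* Placeholder function containing a debugging message. */
int report_status(int value)
{
    printf("status = %d\n", value);
    return value;
}
```

Because the arguments disappear along with the call, any expression with side effects inside a removed printf() is no longer evaluated, so debugging messages should not perform work that the release code depends on.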
When removing printf() for use with newlib none or redlib none, all references to printf() and stdio.h must be
commented out of the project. The none libraries cannot be used with si32Library projects.
To verify that all instances of printf() have been removed, search the map file for the project for the printf library. In
the sim3u1xx_Blinky example, this means adding the statement to both the main.c and gCpu.c files.
Instead of using standard printf(), which can have a high library cost, use integer-only print functions like iprintf()
for newlib projects. For redlib projects in the Precision32 IDE, create a define CR_INTEGER_PRINTF in the project
properties to force an integer-only version of printf(). For instances of printf() with a fixed-string, using puts() can
dramatically reduce code size.
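For a fixed string, the change is a one-line substitution, sketched below with a placeholder message and function name. Note that puts() appends the newline itself, so the trailing \n used with printf() is dropped.

```c
#include <stdio.h>

/* Equivalent to printf("Blinky started\n"), but without pulling in the
 * printf() formatting code. */
int print_banner(void)
{
    return puts("Blinky started");
}
```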
More information about redlib and printf() can be found on the Code Red website: http://support.code-red-tech.com/CodeRedWiki/UsingPrintf.
If a project does not use any standard I/O functions, use the redlib or newlib none toolchain option to reduce code
size as discussed in “5.3. Toolchain Library Usage”.
Using the sim3u1xx_Blinky default example in the si32HAL 1.0.1 software package, Table 5 shows the relative
build sizes with the different printf() settings. The demo_si32UsbAudio comparison is not included since printf()
removal requires higher optimization settings or code modifications. This section also does not include the
CoreMark tests since printf is not part of the CoreMark benchmark.
Table 5. Precision32 printf() Comparison—sim3u1xx_Blinky Debug
Library                                                   Code and Read-Only Data (bytes)   Read-Write Data (bytes)   Zero-Initialized Data (bytes)
newlib semihosting with printf                            35564                             2248                      124
newlib nohosting with printf                              34864                             2248                      68
newlib nohosting with integer printf (iprintf)            19800                             2248                      68
newlib nohosting with puts instead of printf              8784                              2120                      68
newlib nohosting without printf                           2064                              4                         8
newlib none with all calls to stdio and printf removed    2064                              4                         8
redlib semihosting with printf                            12880                             4                         344
redlib nohosting with printf                              12824                             4                         344
redlib nohosting with integer printf (CR_INTEGER_PRINTF)  8111                              4                         344
redlib nohosting with puts instead of printf              4004                              4                         344
redlib nohosting without printf                           3868                              4                         344
redlib none with all calls to stdio and printf removed    2068                              4                         8
5.5. Toolchain Optimization Settings
In addition to the library types, each toolchain has multiple optimization settings that can affect the resulting code
size. With the Precision32 toolchain, code optimization can be set by following these steps:
1. Right-click on the project_name in the Project Explorer view.
2. Select Properties.
3. In the C/C++ Build→Settings→Tool Settings tab→MCU C Compiler→Optimization options, select
the desired optimization level.
Figure 5 shows the optimization settings for the Precision32 IDE. Level -O0 has the least optimization, while -O3
has the most optimization. An additional flag (-Os) allows for specific optimization for code size.
More information on the optimization levels can be found on the Code Red website (http://support.code-red-tech.com/CodeRedWiki/CompilerOptimization)
and the GCC website (http://gcc.gnu.org/onlinedocs/gcc-4.0.4/gcc/Optimize-Options.html).
Declaring a variable as volatile will prevent the compiler from optimizing out the variable.
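The effect of volatile is easiest to see with a flag that is written by an interrupt handler and polled by the main code. The sketch below is illustrative only: the flag, the handler, and the polling function are hypothetical names, and the handler is an ordinary function here rather than a real vector table entry.

```c
#include <stdbool.h>
#include <stdint.h>

/* Without volatile, an optimizing compiler may read the flag once,
 * keep the value in a register, and never see a later update. */
static volatile bool g_alarm_fired = false;

/* Stand-in for an interrupt handler that signals the main code. */
void alarm_isr_sketch(void)
{
    g_alarm_fired = true;
}

/* Poll the flag up to max_polls times and return how many polls were
 * needed. The volatile qualifier forces a fresh read on every pass. */
uint32_t wait_for_alarm(uint32_t max_polls)
{
    uint32_t polls = 0;
    while (!g_alarm_fired && polls < max_polls) {
        polls++;
    }
    return polls;
}
```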
Figure 5. Setting the Project Optimization in the Precision32 IDE
The Precision32 IDE has two build configurations by default: Debug and Release. These build configurations have
predefined optimization levels (None for Debug, -O2 for Release). To switch between the two configurations:
1. Right-click on the project_name in the Project Explorer view.
2. Select Build Configurations→Set Active and select between Debug and Release.
Figure 6. Selecting the Active Build Configuration in the Precision32 IDE
To change the settings of any build configuration:
1. Right-click on the project_name in the Project Explorer view.
2. Select Properties.
3. In the C/C++ Build→Settings→Tool Settings tab options, select the build configuration at the top and
the desired build configuration options.
Using the sim3u1xx_Blinky and demo_si32UsbAudio default examples in the si32HAL 1.0.1 software package,
Table 6 and Table 7 show the relative Debug build sizes with the different optimization level settings. Table 8 shows
the CoreMark Debug build sizes, and Table 9 lists the CoreMark speed scores for these optimization levels.
Table 6. Precision32 Toolchain Optimization Comparison—sim3u1xx_Blinky Debug
Library                 Code and Read-Only Data (bytes)   Read-Write Data (bytes)   Zero-Initialized Data (bytes)
newlib nohosting -O0    34864                             2248                      68
newlib nohosting -O1    34032                             2248                      68
newlib nohosting -O2    33960                             2248                      68
newlib nohosting -O3    33960                             2248                      68
newlib nohosting -Os    33808                             2248                      68
redlib nohosting -O0    13080                             4                         344
redlib nohosting -O1    12056                             4                         344
redlib nohosting -O2    12096                             4                         344
redlib nohosting -O3    12096                             4                         344
redlib nohosting -Os    11768                             4                         344
Table 7. Precision32 Toolchain Optimization Comparison—demo_si32UsbAudio Debug
Library                 Code and Read-Only Data (bytes)   Read-Write Data (bytes)   Zero-Initialized Data (bytes)
newlib nohosting -O0    108144                            6944                      11848
newlib nohosting -O1    84400                             6944                      11852
newlib nohosting -O2    83152                             6944                      11852
newlib nohosting -O3    85136                             6944                      11856
newlib nohosting -Os    76528                             6928                      11848
redlib nohosting -O0    76120                             4704                      12124
redlib nohosting -O1    52048                             4700                      12124
redlib nohosting -O2    50752                             4700                      12124
redlib nohosting -O3    52736                             4700                      12128
redlib nohosting -Os    44128                             4688                      12120
Table 8. Precision32 Toolchain Optimization Comparison—CoreMark Debug Size
Library                   Code and Read-Only Data (bytes)   Read-Write Data (bytes)   Zero-Initialized Data (bytes)
newlib semihosting -O0    46900                             2352                      2140
newlib semihosting -O1    41812                             2256                      2140
newlib semihosting -O2    42828                             2256                      2140
newlib semihosting -O3    45948                             2256                      2140
newlib semihosting -Os    40284                             2256                      2140
redlib nohosting -O0      24344                             112                       2360
redlib nohosting -O1      19160                             12                        2360
redlib nohosting -O2      20176                             12                        2360
redlib nohosting -O3      23296                             12                        2360
redlib nohosting -Os      17624                             12                        2360
Table 9. Precision32 Toolchain Optimization Comparison—CoreMark Debug Speed
Library                   CoreMark Score
newlib semihosting -O0    CoreMark 1.0 : 36.478654 / GCC4.6.2 20110921 (release) [ARM/embedded-4_6-branch revision 182083] Iterations=3000 / STACK
newlib semihosting -O1    CoreMark 1.0 : 79.807436 / GCC4.6.2 20110921 (release) [ARM/embedded-4_6-branch revision 182083] Iterations=3000 / STACK
newlib semihosting -O2    CoreMark 1.0 : 107.984518 / GCC4.6.2 20110921 (release) [ARM/embedded-4_6-branch revision 182083] Iterations=3000 / STACK
newlib semihosting -O3    CoreMark 1.0 : 103.509985 / GCC4.6.2 20110921 (release) [ARM/embedded-4_6-branch revision 182083] Iterations=3000 / STACK
newlib semihosting -Os    CoreMark 1.0 : 87.64509 / GCC4.6.2 20110921 (release) [ARM/embedded-4_6-branch revision 182083] Iterations=3000 / STACK
redlib nohosting -O0      CoreMark 1.0 : 37.571643 / GCC4.6.2 20110921 (release) [ARM/embedded-4_6-branch revision 182083] Iterations=3000 / STACK
redlib nohosting -O1      CoreMark 1.0 : 79.998784 / GCC4.6.2 20110921 (release) [ARM/embedded-4_6-branch revision 182083] Iterations=3000 / STACK
redlib nohosting -O2      CoreMark 1.0 : 107.984518 / GCC4.6.2 20110921 (release) [ARM/embedded-4_6-branch revision 182083] Iterations=3000 / STACK
redlib nohosting -O3      CoreMark 1.0 : 103.509985 / GCC4.6.2 20110921 (release) [ARM/embedded-4_6-branch revision 182083] Iterations=3000 / STACK
redlib nohosting -Os      CoreMark 1.0 : 87.64509 / GCC4.6.2 20110921 (release) [ARM/embedded-4_6-branch revision 182083] Iterations=3000 / STACK
5.6. Unused Code Removal
Each file in a project becomes an object that is included. In other words, if any functions in a file are used, then the
entire file is included by default. This can become an issue for a project using the si32HAL and only a few functions
from each module.
Removed (unused) functions can be viewed in the map files for the projects.
For Precision32, the -ffunction-sections and -fdata-sections optimization flags place each function and data item
into its own section in the object file before linking. This allows the linker to discard any unused functions
and data. These flags are present in Example and AppBuilder projects by default and should be configured
on a file-by-file basis. To add or remove these options for a file:
1. Right-click on the file_name in the Project Explorer view.
2. Select Properties.
3. In the C/C++ Build→Settings→Tool Settings tab→MCU C Compiler→Miscellaneous options, add or
remove the -ffunction-sections and -fdata-sections flags after the -fno-builtin flag in the Other flags
text box.
Figure 7. Modifying the Remove Unused Code Compiler Flags in the Precision32 IDE
These flags must be used together with the --gc-sections linker option, which is enabled by default in the
Precision32 IDE. It is recommended that this linker option always remain enabled. These flags are beneficial
only in some cases and may cause larger code size and slower execution in others.
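The sketch below shows what these flags act on. The module and function names are hypothetical, and the commands in the comment are an assumed command-line equivalent of the IDE settings, with placeholder file names and with the linker script and library options omitted.

```c
/* Assumed command-line equivalent of the IDE settings:
 *
 *   arm-none-eabi-gcc -mcpu=cortex-m3 -mthumb -ffunction-sections \
 *       -fdata-sections -c module.c -o module.o
 *   arm-none-eabi-gcc -mcpu=cortex-m3 -mthumb -Wl,--gc-sections \
 *       -Wl,-Map=project.map main.o module.o -o project.axf
 */
#include <stdint.h>

/* With -ffunction-sections, this function is emitted into its own
 * section (.text.scale_reading) and kept because it is referenced. */
uint32_t scale_reading(uint32_t raw)
{
    return raw * 2u;
}

/* This function is emitted into .text.unused_helper. If nothing in the
 * project references it, --gc-sections lets the linker discard it. */
uint32_t unused_helper(uint32_t raw)
{
    return raw + 1u;
}
```

Without -ffunction-sections, both functions share a single .text section for the file, so the linker must keep or discard them together.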
Using the sim3u1xx_Blinky and demo_si32UsbAudio default examples in the si32HAL 1.0.1 software package,
Table 10 and Table 11 show the relative Debug build sizes with different unused code removal settings. For no
unused code removal, the projects were compiled without -ffunction-sections and -fdata-sections and with
--gc-sections. For the examples with unused code removal, the projects were compiled with -ffunction-sections,
-fdata-sections, and --gc-sections. Table 12 shows the CoreMark build sizes, and Table 13 shows the CoreMark
scores for the different unused code removal settings.
Table 10. Precision32 Unused Code Removal Comparison—sim3u1xx_Blinky Debug
Library                                         Code and Read-Only Data (bytes)   Read-Write Data (bytes)   Zero-Initialized Data (bytes)
newlib nohosting with no unused code removal    35504                             2248                      68
newlib nohosting with unused code removal       35112                             2248                      68
redlib nohosting with no unused code removal    13472                             4                         344
redlib nohosting with unused code removal       13080                             4                         344
Table 11. Precision32 Unused Code Removal Comparison—demo_si32UsbAudio Debug
Library                                         Code and Read-Only Data (bytes)   Read-Write Data (bytes)   Zero-Initialized Data (bytes)
newlib nohosting with no unused code removal    122424                            7240                      12116
newlib nohosting with unused code removal       108144                            6944                      11848
redlib nohosting with no unused code removal    90288                             5000                      12392
redlib nohosting with unused code removal       76120                             4704                      12124
Table 12. Precision32 Unused Code Removal Comparison—CoreMark Debug Size
Library                                           Code and Read-Only Data (bytes)   Read-Write Data (bytes)   Zero-Initialized Data (bytes)
newlib semihosting with no unused code removal    47188                             2368                      2140
newlib semihosting with unused code removal       46900                             2352                      2140
redlib nohosting with no unused code removal      24656                             124                       2360
redlib nohosting with unused code removal         24344                             112                       2360
Table 13. Precision32 Unused Code Removal Comparison—CoreMark Debug Speed
Library                                           CoreMark Score
newlib semihosting with no unused code removal    CoreMark 1.0 : 37.452232 / GCC4.6.2 20110921 (release) [ARM/embedded-4_6-branch revision 182083] Iterations=3000 / STACK
newlib semihosting with unused code removal       CoreMark 1.0 : 37.571643 / GCC4.6.2 20110921 (release) [ARM/embedded-4_6-branch revision 182083] Iterations=3000 / STACK
redlib nohosting with no unused code removal      CoreMark 1.0 : 37.875848 / GCC4.6.2 20110921 (release) [ARM/embedded-4_6-branch revision 182083] Iterations=3000 / STACK
redlib nohosting with unused code removal         CoreMark 1.0 : 37.571643 / GCC4.6.2 20110921 (release) [ARM/embedded-4_6-branch revision 182083] Iterations=3000 / STACK
5.7. Reset Sequence
The speed of the reset sequence of a device can be an important factor, especially for devices like the SiM3U1xx/
SiM3C1xx that require a reset to exit the lowest power mode.
After the hardware jumps to the reset vector and loads the stack pointer address, the core must initialize the
memory of the device. This involves copying data from flash to RAM and zero-filling any zero-initialized segments.
Then, the reset code typically calls a system initialization function and jumps to main.
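The memory initialization step can be sketched as two loops. This is not the si32HAL startup code: the function name is hypothetical, and the region addresses and lengths are passed as parameters here, whereas real startup code would normally take them from symbols defined in the linker script.

```c
#include <stddef.h>
#include <stdint.h>

/* Copy the initial values of the read-write data from flash to RAM,
 * then zero-fill the zero-initialized segment. Lengths are in 32-bit
 * words. */
void init_memory_sketch(const uint32_t *flash_image, uint32_t *data_ram,
                        size_t data_words, uint32_t *bss_ram,
                        size_t bss_words)
{
    for (size_t i = 0; i < data_words; i++) {
        data_ram[i] = flash_image[i];
    }
    for (size_t i = 0; i < bss_words; i++) {
        bss_ram[i] = 0u;
    }
}
```

Both loops scale with the amount of initialized and zero-initialized data, which is one reason the reset time depends on the library linked into the project.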
This reset sequence may take different times based on the library used with the project. The startup code should
always be compiled with the fastest speed optimization to ensure it takes as little time as possible.
The si32HAL examples add a ~500 ms delay after a pin reset event to prevent code from switching to a nonexistent
clock source and disabling the device. This delay can be removed by defining the
si32HalOption_disable_pin_reset_delay symbol in the project.
To define a symbol in the Precision32 IDE:
1. Right-click on the project_name in the Project Explorer view.
2. Select Properties.
3. In the C/C++ Build→Settings→Tool Settings tab→MCU C Compiler→Settings options, add or
remove the symbol in the Defined symbols (-D) area.
Figure 8. Adding a Project Define Symbol in the Precision32 IDE
Table 14 shows the reset time comparison for the toolchain libraries using the fastest speed optimization on the
start up code. This time was measured using the sim3u1xx_Blinky example in Debug mode from the fall of a port
pin at the beginning of the Reset IRQ handler to the fall of a port pin at the beginning of main() on an oscilloscope.
This test requires modification of the si32HAL startup sequence file startup_<device>_p32.c.
Table 14. Precision32 Toolchain Library Usage Comparison—sim3u1xx_Blinky Debug Reset Sequence
Library                              Reset Time (µs)
newlib semihosting with printf()     242
newlib nohosting with printf()       236
newlib none with printf() removed    9.4
redlib semihosting with printf()     90
redlib nohosting with printf()       90
redlib none with printf() removed    9.4
6. ARM/Keil µVision
This section discusses ways to optimize projects using the Keil or ARM toolchain in the µVision IDE. The Keil
µVision tools used for the code size and execution speed testing discussed in this document are version
v4.1.0.894.
6.1. Reading the Map File
The map file is an output of the linker that shows the size of each function and variable and their positions in
memory. This map file is located in the build files for a project. In addition to the functions, the map file includes
information on variables and other symbols, including unused functions that are removed.
Figure 9 shows an excerpt from the sim3u1xx_Blinky map file from the Keil toolchain. The functions are listed with
a base address and size. In this case, the my_rtc_alarm0_handler is 50 bytes located at address 0x0000_03A5.
Figure 9. sim3u1xx_Blinky µVision Map File Example
6.2. Determining a Project’s Code Size
The Keil µVision IDE automatically displays the code size information at the end of a successful build. After
building the si32HAL 1.0.1 sim3u1xx_Blinky example, the IDE outputs:
Program Size: Code=1968 RO-data=296 RW-data=24 ZI-data=1536
".\build\BlinkyApp.axf" - 0 Error(s), 0 Warning(s).
The areas of memory are:
Code: all program code in decimal
RO-data: read-only data located in flash in decimal
RW-data: read-write (non-zero-initialized) data located in RAM in decimal
ZI-data: zero-initialized data located in RAM in decimal
6.3. Toolchain Library Usage
Some toolchains have multiple libraries or settings that can change the size or execution speed of code.
The Keil µVision tools have two options: standard and MicroLIB. To switch between the two:
1. Right-click on the project_name in the Project window and select Options for Target ‘project_name’ or
go to Project→Options for Target ‘project_name’.
2. Select the Target tab.
3. Use the Use MicroLIB checkbox to select the library.
Figure 10 shows this dialog in the µVision IDE.
Figure 10. Using the µVision IDE to Select the Project Library
Using the sim3u1xx_Blinky and demo_si32UsbAudio default examples in the si32HAL 1.0.1 software package,
Table 15 and Table 16 show the relative Debug build sizes with the different toolchain library options. Table 17
shows the Debug build sizes for CoreMark, and Table 18 shows the relative CoreMark speed scores for each of
these library options.
Table 15. Keil Toolchain Library Usage Comparison—sim3u1xx_Blinky Debug
Library             Code (bytes)   Read Only Data (bytes)   Read-Write Data (bytes)   Zero-Initialized Data (bytes)
µVision standard    2296           312                      24                        1632
µVision MicroLIB    2068           296                      24                        1536
Table 16. Keil Toolchain Library Usage Comparison—demo_si32UsbAudio Debug
Library             Code (bytes)   Read Only Data (bytes)   Read-Write Data (bytes)   Zero-Initialized Data (bytes)
µVision standard    51176          4388                     5196                      18068
µVision MicroLIB    47264          3832                     5208                      17972
Table 17. Keil Toolchain Library Usage Comparison—CoreMark Debug Size
Library             Code (bytes)   Read Only Data (bytes)   Read-Write Data (bytes)   Zero-Initialized Data (bytes)
µVision standard    13860          868                      156                       3632
µVision MicroLIB    11276          636                      156                       3536
Table 18. Keil Toolchain Library Usage Comparison—CoreMark Debug Speed
Library             CoreMark Score
µVision standard    CoreMark 1.0 : 65.602324/ARM4.2 (EDG gcc mode) Iterations=3000/STACK
µVision MicroLIB    CoreMark 1.0 : 69.402323/ARM4.2 (EDG gcc mode) Iterations=3000/STACK
6.4. Function Library Usage
The removal of debugging printf() statements can dramatically reduce the code size of a project. A simple way to
do this is to redefine the printf function at the top of the file containing the printf() calls using the following
statement:
#define printf(args...)
For si32Library examples such as demo_si32UsbAudio, define the statement at the top of myBuildOptions.h to
remove all calls to printf(). Additionally, reduce the footprint by disabling logging in myBuildOptions.h:
#define si32BuildOption_enable_logging 0
This method preserves the printf() statements for later use, if needed. The printf() define can also be
encapsulated with preprocessor #if statements to automatically include this define when building with a Release
configuration.
To verify that all instances of printf() have been removed, search the project's map file for the printf library. In
the sim3u1xx_Blinky example, removing every instance means adding the statement to both the main.c and gCpu.c files.
Using the sim3u1xx_Blinky and demo_si32UsbAudio default examples in the si32HAL 1.0.1 software package,
Table 19 and Table 20 show the relative build sizes with the different printf() settings. This section does not include
the CoreMark tests since printf is not part of the CoreMark benchmark.
Table 19. Keil printf() Comparison—sim3u1xx_Blinky Debug
Library                           Code (bytes)   Read Only Data (bytes)   Read-Write Data (bytes)   Zero-Initialized Data (bytes)
µVision MicroLIB with printf      2068           296                      24                        1536
µVision MicroLIB without printf   1392           296                      12                        1536
Table 20. Keil printf() Comparison—demo_si32UsbAudio Debug
Library                           Code (bytes)   Read Only Data (bytes)   Read-Write Data (bytes)   Zero-Initialized Data (bytes)
µVision MicroLIB with printf      47264          3832                     5208                      17972
µVision MicroLIB without printf   39760          4312                     5196                      17972
6.5. Toolchain Optimization Settings
In addition to the library types, each toolchain has multiple optimization settings that can affect the resulting code
size. In Keil µVision, the optimization settings are set using the following steps:
1. Right-click on the project_name in the Project window and select Options for Target ‘project_name’ or
go to ProjectOptions for Target ‘project_name’.
2. Select the C/C++ tab.
3. Use the Optimization drop-down menu to set the project optimization setting.
Figure 11 shows the optimization settings in the IDE.
The available options are:
Level 0: minimum optimization
Level 1: restricted optimization, removing unused inline functions and unused static functions
Level 2: high optimization
Level 3: maximum optimization, which aims to produce faster or smaller code than Level 2, depending on the options used
In addition to the levels, µVision also has an Optimize for Time selection available below the Optimization drop-down menu. At any optimization level, declaring a variable as volatile prevents the compiler from optimizing away accesses to that variable.
More information on these optimization levels can be found on the Keil website (http://www.keil.com/support/man/
docs/uv4/uv4_dg_adscc.htm).
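The effect of volatile can be illustrated with a minimal sketch of a flag shared between an interrupt handler and polling code. The names data_ready, on_data_ready_irq(), and wait_for_data() are hypothetical and do not come from the si32HAL.

```c
#include <stdint.h>

/* Hypothetical flag written by an interrupt handler and polled by main code.
 * Without volatile, an optimizing compiler may read the flag once and reuse
 * the cached value inside the loop. */
static volatile uint32_t data_ready;

/* Stand-in for an interrupt service routine. */
void on_data_ready_irq(void)
{
    data_ready = 1u;
}

/* Polls the flag up to max_polls times; returns 1 if it was seen set. */
int wait_for_data(uint32_t max_polls)
{
    while (max_polls-- > 0u) {
        if (data_ready) {      /* re-read from memory on every pass */
            data_ready = 0u;   /* consume the event */
            return 1;
        }
    }
    return 0;
}
```

Only the variables that are genuinely shared with hardware or interrupt code need the qualifier; applying it everywhere would block optimizations the higher levels rely on.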
Figure 11. Setting the Project Optimization in the µVision IDE
Using the sim3u1xx_Blinky and demo_si32UsbAudio default examples in the si32HAL 1.0.1 software package,
Table 21 and Table 22 show the relative Debug build sizes with the different optimization level settings. Table 23
shows the CoreMark Debug build sizes, and Table 24 lists the CoreMark speed scores for these optimization
levels.
Table 21. Keil Toolchain Optimization Comparison—sim3u1xx_Blinky Debug
Library                                         Code (bytes)   Read Only Data (bytes)   Read-Write Data (bytes)   Zero-Initialized Data (bytes)
µVision MicroLIB -O0                            2068           296                      24                        1536
µVision MicroLIB -O0 (with Optimize for Time)   2068           296                      24                        1536
µVision MicroLIB -O1                            1704           296                      20                        1536
µVision MicroLIB -O1 (with Optimize for Time)   1648           296                      20                        1536
µVision MicroLIB -O2                            1616           296                      20                        1536
µVision MicroLIB -O2 (with Optimize for Time)   1600           296                      20                        1536
µVision MicroLIB -O3                            1604           296                      20                        1536
µVision MicroLIB -O3 (with Optimize for Time)   1596           296                      20                        1536
Table 22. Keil Toolchain Optimization Comparison—demo_si32UsbAudio Debug
Library                                         Code (bytes)   Read Only Data (bytes)   Read-Write Data (bytes)   Zero-Initialized Data (bytes)
µVision MicroLIB -O0                            47264          3832                     5208                      17972
µVision MicroLIB -O0 (with Optimize for Time)   47264          3832                     5208                      17972
µVision MicroLIB -O1                            38816          3832                     5132                      17952
µVision MicroLIB -O1 (with Optimize for Time)   39924          3832                     5132                      17952
µVision MicroLIB -O2                            36540          3832                     5132                      17952
µVision MicroLIB -O2 (with Optimize for Time)   39840          3832                     5132                      17952
µVision MicroLIB -O3                            36468          3832                     5132                      17952
µVision MicroLIB -O3 (with Optimize for Time)   41532          3832                     5132                      17952
Table 23. Keil Toolchain Optimization Comparison—CoreMark Debug Size
Library                                         Code (bytes)   Read Only Data (bytes)   Read-Write Data (bytes)   Zero-Initialized Data (bytes)
µVision MicroLIB -O0                            11276          636                      156                       3536
µVision MicroLIB -O0 (with Optimize for Time)   11276          636                      156                       3536
µVision MicroLIB -O1                            9788           616                      140                       3536
µVision MicroLIB -O1 (with Optimize for Time)   10136          616                      140                       3536
µVision MicroLIB -O2                            9640           616                      140                       3536
µVision MicroLIB -O2 (with Optimize for Time)   10684          616                      140                       3536
µVision MicroLIB -O3                            9680           616                      140                       3536
µVision MicroLIB -O3 (with Optimize for Time)   11500          616                      140                       3536
Table 24. Keil Toolchain Optimization Comparison—CoreMark Debug Speed
Library                                         CoreMark Score
µVision MicroLIB -O0                            CoreMark 1.0 : 69.402323 / ARM4.2 (EDG gcc mode) Iterations=3000 / STACK
µVision MicroLIB -O0 (with Optimize for Time)   CoreMark 1.0 : 69.402323 / ARM4.2 (EDG gcc mode) Iterations=3000 / STACK
µVision MicroLIB -O1                            CoreMark 1.0 : 75.279256 / ARM4.2 (EDG gcc mode) Iterations=3000 / STACK
µVision MicroLIB -O1 (with Optimize for Time)   CoreMark 1.0 : 75.206352 / ARM4.2 (EDG gcc mode) Iterations=3000 / STACK
µVision MicroLIB -O2                            CoreMark 1.0 : 74.247855 / ARM4.2 (EDG gcc mode) Iterations=3000 / STACK
µVision MicroLIB -O2 (with Optimize for Time)   CoreMark 1.0 : 87.277701 / ARM4.2 (EDG gcc mode) Iterations=3000 / STACK
µVision MicroLIB -O3                            CoreMark 1.0 : 79.520321 / ARM4.2 (EDG gcc mode) Iterations=3000 / STACK
µVision MicroLIB -O3 (with Optimize for Time)   CoreMark 1.0 : 102.697150 / ARM4.2 (EDG gcc mode) Iterations=3000 / STACK
6.6. Unused Code Removal
Each source file in a project is compiled into an object, and by default the linker includes that object as a unit. In other words, if any function in a file is used, then the
entire file is included by default. This can become an issue for a project that uses the si32HAL but calls only a few functions
from each module.
Removed (unused) functions can be viewed in the map files for the projects.
The unused code removal feature is not automatically enabled in the Keil µVision IDE. To enable this feature:
1. Right-click on the project_name in the Project window and select Options for Target ‘project_name’ or
go to ProjectOptions for Target ‘project_name’.
2. Select the C/C++ tab.
3. Use the One ELF Section per Function checkbox to enable or disable unused code removal.
Figure 12. Setting the Remove Unused Code Option in the µVision IDE
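As a rough illustration of what this setting changes, consider a hypothetical module file (the file and function names below are invented for this sketch and are not part of the si32HAL). The application calls only scale_reading().

```c
/* module.c: hypothetical module containing two independent functions. */

/* Called by the application, so it must remain in the final image. */
int scale_reading(int raw)
{
    return raw * 2;
}

/* Never called by the application. With the default settings it shares an
 * object with scale_reading() and is linked in anyway. With One ELF Section
 * per Function enabled, it sits in its own section, which the linker can
 * drop as unused. */
int unused_self_test(void)
{
    return scale_reading(1) == 2;
}
```

After a build with the option enabled, a function like unused_self_test() would be expected to appear among the removed functions in the map file.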
Using the sim3u1xx_Blinky and demo_si32UsbAudio default examples in the si32HAL 1.0.1 software package,
Table 25 and Table 26 show the relative Debug build sizes with different unused code removal settings. Table 27
shows the CoreMark build sizes, and Table 28 shows the CoreMark scores for the different unused code removal
settings.
Table 25. Keil Unused Code Removal Comparison—sim3u1xx_Blinky Debug
Library                                        Code (bytes)   Read Only Data (bytes)   Read-Write Data (bytes)   Zero-Initialized Data (bytes)
µVision MicroLIB with no unused code removal   1392           296                      12                        1536
µVision MicroLIB with unused code removal      1184           296                      12                        1536
Table 26. Keil Unused Code Removal Comparison—demo_si32UsbAudio Debug
Library                                        Code (bytes)   Read Only Data (bytes)   Read-Write Data (bytes)   Zero-Initialized Data (bytes)
µVision MicroLIB with no unused code removal   47264          3832                     5208                      17972
µVision MicroLIB with unused code removal      43464          3772                     5060                      17780
Table 27. Keil Unused Code Removal Comparison—CoreMark Debug Size
Library                                        Code (bytes)   Read Only Data (bytes)   Read-Write Data (bytes)   Zero-Initialized Data (bytes)
µVision MicroLIB with no unused code removal   11276          636                      156                       3536
µVision MicroLIB with unused code removal      11012          636                      156                       3536
Table 28. Keil Unused Code Removal Comparison—CoreMark Debug Speed
Library                                        CoreMark Score
µVision MicroLIB with no unused code removal   CoreMark 1.0 : 69.402324 / ARM4.2 (EDG gcc mode) Iterations=3000 / STACK
µVision MicroLIB with unused code removal      CoreMark 1.0 : 67.374626 / ARM4.2 (EDG gcc mode) Iterations=3000 / STACK
6.7. Reset Sequence
The speed of the reset sequence of a device can be an important factor, especially for devices like the SiM3U1xx/
SiM3C1xx that require a reset to exit the lowest power mode.
After the hardware jumps to the reset vector and loads the stack pointer address, the core must initialize the
memory of the device. This involves copying data from flash to RAM and zero-filling any zero-initialized segments.
Then, the reset code typically calls a system initialization function and jumps to main.
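The memory initialization step can be sketched in C as follows. This is only an illustration of the copy and zero-fill work, not the startup code supplied by either Keil library; the function name init_memory() and its parameters are hypothetical, and a real startup routine would take the region addresses and sizes from linker-defined symbols rather than from arguments.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch of reset-time memory initialization.
 *   data_dst / data_src / data_words: initialized data, copied flash -> RAM
 *   bss_dst / bss_words:              zero-initialized data, cleared in RAM */
void init_memory(uint32_t *data_dst, const uint32_t *data_src, size_t data_words,
                 uint32_t *bss_dst, size_t bss_words)
{
    /* Copy initial values of read-write data from flash to RAM. */
    for (size_t i = 0; i < data_words; i++) {
        data_dst[i] = data_src[i];
    }

    /* Zero-fill the zero-initialized segment. */
    for (size_t i = 0; i < bss_words; i++) {
        bss_dst[i] = 0u;
    }
}
```

Both loops scale with the amount of Read-Write and Zero-Initialized data in the build, which is one reason the data sizes reported in the tables above also matter for reset time.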
This reset sequence may take different times based on the library used with the project. The startup code should
always be compiled with the fastest speed optimization to ensure it takes as little time as possible.
The si32HAL examples add a delay of approximately 500 ms to a pin reset event to prevent code from switching to a non-existent clock source and disabling the device. This delay can be removed by defining the
si32HalOption_disable_pin_reset_delay symbol in the project.
To define a symbol in Keil µVision:
1. Right-click on the project_name in the Project window and select Options for Target ‘project_name’ or
go to ProjectOptions for Target ‘project_name’.
2. Select the C/C++ tab.
3. Use the Define text box to add or remove project symbols.
Figure 13. Adding a Project Define Symbol in the µVision IDE
Table 29 shows the reset time comparison for the toolchain libraries using the fastest speed optimization on the
startup code. This time was measured on an oscilloscope using the sim3u1xx_Blinky example in Debug mode, from the rise of
RESETb to the fall of a port pin at the beginning of main().
Table 29. Keil Toolchain Library Usage Comparison—sim3u1xx_Blinky Debug Reset Sequence
Library            Reset Time (µs)
µVision standard   52
µVision MicroLIB   48
CONTACT INFORMATION
Silicon Laboratories Inc.
400 West Cesar Chavez
Austin, TX 78701
Tel: 1+(512) 416-8500
Fax: 1+(512) 416-9669
Toll Free: 1+(877) 444-3032
Please visit the Silicon Labs Technical Support web page:
https://www.silabs.com/support/pages/contacttechnicalsupport.aspx
and register to submit a technical support request.
Patent Notice
Silicon Labs invests in research and development to help our customers differentiate in the market with innovative low-power, small size, analog-intensive mixed-signal solutions. Silicon Labs' extensive patent portfolio is a testament to our unique approach and world-class engineering team.
The information in this document is believed to be accurate in all respects at the time of publication but is subject to change without notice.
Silicon Laboratories assumes no responsibility for errors and omissions, and disclaims responsibility for any consequences resulting from
the use of information included herein. Additionally, Silicon Laboratories assumes no responsibility for the functioning of undescribed features
or parameters. Silicon Laboratories reserves the right to make changes without further notice. Silicon Laboratories makes no warranty, representation or guarantee regarding the suitability of its products for any particular purpose, nor does Silicon Laboratories assume any liability
arising out of the application or use of any product or circuit, and specifically disclaims any and all liability, including without limitation consequential or incidental damages. Silicon Laboratories products are not designed, intended, or authorized for use in applications intended to
support or sustain life, or for any other application in which the failure of the Silicon Laboratories product could create a situation where personal injury or death may occur. Should Buyer purchase or use Silicon Laboratories products for any such unintended or unauthorized application, Buyer shall indemnify and hold Silicon Laboratories harmless against all claims and damages.
Silicon Laboratories and Silicon Labs are trademarks of Silicon Laboratories Inc.
Other products or brandnames mentioned herein are trademarks or registered trademarks of their respective holders.
Rev. 0.1