AN720

Precision32™ Optimization Considerations for Code Size and Speed

1. Introduction

The code size and execution speed of a 32-bit MCU project can vary greatly depending on the way the code is written, the toolchain libraries used, and the compiler and linker options. This document addresses how to determine which portions of code are taking extra space or time, and ways to optimize for space or speed with different toolchains, including GCC with redlib and newlib (Precision32 IDE) and Keil.

2. Key Points

The key topics of this document are:

- How to determine what portions of the project are taking the most space
- Ways to benchmark code execution speed
- Common strategies to reduce code size or improve execution speed
- Code startup time and ways to reduce it

3. Using CoreMark™ as a Speed Benchmark

CoreMark is a standard code base that can be ported to various processors to provide a speed benchmark. The CoreMark software provides a score that rates how fast the core and code are, providing a relative comparison between various toolchain options and settings. The CoreMark software package cannot be modified except for device-specific information in the portme files. For modes that do not support printf (nohosting libraries), the results were calculated using the value of the variable in code. See the CoreMark website for more information on the test and score reporting requirements (www.coremark.org).

4. Non-Toolchain Considerations

The coding style and technique can have a great effect on the overall size of the project.

4.1. Coding Techniques

There are many ways coding technique can affect code size, including library calls, inline code or data, and code optimizations made for global variables or pointers.
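As one illustration of the global-variable point, the sketch below sums a buffer in two ways. The names g_total, sum_into_global, and sum_via_local are hypothetical and are not part of the si32HAL. Because a byte pointer may alias any object, a compiler generally has to write g_total back to memory on each pass of the first loop, while the second version can keep the running sum in a register and store it once. The actual difference depends on the compiler and the optimization level.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical global accumulator, for illustration only. */
uint32_t g_total;

/* Updates the global directly inside the loop. */
void sum_into_global(const uint8_t *buf, size_t len)
{
    g_total = 0;
    for (size_t i = 0; i < len; i++) {
        g_total += buf[i];
    }
}

/* Accumulates in a local variable and writes the global once at the end. */
void sum_via_local(const uint8_t *buf, size_t len)
{
    uint32_t total = 0;
    for (size_t i = 0; i < len; i++) {
        total += buf[i];
    }
    g_total = total;
}
```

Both functions leave the same value in g_total, so comparing their sizes in the map file is one way to see what the change costs or saves for a given toolchain setting.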
For more information on writing C code for ARM architectures, see the following resources:

- EETimes - Energy efficient C code for ARM devices by Chris Shore: http://www.eetimes.com/design/embedded/4210470/Efficient-C-Code-for-ARM-Devices
- Compiler Coding Practices - ARM: http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0472c/CJAFJCFG.html

These guidelines will largely apply regardless of the compiler used for the project.

4.2. Number of Function Parameters

Functions with either Keil or GCC can have as many parameters as desired. In general, the first four parameters are passed to the function efficiently using registers. Any additional parameters beyond four must be moved on or off the stack, which results in extra code size for each additional parameter and extra time to execute those instructions. If possible, keeping functions to no more than four parameters can help reduce code size and execution time.

Rev. 0.1 9/12 Copyright © 2012 by Silicon Laboratories

4.3. Alignment

In most cases, Cortex-M3 linkers place code in memory efficiently. In some projects, however, the alignment of functions and code can be carefully managed manually to reduce code size or change code execution speed. For example, if two functions in the same file call each other, but one ends up in flash and one ends up in RAM, the compiler may need to place extra code to perform a long jump and take longer to execute that jump. If needed, functions and variables can be explicitly located using scatterfiles and linker flags. More information on linker scripts and scatterfiles can be found on the Code Red (http://support.code-red-tech.com/CodeRedWiki/OwnLinkScripts) and ARM websites (http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.kui0101a/armlink_babddhbf.htm).

4.4. RAM Size

The RAM size of a project can be just as important as the code size.
In particular, the default configurations for SiM3xxxx projects place the stack at the top of memory growing down and the heap at the end of program data growing up. If too much of the RAM is used by program data, the stack and heap may collide, leading to run-time issues that are difficult to debug. Projects should always leave enough RAM space to accommodate the function-calling depth of the code.

4.5. SiM3xxxx Core and Flash Access Speed

At the maximum device AHB speed, an SiM3xxxx device reading flash every pipeline cycle may violate the maximum flash access speed. To compensate for this, the FLASHCTRL module has controls to reduce the flash access speed (SPMD and RDSEN). Depending on the code density and make-up (i.e., 16-bit or 32-bit instructions), this may lead to stalls in the core before the next instructions can be fetched from flash. Executing at high speeds with strings of 16-bit instructions may yield the fastest core operation.

4.6. SiM3xxxx Core and the Direct Memory Access (DMA) Module

On SiM3xxxx devices, the core and the DMA can access multiple AHB slaves at the same time without any performance degradation. If the core and DMA access the same AHB slave at the same time (i.e., RAM), then the AHB has priority-based arbitration in the following precedence:

1. Core data fetch
2. DMA
3. Core instruction fetch

If multiple DMA channels are active at the same time and accessing the same memory areas as the core, this could lead to a reduction in core execution speed.

5. Precision32 IDE (redlib and newlib)

This section discusses ways to optimize projects using the Precision32 IDE and both redlib and newlib libraries. The Precision32 GCC tools used for the code size and execution speed testing discussed in this document are ARM/embedded-4_6-branch revision 182083 (http://gcc.gnu.org/svn/gcc/branches/ARM/embedded-4_6-branch/) with newlib v1.19 and Redlib v2 (Precision32 IDE v4.2.1 [Build 73]).

5.1. Reading the Map File

The first step in the code size optimization process is to analyze the project map file and determine what portions of code take the most space. The map file is an output of the linker that shows the size of each function and variable and their positions in memory. This map file is located in the build files for a project. In addition to the functions, the map file includes information on variables and other symbols, including unused functions that are removed.

For a Precision32 IDE Debug build, the map file is located in the project’s Debug directory. Figure 1 shows an excerpt of the sim3u1xx_Blinky redlib Debug example map file. For each function in the project, the map file lists the starting address and the length. For example, the my_rtc_alarm0_handler function starts at address 0x0000_04D4 and occupies 0x70 bytes of memory.

Figure 1. sim3u1xx_Blinky Precision32 Debug Map File Example

5.2. Determining a Project’s Code Size

Each project’s library and function usage is different. Analyzing the project’s makeup can help determine the most effective way to reduce code space. All Precision32 SDK projects automatically output the code and RAM size after a build. To modify this output in the Precision32 IDE:

1. Right-click on the project_name in the Project Explorer view.
2. Select Properties.
3. In the C/C++ Build > Settings > Build Steps tab, remove or add the following in the Post-build steps > Command box:

    arm-none-eabi-size "${BuildArtifactFileName}"

After building the si32HAL 1.0.1 sim3u1xx_Blinky example, the IDE outputs:

    text    data    bss    dec     hex
    13312   4       344    13660   355c

The areas of memory are:

- text: code and read-only memory in decimal
- data: read-write data in decimal
- bss: zero-initialized data in decimal
- dec: total of text, data, and bss in decimal
- hex: total of text, data, and bss in hex

More information about the size tool can be found on the Code Red website (http://support.code-red-tech.com/CodeRedWiki/FlashRamSize).

Figure 2. Automatically Reporting Project Size on Project Build in Precision32

5.3. Toolchain Library Usage

Some toolchains have multiple libraries or settings that can change the size or execution speed of code. The Precision32 tools have six options:

- newlib (standard GCC) with no standard I/O
- newlib (standard GCC) with nohosting standard I/O
- newlib (standard GCC) with semihosting standard I/O
- redlib (GCC) with no standard I/O
- redlib (GCC) with nohosting standard I/O
- redlib (GCC) with semihosting standard I/O

The semihosting libraries have additional hooks to enable a project to send debugging information to an IDE running on a PC. The nohosting libraries have this additional capability removed. The none versions of the toolchains have no standard I/O capability (i.e., no printf()).

For some example projects (like si32HAL 1.0.1 sim3u1xx_Blinky), the compile-time library can be modified by opening the myLinkerOptions_p32.ld file in the project directory and changing the uncommented line.

Figure 3. Using the myLinkerOptions_p32.ld File to Select the Project Library

Each of the four lines in the file corresponds to a library:

- GROUP(libgcc.a libc.a libm.a libcr_newlib_nohost.a) (line 4): newlib nohosting
- GROUP(libgcc.a libc.a libm.a libcr_newlib_semihost.a) (line 5): newlib semihosting
- GROUP(libcr_semihost.a libcr_c.a libcr_eabihelpers.a) (line 6): redlib semihosting
- GROUP(libcr_nohost.a libcr_c.a libcr_eabihelpers.a) (line 7): redlib nohosting

The none libraries do not have corresponding entries in this file. Add these lines to add support for none:

- GROUP(libgcc.a libc.a libm.a): newlib none
- GROUP(libcr_c.a libcr_eabihelpers.a): redlib none

After setting the myLinkerOptions_p32.ld file to the correct setting, set the IDE to the same library using these steps:

1. Right-click on the project_name in the Project Explorer view.
2. Select Properties.
3. Click on C/C++ Build > Settings > Tool Settings tab > MCU Linker > Target and select the desired library from the Use C library drop-down menu. Figure 4 shows this dialog in the Precision32 IDE.
4. Clean and Build the project.

AppBuilder projects do not have a myLinkerOptions_p32.ld file and can use the Quickstart view setting only.

Figure 4. Using the Precision32 IDE to Select the Project Library

Using the sim3u1xx_Blinky and demo_si32UsbAudio default examples in the si32HAL 1.0.1 software package, Table 1 and Table 2 show the relative Debug build sizes with the different toolchain library options. Table 3 shows the Debug build sizes for CoreMark, and Table 4 shows the relative CoreMark speed scores for each of these library options. For the newlib and redlib none libraries, see “5.4. Function Library Usage”.

Table 1. Precision32 Toolchain Library Usage Comparison: sim3u1xx_Blinky Debug

| Library | Code + Read-Only Data (bytes) | Read-Write Data (bytes) | Zero-Initialized Data (bytes) |
|---|---|---|---|
| newlib semihosting | 35564 | 2248 | 124 |
| newlib nohosting | 34864 | 2248 | 68 |
| newlib none | N/A (requires printf() removal) | N/A | N/A |
| redlib semihosting | 13080 | 4 | 344 |
| redlib nohosting | 13136 | 4 | 344 |
| redlib none | N/A (requires printf() removal) | N/A | N/A |

Table 2. Precision32 Toolchain Library Usage Comparison: demo_si32UsbAudio Debug

| Library | Code + Read-Only Data (bytes) | Read-Write Data (bytes) | Zero-Initialized Data (bytes) |
|---|---|---|---|
| newlib semihosting | 108844 | 6944 | 11904 |
| newlib nohosting | 108144 | 6944 | 11848 |
| newlib none | N/A (requires printf() removal) | N/A | N/A |
| redlib semihosting | 76176 | 4704 | 12124 |
| redlib nohosting | 76120 | 4704 | 12124 |
| redlib none | N/A (requires printf() removal) | N/A | N/A |

Table 3. Precision32 Toolchain Library Usage Comparison: CoreMark Debug Size

| Library | Code + Read-Only Data (bytes) | Read-Write Data (bytes) | Zero-Initialized Data (bytes) |
|---|---|---|---|
| newlib semihosting | 46900 | 2352 | 2140 |
| newlib nohosting | 46208 | 2352 | 2084 |
| newlib none | N/A (requires printf() removal) | N/A | N/A |
| redlib semihosting | 24400 | 112 | 2360 |
| redlib nohosting | 24344 | 112 | 2360 |
| redlib none | N/A (requires printf() removal) | N/A | N/A |

Table 4. Precision32 Toolchain Library Usage Comparison: CoreMark Debug Speed

| Library | CoreMark Score |
|---|---|
| newlib semihosting | 37.571643 |
| newlib nohosting | 37.571643 |
| newlib none | N/A (requires printf() removal) |
| redlib semihosting | 37.571643 |
| redlib nohosting | 37.571643 |
| redlib none | N/A (requires printf() removal) |

Each score in Table 4 was reported in the form: CoreMark 1.0 : <score> / GCC4.6.2 20110921 (release) [ARM/embedded-4_6-branch revision 182083] Iterations=3000 / STACK

5.4. Function Library Usage

Function libraries such as floating point math and printf() can significantly increase the size of a project. If a project is constrained by size, a careful analysis of the usage of these large libraries may be required. For example, floating point can often be approximated well by fixed point math, eliminating the need for the floating point libraries.

The printf() library is often needed by projects for debugging or release code. If printf() is used for debugging purposes, using a defined symbol in the project to remove printf() when compiling a release build can dramatically reduce the size of a project. To define a symbol to differentiate between a Debug project and a Release project, see the steps for defining a symbol in “5.7. Reset Sequence”.
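A minimal sketch of the defined-symbol approach is shown below. DEBUG_BUILD, DBG_PRINTF, and scale_sample are hypothetical names chosen for illustration; they are not symbols from the si32HAL or the Precision32 tools. The sketch assumes a compiler with C99 variadic macro support and that DEBUG_BUILD is defined only in the Debug build configuration.

```c
/* DEBUG_BUILD is a hypothetical project symbol, assumed to be defined
 * only in the Debug configuration. */
#ifdef DEBUG_BUILD
#include <stdio.h>
#define DBG_PRINTF(...) printf(__VA_ARGS__)
#else
/* In other builds the call and its arguments compile to nothing, so the
 * printf library is not referenced from this file. */
#define DBG_PRINTF(...) ((void)0)
#endif

/* Hypothetical application function containing a debug message. */
int scale_sample(int sample)
{
    int result = sample * 3;
    DBG_PRINTF("sample=%d result=%d\n", sample, result);
    return result;
}
```

Keeping the stdio.h include inside the conditional means a build without DEBUG_BUILD has no reference to standard I/O in this file, which matters when targeting the none libraries.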
The code can then use #ifdef...#endif preprocessor statements to remove debugging code or printf() calls.

The removal of debugging printf() statements can dramatically reduce the code size of a project. A simple way to do this is to redefine the printf function at the top of each file containing printf() calls using the following statement:

    #define printf(args...)

In the sim3u1xx_Blinky example, this means adding the statement to both the main.c and gCpu.c files. For si32Library examples such as demo_si32UsbAudio, define the statement at the top of myBuildOptions.h to remove all calls to printf() with higher optimization settings. Additionally, reduce the code size footprint by disabling logging in myBuildOptions.h:

    #define si32BuildOption_enable_logging 0

This method preserves the printf() statements for later use, if needed. The printf() define can also be wrapped in preprocessor #if statements so that it is included automatically when building with a Release configuration.

When removing printf() for use with newlib none or redlib none, all references to printf() and stdio.h must be commented out of the project. The none libraries cannot be used with si32Library projects. To verify that all instances of printf() have been removed, search the project map file for the printf library.

Instead of using standard printf(), which can have a high library cost, use integer-only print functions like iprintf() for newlib projects. For redlib projects in the Precision32 IDE, create a define CR_INTEGER_PRINTF in the project properties to force an integer-only version of printf(). For instances of printf() with a fixed string, using puts() can dramatically reduce code size.

More information about redlib and printf() can be found on the Code Red website: http://support.code-red-tech.com/CodeRedWiki/UsingPrintf.

If a project does not use any standard I/O functions, use the redlib or newlib none toolchain option to reduce code size as discussed in “5.3. Toolchain Library Usage”.

Using the sim3u1xx_Blinky default example in the si32HAL 1.0.1 software package, Table 5 shows the relative build sizes with the different printf() settings. The demo_si32UsbAudio comparison is not included since printf() removal requires higher optimization settings or code modifications. This section also does not include the CoreMark tests since printf is not part of the CoreMark benchmark.

Table 5. Precision32 printf() Comparison: sim3u1xx_Blinky Debug

| Library | Code + Read-Only Data (bytes) | Read-Write Data (bytes) | Zero-Initialized Data (bytes) |
|---|---|---|---|
| newlib semihosting with printf | 35564 | 2248 | 124 |
| newlib nohosting with printf | 34864 | 2248 | 68 |
| newlib nohosting with integer printf (iprintf) | 19800 | 2248 | 68 |
| newlib nohosting with puts instead of printf | 8784 | 2120 | 68 |
| newlib nohosting without printf | 2064 | 4 | 8 |
| newlib none with all calls to stdio and printf removed | 2064 | 4 | 8 |
| redlib semihosting with printf | 12880 | 4 | 344 |
| redlib nohosting with printf | 12824 | 4 | 344 |
| redlib nohosting with integer printf (CR_INTEGER_PRINTF) | 8111 | 4 | 344 |
| redlib nohosting with puts instead of printf | 4004 | 4 | 344 |
| redlib nohosting without printf | 3868 | 4 | 344 |
| redlib none with all calls to stdio and printf removed | 2068 | 4 | 8 |

5.5. Toolchain Optimization Settings

In addition to the library types, each toolchain has multiple optimization settings that can affect the resulting code size. With the Precision32 toolchain, code optimization can be set by following these steps:

1. Right-click on the project_name in the Project Explorer view.
2. Select Properties.
3. In the C/C++ Build > Settings > Tool Settings tab > MCU C Compiler > Optimization options, select the desired optimization level.

Figure 5 shows the optimization settings for the Precision32 IDE. Level -O0 has the least optimization, while -O3 has the most optimization. An additional flag (-Os) allows for specific optimization for code size.
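One practical consequence of raising the optimization level is that the compiler may keep a variable in a register or drop accesses it considers redundant. The sketch below shows a flag shared with an interrupt handler and declared volatile so that each poll really reads memory. The names g_tick_flag, tick_isr, and wait_for_tick are hypothetical and used only for illustration.

```c
#include <stdint.h>

/* Hypothetical flag set from an interrupt handler. The volatile qualifier
 * forces the compiler to perform every read and write of the flag. */
static volatile uint32_t g_tick_flag;

/* Stand-in for an interrupt service routine that signals an event. */
void tick_isr(void)
{
    g_tick_flag = 1u;
}

/* Polls the flag up to max_polls times and returns the number of polls
 * made before the flag was seen (or before giving up). */
uint32_t wait_for_tick(uint32_t max_polls)
{
    uint32_t polls = 0;
    while (g_tick_flag == 0u && polls < max_polls) {
        polls++;
    }
    g_tick_flag = 0u;
    return polls;
}
```

Without volatile, an optimizing build would be allowed to read the flag once before the loop, so the loop could miss an update made by the handler.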
More information on the optimization levels can be found on the Code Red website (http://support.code-red-tech.com/CodeRedWiki/CompilerOptimization) and the GCC website (http://gcc.gnu.org/onlinedocs/gcc-4.0.4/gcc/Optimize-Options.html). Declaring a variable as volatile will prevent the compiler from optimizing out the variable.

Figure 5. Setting the Project Optimization in the Precision32 IDE

The Precision32 IDE has two build configurations by default: Debug and Release. These build configurations have predefined optimization levels (None for Debug, -O2 for Release). To switch between the two configurations:

1. Right-click on the project_name in the Project Explorer view.
2. Select Build Configurations > Set Active and select between Debug and Release.

Figure 6. Selecting the Active Build Configuration in the Precision32 IDE

To change the settings of any build configuration:

1. Right-click on the project_name in the Project Explorer view.
2. Select Properties.
3. In the C/C++ Build > Settings > Tool Settings tab options, select the build configuration at the top and the desired build configuration options.

Using the sim3u1xx_Blinky and demo_si32UsbAudio default examples in the si32HAL 1.0.1 software package, Table 6 and Table 7 show the relative Debug build sizes with the different optimization level settings. Table 8 shows the CoreMark Debug build sizes, and Table 9 lists the CoreMark speed scores for these optimization levels.

Table 6. Precision32 Toolchain Optimization Comparison: sim3u1xx_Blinky Debug

| Library | Code + Read-Only Data (bytes) | Read-Write Data (bytes) | Zero-Initialized Data (bytes) |
|---|---|---|---|
| newlib nohosting -O0 | 34864 | 2248 | 68 |
| newlib nohosting -O1 | 34032 | 2248 | 68 |
| newlib nohosting -O2 | 33960 | 2248 | 68 |
| newlib nohosting -O3 | 33960 | 2248 | 68 |
| newlib nohosting -Os | 33808 | 2248 | 68 |
| redlib nohosting -O0 | 13080 | 4 | 344 |
| redlib nohosting -O1 | 12056 | 4 | 344 |
| redlib nohosting -O2 | 12096 | 4 | 344 |
| redlib nohosting -O3 | 12096 | 4 | 344 |
| redlib nohosting -Os | 11768 | 4 | 344 |

Table 7. Precision32 Toolchain Optimization Comparison: demo_si32UsbAudio Debug

| Library | Code + Read-Only Data (bytes) | Read-Write Data (bytes) | Zero-Initialized Data (bytes) |
|---|---|---|---|
| newlib nohosting -O0 | 108144 | 6944 | 11848 |
| newlib nohosting -O1 | 84400 | 6944 | 11852 |
| newlib nohosting -O2 | 83152 | 6944 | 11852 |
| newlib nohosting -O3 | 85136 | 6944 | 11856 |
| newlib nohosting -Os | 76528 | 6928 | 11848 |
| redlib nohosting -O0 | 76120 | 4704 | 12124 |
| redlib nohosting -O1 | 52048 | 4700 | 12124 |
| redlib nohosting -O2 | 50752 | 4700 | 12124 |
| redlib nohosting -O3 | 52736 | 4700 | 12128 |
| redlib nohosting -Os | 44128 | 4688 | 12120 |

Table 8. Precision32 Toolchain Optimization Comparison: CoreMark Debug Size

| Library | Code + Read-Only Data (bytes) | Read-Write Data (bytes) | Zero-Initialized Data (bytes) |
|---|---|---|---|
| newlib semihosting -O0 | 46900 | 2352 | 2140 |
| newlib semihosting -O1 | 41812 | 2256 | 2140 |
| newlib semihosting -O2 | 42828 | 2256 | 2140 |
| newlib semihosting -O3 | 45948 | 2256 | 2140 |
| newlib semihosting -Os | 40284 | 2256 | 2140 |
| redlib nohosting -O0 | 24344 | 112 | 2360 |
| redlib nohosting -O1 | 19160 | 12 | 2360 |
| redlib nohosting -O2 | 20176 | 12 | 2360 |
| redlib nohosting -O3 | 23296 | 12 | 2360 |
| redlib nohosting -Os | 17624 | 12 | 2360 |

Table 9. Precision32 Toolchain Optimization Comparison: CoreMark Debug Speed

| Library | CoreMark Score |
|---|---|
| newlib semihosting -O0 | 36.478654 |
| newlib semihosting -O1 | 79.807436 |
| newlib semihosting -O2 | 107.984518 |
| newlib semihosting -O3 | 103.509985 |
| newlib semihosting -Os | 87.64509 |
| redlib nohosting -O0 | 37.571643 |
| redlib nohosting -O1 | 79.998784 |
| redlib nohosting -O2 | 107.984518 |
| redlib nohosting -O3 | 103.509985 |
| redlib nohosting -Os | 87.64509 |

Each score in Table 9 was reported in the form: CoreMark 1.0 : <score> / GCC4.6.2 20110921 (release) [ARM/embedded-4_6-branch revision 182083] Iterations=3000 / STACK

5.6. Unused Code Removal

Each file in a project becomes an object that is included. In other words, if any functions in a file are used, then the entire file is included by default. This can become an issue for a project using the si32HAL and only a few functions from each module. Removed (unused) functions can be viewed in the map files for the projects.
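The file-level granularity described above can be pictured with a small source file like the one below. The function names are hypothetical and are not taken from the si32HAL. The comments describe how GCC and the GNU linker generally treat such a file; the exact section names and results should be confirmed in the project map file.

```c
#include <stdint.h>

/* Assume the application calls this function, so it must be kept. */
uint32_t led_mask_for(uint32_t led_index)
{
    return 1u << (led_index & 31u);
}

/* Assume nothing in the project calls this function.
 *
 * By default both functions are emitted into the same .text section of the
 * object file, so they are kept or dropped together. When the file is
 * compiled with -ffunction-sections, GCC places each function in its own
 * section (for example .text.led_mask_inverted), and a link with
 * --gc-sections can then discard the unreferenced one. */
uint32_t led_mask_inverted(uint32_t led_index)
{
    return ~(1u << (led_index & 31u));
}
```

Searching the map file for the name of the unused function is a quick way to check whether it was actually discarded.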
For Precision32, the -ffunction-sections and -fdata-sections optimization flags place each function and data item into separate sections in the object file before linking them into the project. This allows the linker to discard any unused functions. These flags are present in Example and AppBuilder projects by default and should be configured on a file-by-file basis. To add or remove these options for a file:

1. Right-click on the file_name in the Project Explorer view.
2. Select Properties.
3. In the C/C++ Build > Settings > Tool Settings tab > MCU C Compiler > Miscellaneous options, add or remove the -ffunction-sections and -fdata-sections flags after the -fno-builtin flag in the Other flags text box.

Figure 7. Modifying the Remove Unused Code Compiler Flags in the Precision32 IDE

These flags must be used together with the --gc-sections linker option, which is enabled by default in the Precision32 IDE. It is recommended that this linker option always remain enabled. The flags are not beneficial in every case and may cause larger code size and slower execution for some projects.

Using the sim3u1xx_Blinky and demo_si32UsbAudio default examples in the si32HAL 1.0.1 software package, Table 10 and Table 11 show the relative Debug build sizes with different unused code removal settings. For no unused code removal, the projects were compiled without -ffunction-sections and -fdata-sections and with --gc-sections. For the examples with unused code removal, the projects were compiled with -ffunction-sections, -fdata-sections, and --gc-sections. Table 12 shows the CoreMark build sizes, and Table 13 shows the CoreMark scores for the different unused code removal settings.

Table 10. Precision32 Unused Code Removal Comparison: sim3u1xx_Blinky Debug

| Library | Code + Read-Only Data (bytes) | Read-Write Data (bytes) | Zero-Initialized Data (bytes) |
|---|---|---|---|
| newlib nohosting with no unused code removal | 35504 | 2248 | 68 |
| newlib nohosting with unused code removal | 35112 | 2248 | 68 |
| redlib nohosting with no unused code removal | 13472 | 4 | 344 |
| redlib nohosting with unused code removal | 13080 | 4 | 344 |

Table 11. Precision32 Unused Code Removal Comparison: demo_si32UsbAudio Debug

| Library | Code + Read-Only Data (bytes) | Read-Write Data (bytes) | Zero-Initialized Data (bytes) |
|---|---|---|---|
| newlib nohosting with no unused code removal | 122424 | 7240 | 12116 |
| newlib nohosting with unused code removal | 108144 | 6944 | 11848 |
| redlib nohosting with no unused code removal | 90288 | 5000 | 12392 |
| redlib nohosting with unused code removal | 76120 | 4704 | 12124 |

Table 12. Precision32 Unused Code Removal Comparison: CoreMark Debug Size

| Library | Code + Read-Only Data (bytes) | Read-Write Data (bytes) | Zero-Initialized Data (bytes) |
|---|---|---|---|
| newlib semihosting with no unused code removal | 47188 | 2368 | 2140 |
| newlib semihosting with unused code removal | 46900 | 2352 | 2140 |
| redlib nohosting with no unused code removal | 24656 | 124 | 2360 |
| redlib nohosting with unused code removal | 24344 | 112 | 2360 |

Table 13. Precision32 Unused Code Removal Comparison: CoreMark Debug Speed

| Library | CoreMark Score |
|---|---|
| newlib semihosting with no unused code removal | 37.452232 |
| newlib semihosting with unused code removal | 37.571643 |
| redlib nohosting with no unused code removal | 37.875848 |
| redlib nohosting with unused code removal | 37.571643 |

Each score in Table 13 was reported in the form: CoreMark 1.0 : <score> / GCC4.6.2 20110921 (release) [ARM/embedded-4_6-branch revision 182083] Iterations=3000 / STACK

5.7. Reset Sequence

The speed of the reset sequence of a device can be an important factor, especially for devices like the SiM3U1xx/SiM3C1xx that require a reset to exit the lowest power mode. After the hardware jumps to the reset vector and loads the stack pointer address, the core must initialize the memory of the device. This involves copying data from flash to RAM and zero-filling any zero-initialized segments. Then, the reset code typically calls a system initialization function and jumps to main.

This reset sequence may take different times based on the library used with the project. The startup code should always be compiled with the fastest speed optimization to ensure it takes as little time as possible.

The si32HAL examples have a ~500 ms delay added to a pin reset event to prevent code from switching to a nonexistent clock source and disabling the device. This delay can be removed by defining the si32HalOption_disable_pin_reset_delay symbol in the project. To define a symbol in the Precision32 IDE:

1. Right-click on the project_name in the Project Explorer view.
2. Select Properties.
3. In the C/C++ Build > Settings > Tool Settings tab > MCU C Compiler > Settings options, add the symbol to or remove it from the Defined symbols (-D) area.

Figure 8. Adding a Project Define Symbol in the Precision32 IDE

Table 14 shows the reset time comparison for the toolchain libraries using the fastest speed optimization on the startup code. This time was measured on an oscilloscope using the sim3u1xx_Blinky example in Debug mode, from the fall of a port pin at the beginning of the Reset IRQ handler to the fall of a port pin at the beginning of main(). This test requires modification of the si32HAL startup sequence file startup_<device>_p32.c.

Table 14. Precision32 Toolchain Library Usage Comparison: sim3u1xx_Blinky Debug Reset Sequence

| Library | Reset Time (µs) |
|---|---|
| newlib semihosting with printf() | 242 |
| newlib nohosting with printf() | 236 |
| newlib none with printf() removed | 9.4 |
| redlib semihosting with printf() | 90 |
| redlib nohosting with printf() | 90 |
| redlib none with printf() removed | 9.4 |

6. ARM/Keil µVision

This section discusses ways to optimize projects using the Keil or ARM toolchain in the µVision IDE. The Keil µVision tools used for the code size and execution speed testing discussed in this document are v4.1.0.894.

6.1. Reading the Map File

The map file is an output of the linker that shows the size of each function and variable and their positions in memory. This map file is located in the build files for a project. In addition to the functions, the map file includes information on variables and other symbols, including unused functions that are removed.

Figure 9 shows an excerpt from the sim3u1xx_Blinky map file from the Keil toolchain. The functions are listed with a base address and size. In this case, the my_rtc_alarm0_handler function is 50 bytes located at address 0x0000_03A5.

Figure 9. sim3u1xx_Blinky µVision Map File Example

6.2. Determining a Project’s Code Size

The Keil µVision IDE automatically displays the code size information at the end of a successful build. After building the si32HAL 1.0.1 sim3u1xx_Blinky example, the IDE outputs:

    Program Size: Code=1968 RO-data=296 RW-data=24 ZI-data=1536
    ".\build\BlinkyApp.axf" - 0 Error(s), 0 Warning(s).

The areas of memory are:

- Code: all program code in decimal
- RO-data: read-only data located in flash in decimal
- RW-data: read-write initialized data located in RAM in decimal
- ZI-data: zero-initialized data located in RAM in decimal

6.3. Toolchain Library Usage

Some toolchains have multiple libraries or settings that can change the size or execution speed of code. The Keil µVision tools have two options: standard and MicroLIB.
To switch between the two:

1. Right-click on the project_name in the Project window and select Options for Target ‘project_name’ or go to Project > Options for Target ‘project_name’.
2. Select the Target tab.
3. Use the Use MicroLIB checkbox to select the library. Figure 10 shows this dialog in the µVision IDE.

Figure 10. Using the µVision IDE to Select the Project Library

Using the sim3u1xx_Blinky and demo_si32UsbAudio default examples in the si32HAL 1.0.1 software package, Table 15 and Table 16 show the relative Debug build sizes with the different toolchain library options. Table 17 shows the Debug build sizes for CoreMark, and Table 18 shows the relative CoreMark speed scores for each of these library options.

Table 15. Keil Toolchain Library Usage Comparison: sim3u1xx_Blinky Debug

| Library | Code (bytes) | Read Only Data (bytes) | Read-Write Data (bytes) | Zero-Initialized Data (bytes) |
|---|---|---|---|---|
| µVision standard | 2296 | 312 | 24 | 1632 |
| µVision MicroLIB | 2068 | 296 | 24 | 1536 |

Table 16. Keil Toolchain Library Usage Comparison: demo_si32UsbAudio Debug

| Library | Code (bytes) | Read Only Data (bytes) | Read-Write Data (bytes) | Zero-Initialized Data (bytes) |
|---|---|---|---|---|
| µVision standard | 51176 | 4388 | 5196 | 18068 |
| µVision MicroLIB | 47264 | 3832 | 5208 | 17972 |

Table 17. Keil Toolchain Library Usage Comparison: CoreMark Debug Size

| Library | Code (bytes) | Read Only Data (bytes) | Read-Write Data (bytes) | Zero-Initialized Data (bytes) |
|---|---|---|---|---|
| µVision standard | 13860 | 868 | 156 | 3632 |
| µVision MicroLIB | 11276 | 636 | 156 | 3536 |

Table 18. Keil Toolchain Library Usage Comparison: CoreMark Debug Speed

| Library | CoreMark Score |
|---|---|
| µVision standard | 65.602324 |
| µVision MicroLIB | 69.402323 |

Each score in Table 18 was reported in the form: CoreMark 1.0 : <score>/ARM4.2 (EDG gcc mode) Iterations=3000/STACK

6.4. Function Library Usage

The removal of debugging printf() statements can dramatically reduce the code size of a project.
A simple way to do this is to redefine the printf function at the top of the file containing the printf() calls using the following statement:

    #define printf(args...)

For si32Library examples such as demo_si32UsbAudio, define the statement at the top of myBuildOptions.h to remove all calls to printf(). Additionally, reduce the footprint by disabling logging in myBuildOptions.h:

    #define si32BuildOption_enable_logging 0

This method preserves the printf() statements for later use, if needed. The printf() define can also be wrapped in preprocessor #if statements so that it is included automatically when building with a Release configuration. In the sim3u1xx_Blinky example, this means adding the statement to both the main.c and gCpu.c files. To verify that all instances of printf() have been removed, search the project's map file for the printf library.

Using the sim3u1xx_Blinky and demo_si32UsbAudio default examples in the si32HAL 1.0.1 software package, Table 19 and Table 20 show the relative build sizes with the different printf() settings. This section does not include the CoreMark tests since printf is not part of the CoreMark benchmark.

Table 19. Keil printf() Comparison—sim3u1xx_Blinky Debug

| Library | Code (bytes) | Read Only Data (bytes) | Read-Write Data (bytes) | Zero-Initialized Data (bytes) |
|---|---|---|---|---|
| µVision MicroLIB with printf | 2068 | 296 | 24 | 1536 |
| µVision MicroLIB without printf | 1392 | 296 | 12 | 1536 |

Table 20. Keil printf() Comparison—demo_si32UsbAudio Debug

| Library | Code (bytes) | Read Only Data (bytes) | Read-Write Data (bytes) | Zero-Initialized Data (bytes) |
|---|---|---|---|---|
| µVision MicroLIB with printf | 47264 | 3832 | 5208 | 17972 |
| µVision MicroLIB without printf | 39760 | 4312 | 5196 | 17972 |

6.5. Toolchain Optimization Settings
In addition to the library types, each toolchain has multiple optimization settings that can affect the resulting code size. In Keil µVision, the optimization settings are set using the following steps:
1. Right-click on the project_name in the Project window and select Options for Target ‘project_name’, or go to Project > Options for Target ‘project_name’.
2. Select the C/C++ tab.
3. Use the Optimization drop-down menu to set the project optimization setting.

Figure 11 shows the optimization settings in the IDE. The available options are:
Level 0: minimum optimization
Level 1: restricted optimization, removing unused inline functions and unused static functions
Level 2: high optimization
Level 3: maximum optimization, which aims to produce faster or smaller code than Level 2, depending on the options used

In addition to the levels, µVision also has an Optimize for Time selection available below the Optimization drop-down menu. Declaring a variable as volatile will prevent the compiler from optimizing out the variable. More information on these optimization levels can be found on the Keil website (http://www.keil.com/support/man/docs/uv4/uv4_dg_adscc.htm).

Figure 11. Setting the Project Optimization in the µVision IDE

Using the sim3u1xx_Blinky and demo_si32UsbAudio default examples in the si32HAL 1.0.1 software package, Table 21 and Table 22 show the relative Debug build sizes with the different optimization level settings. Table 23 shows the CoreMark Debug build sizes, and Table 24 lists the CoreMark speed scores for these optimization levels.

Table 21. Keil Toolchain Optimization Comparison—sim3u1xx_Blinky Debug

| Library | Code (bytes) | Read Only Data (bytes) | Read-Write Data (bytes) | Zero-Initialized Data (bytes) |
|---|---|---|---|---|
| µVision MicroLIB -O0 | 2068 | 296 | 24 | 1536 |
| µVision MicroLIB -O0 (with Optimize for Time) | 2068 | 296 | 24 | 1536 |
| µVision MicroLIB -O1 | 1704 | 296 | 20 | 1536 |
| µVision MicroLIB -O1 (with Optimize for Time) | 1648 | 296 | 20 | 1536 |
| µVision MicroLIB -O2 | 1616 | 296 | 20 | 1536 |
| µVision MicroLIB -O2 (with Optimize for Time) | 1600 | 296 | 20 | 1536 |
| µVision MicroLIB -O3 | 1604 | 296 | 20 | 1536 |
| µVision MicroLIB -O3 (with Optimize for Time) | 1596 | 296 | 20 | 1536 |

Table 22. Keil Toolchain Optimization Comparison—demo_si32UsbAudio Debug

| Library | Code (bytes) | Read Only Data (bytes) | Read-Write Data (bytes) | Zero-Initialized Data (bytes) |
|---|---|---|---|---|
| µVision MicroLIB -O0 | 47264 | 3832 | 5208 | 17972 |
| µVision MicroLIB -O0 (with Optimize for Time) | 47264 | 3832 | 5208 | 17972 |
| µVision MicroLIB -O1 | 38816 | 3832 | 5132 | 17952 |
| µVision MicroLIB -O1 (with Optimize for Time) | 39924 | 3832 | 5132 | 17952 |
| µVision MicroLIB -O2 | 36540 | 3832 | 5132 | 17952 |
| µVision MicroLIB -O2 (with Optimize for Time) | 39840 | 3832 | 5132 | 17952 |
| µVision MicroLIB -O3 | 36468 | 3832 | 5132 | 17952 |
| µVision MicroLIB -O3 (with Optimize for Time) | 41532 | 3832 | 5132 | 17952 |

Table 23. Keil Toolchain Optimization Comparison—CoreMark Debug Size

| Library | Code (bytes) | Read Only Data (bytes) | Read-Write Data (bytes) | Zero-Initialized Data (bytes) |
|---|---|---|---|---|
| µVision MicroLIB -O0 | 11276 | 636 | 156 | 3536 |
| µVision MicroLIB -O0 (with Optimize for Time) | 11276 | 636 | 156 | 3536 |
| µVision MicroLIB -O1 | 9788 | 616 | 140 | 3536 |
| µVision MicroLIB -O1 (with Optimize for Time) | 10136 | 616 | 140 | 3536 |
| µVision MicroLIB -O2 | 9640 | 616 | 140 | 3536 |
| µVision MicroLIB -O2 (with Optimize for Time) | 10684 | 616 | 140 | 3536 |
| µVision MicroLIB -O3 | 9680 | 616 | 140 | 3536 |
| µVision MicroLIB -O3 (with Optimize for Time) | 11500 | 616 | 140 | 3536 |

Table 24. Keil Toolchain Optimization Comparison—CoreMark Debug Speed

| Library | CoreMark Score |
|---|---|
| µVision MicroLIB -O0 | CoreMark 1.0 : 69.402323 / ARM4.2 (EDG gcc mode) Iterations=3000 / STACK |
| µVision MicroLIB -O0 (with Optimize for Time) | CoreMark 1.0 : 69.402323 / ARM4.2 (EDG gcc mode) Iterations=3000 / STACK |
| µVision MicroLIB -O1 | CoreMark 1.0 : 75.279256 / ARM4.2 (EDG gcc mode) Iterations=3000 / STACK |
| µVision MicroLIB -O1 (with Optimize for Time) | CoreMark 1.0 : 75.206352 / ARM4.2 (EDG gcc mode) Iterations=3000 / STACK |
| µVision MicroLIB -O2 | CoreMark 1.0 : 74.247855 / ARM4.2 (EDG gcc mode) Iterations=3000 / STACK |
| µVision MicroLIB -O2 (with Optimize for Time) | CoreMark 1.0 : 87.277701 / ARM4.2 (EDG gcc mode) Iterations=3000 / STACK |
| µVision MicroLIB -O3 | CoreMark 1.0 : 79.520321 / ARM4.2 (EDG gcc mode) Iterations=3000 / STACK |
| µVision MicroLIB -O3 (with Optimize for Time) | CoreMark 1.0 : 102.697150 / ARM4.2 (EDG gcc mode) Iterations=3000 / STACK |

6.6. Unused Code Removal
Each file in a project becomes an object that is included in the build as a whole. In other words, if any function in a file is used, then the entire file is included by default. This can become an issue for a project that uses the si32HAL but only a few functions from each module. Removed (unused) functions can be viewed in the map files for the projects.

The unused code removal feature is not automatically enabled in the Keil µVision IDE. To enable this feature:
1. Right-click on the project_name in the Project window and select Options for Target ‘project_name’, or go to Project > Options for Target ‘project_name’.
2. Select the C/C++ tab.
3. Use the One ELF Section per Function checkbox to enable or disable unused code removal.

Figure 12. Setting the Remove Unused Code Option in the µVision IDE

Using the sim3u1xx_Blinky and demo_si32UsbAudio default examples in the si32HAL 1.0.1 software package, Table 25 and Table 26 show the relative Debug build sizes with different unused code removal settings.
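As a rough illustration of what unused code removal acts on, the sketch below shows a single source file in which only one of two functions is ever referenced. All names here (led_mask, unused_checksum, app_step) are hypothetical and are not part of the si32HAL. Without per-function sections, both functions would typically be linked because they share one object section. With the One ELF Section per Function option (which, to the best of our understanding, corresponds to the armcc --split_sections option), the linker is free to discard the section that holds unused_checksum().

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical module: two unrelated functions in the same source file. */

/* Referenced by the application, so the linker must keep it. */
uint32_t led_mask(uint32_t pin)
{
    assert(pin < 32u);
    return 1u << pin;
}

/* Never referenced by the application. When each function is placed in its
 * own section, the linker can drop this one from the final image. */
uint32_t unused_checksum(const uint8_t *buf, uint32_t len)
{
    uint32_t sum = 0u;
    for (uint32_t i = 0u; i < len; i++) {
        sum += buf[i];
    }
    return sum;
}

/* Stand-in for application code: only led_mask() is called. */
uint32_t app_step(void)
{
    return led_mask(3u);
}
```

After building, the map file can be checked for unused_checksum to confirm whether it was removed from the image.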
Table 27 shows the CoreMark build sizes, and Table 28 shows the CoreMark scores for the different unused code removal settings.

Table 25. Keil Unused Code Removal Comparison—sim3u1xx_Blinky Debug

| Library | Code (bytes) | Read Only Data (bytes) | Read-Write Data (bytes) | Zero-Initialized Data (bytes) |
|---|---|---|---|---|
| µVision MicroLIB with no unused code removal | 1392 | 296 | 12 | 1536 |
| µVision MicroLIB with unused code removal | 1184 | 296 | 12 | 1536 |

Table 26. Keil Unused Code Removal Comparison—demo_si32UsbAudio Debug

| Library | Code (bytes) | Read Only Data (bytes) | Read-Write Data (bytes) | Zero-Initialized Data (bytes) |
|---|---|---|---|---|
| µVision MicroLIB with no unused code removal | 47264 | 3832 | 5208 | 17972 |
| µVision MicroLIB with unused code removal | 43464 | 3772 | 5060 | 17780 |

Table 27. Keil Unused Code Removal Comparison—CoreMark Debug Size

| Library | Code (bytes) | Read Only Data (bytes) | Read-Write Data (bytes) | Zero-Initialized Data (bytes) |
|---|---|---|---|---|
| µVision MicroLIB with no unused code removal | 11276 | 636 | 156 | 3536 |
| µVision MicroLIB with unused code removal | 11012 | 636 | 156 | 3536 |

Table 28. Keil Unused Code Removal Comparison—CoreMark Debug Speed

| Library | CoreMark Score |
|---|---|
| µVision MicroLIB with no unused code removal | CoreMark 1.0 : 69.402324 / ARM4.2 (EDG gcc mode) Iterations=3000 / STACK |
| µVision MicroLIB with unused code removal | CoreMark 1.0 : 67.374626 / ARM4.2 (EDG gcc mode) Iterations=3000 / STACK |

6.7. Reset Sequence
The speed of the reset sequence of a device can be an important factor, especially for devices like the SiM3U1xx/SiM3C1xx that require a reset to exit the lowest power mode. After the hardware loads the initial stack pointer and jumps to the reset vector, the core must initialize the memory of the device. This involves copying data from flash to RAM and zero-filling any zero-initialized segments. Then, the reset code typically calls a system initialization function and jumps to main. The duration of this reset sequence depends on the library used with the project.
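The order of operations described above can be sketched in plain C as follows. This is only a host-side model under stated assumptions: the arrays flash_data_image, ram_data, and ram_bss stand in for regions that the linker would normally define, and system_init() and app_main() are hypothetical placeholders. Real startup code is usually supplied by the toolchain library or written in assembly, so this should not be read as the si32HAL or Keil implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for linker-defined regions. On a real target the
 * addresses and sizes come from the scatter file or linker script. */
#define DATA_WORDS 4u
#define BSS_WORDS  8u

static const uint32_t flash_data_image[DATA_WORDS] = { 1u, 2u, 3u, 4u };
static uint32_t ram_data[DATA_WORDS]; /* read-write data in RAM     */
static uint32_t ram_bss[BSS_WORDS];   /* zero-initialized data in RAM */

static int system_init_calls;

/* Copy the initial values of read-write data from flash to RAM. */
static void copy_words(uint32_t *dst, const uint32_t *src, size_t words)
{
    assert(dst != NULL && src != NULL);
    while (words-- > 0u) {
        *dst++ = *src++;
    }
}

/* Clear a zero-initialized region. */
static void zero_words(uint32_t *dst, size_t words)
{
    assert(dst != NULL);
    while (words-- > 0u) {
        *dst++ = 0u;
    }
}

/* Placeholder for the system initialization hook (clocks, watchdog, ...). */
static void system_init(void)
{
    system_init_calls++;
}

/* Placeholder for the application entry point: sums the initialized memory. */
static uint32_t app_main(void)
{
    uint32_t sum = 0u;
    for (size_t i = 0u; i < DATA_WORDS; i++) {
        sum += ram_data[i];
    }
    for (size_t i = 0u; i < BSS_WORDS; i++) {
        sum += ram_bss[i];
    }
    return sum;
}

/* The steps performed between the reset vector and main(). */
uint32_t reset_sequence(void)
{
    copy_words(ram_data, flash_data_image, DATA_WORDS); /* flash -> RAM   */
    zero_words(ram_bss, BSS_WORDS);                     /* zero-fill      */
    system_init();                                      /* system init    */
    return app_main();                                  /* jump to main   */
}
```

Both loops scale with the amount of read-write and zero-initialized data, which is one reason the reset time can change when a different library alters those section sizes.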
The startup code should always be compiled with the fastest speed optimization to ensure it takes as little time as possible.

The si32HAL examples add a ~500 ms delay after a pin reset event to prevent code from switching to a nonexistent clock source and disabling the device. This delay can be removed by defining the si32HalOption_disable_pin_reset_delay symbol in the project.

To define a symbol in Keil µVision:
1. Right-click on the project_name in the Project window and select Options for Target ‘project_name’, or go to Project > Options for Target ‘project_name’.
2. Select the C/C++ tab.
3. Use the Define text box to add or remove project symbols.

Figure 13. Adding a Project Define Symbol in the µVision IDE

Table 29 shows the reset time comparison for the toolchain libraries using the fastest speed optimization on the startup code. This time was measured on an oscilloscope using the sim3u1xx_Blinky example in Debug mode, from the rise of RESETb to the fall of a port pin at the beginning of main().

Table 29. Keil Toolchain Library Usage Comparison—sim3u1xx_Blinky Debug Reset Sequence

| Library | Reset Time (µs) |
|---|---|
| µVision standard | 52 |
| µVision MicroLIB | 48 |

CONTACT INFORMATION
Silicon Laboratories Inc.
400 West Cesar Chavez
Austin, TX 78701
Tel: 1+(512) 416-8500
Fax: 1+(512) 416-9669
Toll Free: 1+(877) 444-3032
Please visit the Silicon Labs Technical Support web page: https://www.silabs.com/support/pages/contacttechnicalsupport.aspx and register to submit a technical support request.

Patent Notice
Silicon Labs invests in research and development to help our customers differentiate in the market with innovative low-power, small size, analog-intensive mixed-signal solutions. Silicon Labs' extensive patent portfolio is a testament to our unique approach and world-class engineering team.

The information in this document is believed to be accurate in all respects at the time of publication but is subject to change without notice.
Silicon Laboratories assumes no responsibility for errors and omissions, and disclaims responsibility for any consequences resulting from the use of information included herein. Additionally, Silicon Laboratories assumes no responsibility for the functioning of undescribed features or parameters. Silicon Laboratories reserves the right to make changes without further notice. Silicon Laboratories makes no warranty, representation or guarantee regarding the suitability of its products for any particular purpose, nor does Silicon Laboratories assume any liability arising out of the application or use of any product or circuit, and specifically disclaims any and all liability, including without limitation consequential or incidental damages. Silicon Laboratories products are not designed, intended, or authorized for use in applications intended to support or sustain life, or for any other application in which the failure of the Silicon Laboratories product could create a situation where personal injury or death may occur. Should Buyer purchase or use Silicon Laboratories products for any such unintended or unauthorized application, Buyer shall indemnify and hold Silicon Laboratories harmless against all claims and damages. Silicon Laboratories and Silicon Labs are trademarks of Silicon Laboratories Inc. Other products or brandnames mentioned herein are trademarks or registered trademarks of their respective holders.