AN4777
Application note

Implications of memory interface configurations on STM32L1 and STM32L0 Series microcontrollers

Introduction

The low-power STM32L1 and STM32L0 Series devices offer a rich variety of configuration options for the Flash memory interface. This application note compares the different settings under various test conditions and provides guidelines for optimizing the application power consumption.

Reference documents

The reference documents are available on the STMicroelectronics web site www.st.com:
• Ultra-low-power STM32L0x3 advanced ARM®-based 32-bit MCUs Reference Manual (RM0367)
• STM32L100xx, STM32L151xx, STM32L152xx and STM32L162xx advanced ARM®-based 32-bit MCUs Reference Manual (RM0038)

January 2016 DocID028482 Rev 1

Contents

1 Definitions
2 System architecture
3 Operation modes
3.1 STM32L1 Series device options
3.2 STM32L0 Series device options
3.3 Execution from a volatile memory
4 Power consumption and performance comparison using STM32L1 Series devices
4.1 Dhrystone benchmark
4.2 32-bit instruction code
4.3 Memory read stress test
5 Power consumption and performance comparison using STM32L0 Series devices
5.1 Dhrystone benchmark
5.2 Memory read stress test
6 Conclusion
7 Revision history

1 Definitions

Table 1. List of acronyms
Term   Description
NV     Non-volatile (memory), also referred to as Flash memory
HSI    High-speed internal clock
SPI    Serial peripheral interface bus
MCU    Microcontroller
CPU    Central processing unit (part of the MCU)
NVIC   Nested vectored interrupt controller
DMA    Direct memory access
RM     Reference manual
SWD    Serial wire debug interface

2 System architecture

The memory interface manages the read and write accesses from the core/bus matrix towards the non-volatile memory.
This holds for both instruction and data accesses. The flags that configure the non-volatile memory read access during program execution are located in the access control register.

The latency setting reduces the rate at which the NVM is read. In the highest voltage regulator range, an extra wait cycle must be enabled for system clocks higher than 16 MHz. For lower core voltages this threshold frequency is lower.

To compensate for this bandwidth deficiency, prefetch can be enabled. The memory controller then attempts to have the next instruction ready before the core requests it.

The STM32L1 memory interface can also use 64-bit read accesses internally, so that it can serve the core with data and instructions at a rate close to the core's own speed. The extra 32 bits can be used by the prefetch to load the next instruction and provide it to the core immediately when needed.

The STM32L0 memory interface does not have the 64-bit wide bus, but its memory controller is capable of data pre-read. This simple buffer is similar to prefetch, but also works for data.

All the performance improvements resulting from the memory interface settings come at the cost of increased power consumption. The 32-bit access with no latency, no pre-read and no prefetch is considered the low-power mode. The following sections shed light on the tradeoffs these settings represent.

3 Operation modes

The following operation modes are used to assess the impact of the memory interface settings on the performance and power consumption. All the measurements have been done using VCC = 3.3 V and voltage regulator range 1. With lower regulator ranges both the speed and the consumption would be lower, scaling approximately linearly relative to the range 1 measurements.
For example, with voltage regulator range 3 and a 2 MHz system clock (from MSI), the power consumption would be roughly 10 times lower and the performance roughly 10 times lower for all the measured configurations. There is therefore no point in repeating the measurements for all the configuration combinations.

3.1 STM32L1 Series device options

Table 2 gives a short summary of the device options. For a detailed description refer to the read interface section of the RM0038 reference manual.

Table 2. Configurations available on STM32L1 Series devices with regulator range 1
Frequency   ≤16 MHz        >16 MHz
Latency     0  0  0  1  1  1  1
64-bit      0  1  1  1  1  1  1
Prefetch    0  0  1  0  1  0  1

The table of valid configurations demonstrates the following simple rules:
• It is impossible to turn off the latency with clock speeds exceeding 16 MHz.
• When the latency is set to 1, the 64-bit access is mandatory.
• Prefetch is impossible without the 64-bit access.

3.2 STM32L0 Series device options

Table 3 gives a short summary of the device options. For a detailed description refer to the section on reading the NVM in the RM0367 reference manual.

Table 3. Configurations available on STM32L0 Series devices with regulator range 1
Frequency        ≤16 MHz                       >16 MHz
Latency          0  1  0  1  0  1  0  1  0  1  1  1  1  1  1
Pre-read         0  0  1  1  1  1  0  0  X  X  0  1  1  0  X
Prefetch         0  0  0  0  1  1  1  1  X  X  0  0  1  1  X
Buffer disable   0  0  0  0  0  0  0  0  1  1  0  0  0  0  1

The table of valid configurations demonstrates the following simple rules:
• It is impossible to turn off the latency with clock speeds exceeding 16 MHz.
• When the buffer is disabled, the pre-read and prefetch settings have no effect (marked X in the table).
• Prefetch and pre-read configure how the 6 words of the internal buffer are used, not how many words are available.

3.3 Execution from a volatile memory

The intuitive way to avoid the Flash memory speed issues would be to execute selected portions of code from RAM. There are several reasons not to do that:
1. RAM is a scarce resource on small devices.
2. Most data are likely to be placed in RAM, so executing code from RAM eliminates the advantage of the Harvard architecture approach in the STM32L1.
3. To switch off the Flash memory and conserve more energy, the interrupt table and the interrupt handlers also need to be in RAM.

In a typical microcontroller application, the overall energy budget of RAM execution is roughly the same as that of execution from the Flash memory at a 32 MHz system clock with the Flash memory latency set. This means that if the Flash memory can run without the latency enabled, it is the better option most of the time. In other words, RAM execution tends to be about 30% slower than execution of the same code from the Flash memory, and the current consumption should not decrease by more than the same 30%.

4 Power consumption and performance comparison using STM32L1 Series devices

To assess the performance of the MCU with different memory controller settings, several benchmark tests have been used. All the tests have been executed on an STM32L152RC Discovery board using all the available memory interface settings listed in Section 3.1. All the tests have been executed both standalone and in parallel with a DMA transfer constantly reading from the program NV memory. The DMA channel was directed to the SPI output configured to the highest available speed (fPCLK/2) and low priority.

Three clock configurations have been used in the measurements: one with the plain 16 MHz HSI clock as the system clock and no latency set, another with the same clock but with the Flash latency configured (the Flash memory effectively running on a lower clock), and the third with the PLL set to produce the 32 MHz system clock.

All the measurements were taken on a single sample of the Discovery board at ambient temperature. The values provided are an arithmetic mean from several measurements.
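As an illustration of the test matrix described above, the following C sketch enumerates every combination of latency, 64-bit access and prefetch and keeps only those allowed by the rules of Section 3.1. The type and function names are hypothetical and are not taken from any ST library; the 16 MHz threshold is assumed to apply to voltage regulator range 1 only.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of one STM32L1 Flash interface setting. */
typedef struct {
    unsigned sysclk_mhz; /* system clock frequency in MHz */
    bool latency;        /* one wait state enabled */
    bool acc64;          /* 64-bit access enabled */
    bool prefetch;       /* prefetch enabled */
} l1_flash_cfg;

/* Returns true when the setting obeys the three rules listed under
 * Table 2 (assumption: voltage regulator range 1). */
bool l1_cfg_is_valid(l1_flash_cfg c)
{
    if (c.sysclk_mhz > 16 && !c.latency) return false; /* latency mandatory above 16 MHz */
    if (c.latency && !c.acc64) return false;           /* latency requires 64-bit access */
    if (c.prefetch && !c.acc64) return false;          /* prefetch requires 64-bit access */
    return true;
}

/* Stores the valid settings for one clock frequency in out[] (capacity 8)
 * and returns how many were stored. */
size_t l1_enumerate(unsigned sysclk_mhz, l1_flash_cfg out[8])
{
    size_t n = 0;
    for (unsigned bits = 0; bits < 8; bits++) {
        l1_flash_cfg c = { sysclk_mhz, (bits & 4) != 0, (bits & 2) != 0, (bits & 1) != 0 };
        if (l1_cfg_is_valid(c)) {
            out[n++] = c;
        }
    }
    return n;
}
```

Under these assumptions the enumeration yields five settings at 16 MHz and two at 32 MHz, which matches the seven columns of Table 2.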
4.1 Dhrystone benchmark

Although the Dhrystone benchmark is often deemed outdated, it is still somewhat representative of many microcontroller applications.

Table 4. Dhrystone results with no background transfer
Frequency                     16 MHz                             32 MHz
Latency                       0      0      0      1      1      1      1
64-bit                        0      1      1      1      1      1      1
Prefetch                      0      0      1      0      1      0      1
Timing for 50000 cycles [s]   2.57   2.57   2.57   3.05   2.86   1.52   1.46
Average current [mA]          5.75   5.78   6.11   5.13   5.62   10.42  11.08
Energy [mJ]                   48.77  49.02  51.82  51.63  53.04  52.27  53.38

Figure 1. Dhrystone results with no background transfer (plot of average current I [mA] versus time [s] for each configuration)

Table 5. Dhrystone results with DMA simultaneously reading data from the Flash memory
Frequency                     16 MHz                             32 MHz
Latency                       0      0      0      1      1      1      1
64-bit                        0      1      1      1      1      1      1
Prefetch                      0      0      1      0      1      0      1
Timing for 50000 cycles [s]   2.72   2.68   2.68   3.28   3.09   1.64   1.55
Average current [mA]          6.17   6.25   6.58   5.50   5.99   11.24  11.68
Energy [mJ]                   55.38  55.28  58.19  59.53  61.08  60.83  59.74

Figure 2. Dhrystone results with DMA simultaneously reading data from the Flash memory (plot of average current I [mA] versus time [s] for each configuration)

Configuring 64-bit access or prefetch makes very little difference at a low clock speed where the latency can be avoided. On the other hand, setting the latency may lead to lower power consumption in situations where speed is not critical. At higher speeds the efficiency of prefetch is situational: it gives the best performance, but the gain in speed may be smaller than the increase in consumption.
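The energy figures in the tables appear consistent with the product of the supply voltage, the average current and the execution time. A minimal C sketch of this calculation, assuming VCC = 3.3 V as stated in the Operation modes section (the function name is hypothetical):

```c
/* Energy in millijoules: volts * milliamperes * seconds = millijoules. */
double energy_mj(double vcc_v, double current_ma, double time_s)
{
    return vcc_v * current_ma * time_s;
}
```

For the first column of Table 5, energy_mj(3.3, 6.17, 2.72) gives approximately 55.38 mJ, the value listed in the table.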
4.2 32-bit instruction code

This stress test consists of executing 12 aligned 32-bit instructions manipulating data in registers in a loop of 500000 cycles. Code with a higher ratio of 32-bit instructions is more likely to hit a bottleneck in the memory interface than typical Thumb code with prevalent 16-bit instructions.

Table 6. 32-bit code result with no background transfer
Frequency                     16 MHz                             32 MHz
Latency                       0      0      0      1      1      1      1
64-bit                        0      1      1      1      1      1      1
Prefetch                      0      0      1      0      1      0      1
Timing for 500000 cycles [s]  0.9    0.9    0.9    1.06   0.964  0.59   0.497
Average current [mA]          5.25   5.41   5.63   4.82   5.11   9.09   9.78
Energy [mJ]                   15.59  16.07  16.72  16.86  16.26  17.70  16.04

Figure 3. 32-bit code result with no background transfer (plot of average current I [mA] versus time [s] for each configuration)

Table 7. 32-bit code result with DMA simultaneously reading data from the Flash memory
Frequency                     16 MHz                             32 MHz
Latency                       0      0      0      1      1      1      1
64-bit                        0      1      1      1      1      1      1
Prefetch                      0      0      1      0      1      0      1
Timing for 500000 cycles [s]  0.956  0.921  0.916  1.22   1.02   0.64   0.54
Average current [mA]          5.85   5.96   6.18   5.20   5.67   9.83   10.66
Energy [mJ]                   18.46  18.11  18.68  20.94  19.09  20.76  19.00

Figure 4. 32-bit code result with DMA simultaneously reading data from the Flash memory (plot of average current I [mA] versus time [s] for each configuration)

The findings are in line with the expectations: code with a high share of 32-bit instructions benefits a lot from prefetch once the memory latency is in place.
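To make the prefetch tradeoff concrete, the relative change in execution time can be compared with the relative change in average current. The sketch below is illustrative only; the helper name is hypothetical and the inputs are the two 32 MHz columns of Table 7.

```c
/* Relative change from 'before' to 'after' in percent;
 * a negative result means a decrease. */
double percent_change(double before, double after)
{
    return (after - before) / before * 100.0;
}
```

With the Table 7 values at 32 MHz, enabling prefetch changes the execution time by percent_change(0.64, 0.54), about -15.6%, and the average current by percent_change(9.83, 10.66), about +8.4%. The time saved outweighs the extra current, which agrees with the lower energy listed for the prefetch column.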
But with zero latency the extra bandwidth is likely to be useless.

4.3 Memory read stress test

This stress test consists of executing 20 LDR instructions fetching data from the program NV memory to the CPU core registers in a loop of 500000 cycles. This way, not only are the instructions fetched from the memory, but another read access is generated during the instruction execution, again creating a choke point at the memory interface. Fetching of the subsequent instruction is then likely to be delayed. The code simulates a case where a heavy load of literal pools (string constants), for example predefined messages, is read from the non-volatile memory very often.

Note: The memory reading by LDM instructions was not used, as it does not demonstrate the limits of the memory interface, only those of the memory itself.

Table 8. Literal pool with no additional data read from the Flash memory
Frequency                     16 MHz                             32 MHz
Latency                       0      0      0      1      1      1      1
64-bit                        0      1      1      1      1      1      1
Prefetch                      0      0      1      0      1      0      1
Timing for 500000 cycles [s]  3.66   2.73   2.72   3.38   3.32   1.69   1.66
Average current [mA]          5.44   5.58   6.12   4.85   5.33   9.78   10.73
Energy [mJ]                   65.70  50.27  54.93  54.10  58.40  54.54  58.78

Figure 5. Literal pool reading with no additional data read from the Flash memory (plot of average current I [mA] versus time [s] for each configuration)

Table 9. Literal pool reading with DMA simultaneously reading the Flash memory
Frequency                     16 MHz                             32 MHz
Latency                       0      0      0      1      1      1      1
64-bit                        0      1      1      1      1      1      1
Prefetch                      0      0      1      0      1      0      1
Timing for 500000 cycles [s]  3.98   2.94   2.94   3.92   3.88   1.97   1.96
Average current [mA]          6.04   6.26   6.73   5.40   5.72   10.62  11.59
Energy [mJ]                   79.33  60.73  65.29  69.85  73.24  69.04  74.96

Figure 6. Literal pool reading with DMA simultaneously reading data from the Flash memory (plot of average current I [mA] versus time [s] for each configuration)

As expected, when the transfers are mostly data reads the effect of prefetch is smaller, but the 64-bit memory access makes a significant difference even with zero memory latency.

5 Power consumption and performance comparison using STM32L0 Series devices

The Cortex-M0+ core is much simpler than the Cortex-M3 used in the STM32L1 Series. The 32-bit instruction benchmark is dropped, as the Thumb-2 instruction set support in the M0+ core is very limited and extensive usage of 32-bit code is not realistic with code compiled for the STM32L0 Series.

The remaining tests have been executed on an STM32L073-EVAL board using all the available memory interface settings listed in Section 3.2. All the tests have been executed both standalone and in parallel with a DMA transfer constantly reading from the program NV memory. The DMA channel was directed to the SPI output configured to the highest available speed (fPCLK/2), but low priority.

Two clock configurations have been used in the measurements: one with the plain 16 MHz HSI clock as the system clock and no latency set, the other with the PLL set to produce the 32 MHz system clock and the Flash memory latency set to 1.

All the measurements were taken on a single sample of Nucleo board at ambient temperature. The values provided are an arithmetic mean from several measurements.

5.1 Dhrystone benchmark

The Dhrystone code is executed and the task consists of processing 50000 cycles of the test code.

Table 10. Dhrystone with no additional data read from the Flash memory
Frequency             16 MHz                                  32 MHz
Latency               0       0       0       0       0       1       1       1       1       1
Prefetch              1       0       0       1       0       1       0       0       1       0
Pre-read              1       1       0       0       0       1       1       0       0       0
Disabled buffer       0       0       1       0       0       0       0       1       0       0
Time [ms]             3769    3766    3771    3769    3769    2139    2667    2720    2130    2667
Average current [mA]  4.32    4.42    4.54    4.40    4.39    8.14    7.52    7.52    8.04    7.43
Energy [mJ]           53.73   54.93   56.49   54.72   54.60   57.46   66.20   67.49   56.51   65.40

Figure 7. Dhrystone with no additional data read from the Flash memory (plot of average current I [mA] versus time [ms] for each configuration)

Table 11. Dhrystone with DMA simultaneously reading data from the Flash memory
Frequency             16 MHz                                  32 MHz
Latency               0       0       0       0       0       1       1       1       1       1
Prefetch              1       0       0       1       0       1       0       0       1       0
Pre-read              1       1       0       0       0       1       1       0       0       0
Disabled buffer       0       0       1       0       0       0       0       1       0       0
Time [ms]             3903    3901    3906    3906    3904    2377    2853    2956    2334    2843
Average current [mA]  4.69    4.77    4.87    4.68    4.59    8.58    8.21    8.15    8.66    7.80
Energy [mJ]           69.40   61.41   62.77   60.32   59.13   67.29   77.31   79.31   66.70   73.17

Figure 8. Dhrystone with DMA simultaneously reading data from the Flash memory (plot of average current I [mA] versus time [ms] for each configuration)

It can be clearly seen that the internal 6-word buffer improves the energy efficiency even if it is not well utilized, as in the case of zero latency. The best option is to keep it on, but to disable prefetch and pre-read. When the latency is enabled, prefetch is probably worth using.
Pre-read is obviously not used by the DMA channel and does not represent an improvement.

5.2 Memory read stress test

This stress test consists of executing 20 LDR instructions fetching data from the program NV memory to the CPU core registers in a loop of 500000 cycles. This way, not only are the instructions fetched from the memory, but another read access is generated during the instruction execution, again creating a choke point at the memory interface. Fetching of the subsequent instruction is then likely to be delayed. The code simulates a case where a heavy load of literal pools, for example predefined messages, is read from the non-volatile memory very often.

Note: The memory reading by LDM instructions was not used, as it does not demonstrate the limits of the memory interface, only those of the memory itself.

Table 12. Literal pool with no additional data read from the Flash memory
Frequency             16 MHz                                  32 MHz
Latency               0       0       0       0       0       1       1       1       1       1
Prefetch              1       0       0       1       0       1       0       0       1       0
Pre-read              1       1       0       0       0       1       1       0       0       0
Disabled buffer       0       0       1       0       0       0       0       1       0       0
Time [ms]             2402.5  2401.5  2403    2403    2399.5  2009    2058.5  2091    1817    1819
Average current [mA]  3.4     3.42    3.36    3.14    3.19    6.03    6.05    5.94    5.83    5.73
Energy [mJ]           26.95   27.10   26.64   24.89   25.25   39.97   41.09   40.98   34.95   34.39

Figure 9. Literal pool with no additional data read from the Flash memory (plot of average current I [mA] versus time [ms] for each configuration)

Table 13. Literal pool with DMA simultaneously reading data from the Flash memory
Frequency             16 MHz                                  32 MHz
Latency               0       0       0       0       0       1       1       1       1       1
Prefetch              1       0       0       1       0       1       0       0       1       0
Pre-read              1       1       0       0       0       1       1       0       0       0
Disabled buffer       0       0       1       0       0       0       0       1       0       0
Time [ms]             2533.5  2533.5  4854.5  4587    4591    2292.5  2301    2420    2299    2302.5
Average current [mA]  3.86    3.86    3.38    3.32    3.29    7.42    7.39    7.34    7.25    7.18
Energy [mJ]           32.27   32.27   54.15   50.26   49.84   56.13   56.11   58.62   55.00   54.56

Figure 10. Literal pool with DMA simultaneously reading data from the Flash memory (plot of average current I [mA] versus time [ms] for each configuration)

This example finally demonstrates the advantage of the pre-read setting. It can greatly improve the efficiency when more than one stream of data is read from the Flash memory and there is no latency. Unsurprisingly, prefetch is not useful when dealing mostly with data. Again, it is a good idea to keep the buffer enabled. The only reason to disable the buffer is when the timing needs to be more deterministic, whatever the efficiency cost may be.

6 Conclusion

The measured results provide guidance for deciding whether or not to enable the different memory interface settings. The features that improve the benchmark results also lead to higher power consumption, and the overall efficiency depends on the task processed by the microcontroller.

There is no significant benefit in tweaking the settings when the Flash memory latency is not in place. In that case, tweaking makes sense only if the Flash memory contains frequently used literal pools (predefined data constants).

With the Flash memory latency equal to one, the Flash interface should be set up carefully, as the performance difference between the optimal and the default configuration may be significant.
It is definitely possible to activate some Flash interface settings only temporarily for particular operations and to disable them afterwards.

7 Revision history

Table 14. Document revision history
Date          Revision   Changes
19-Jan-2016   1          Initial release.

IMPORTANT NOTICE – PLEASE READ CAREFULLY

STMicroelectronics NV and its subsidiaries (“ST”) reserve the right to make changes, corrections, enhancements, modifications, and improvements to ST products and/or to this document at any time without notice. Purchasers should obtain the latest relevant information on ST products before placing orders. ST products are sold pursuant to ST’s terms and conditions of sale in place at the time of order acknowledgement.

Purchasers are solely responsible for the choice, selection, and use of ST products and ST assumes no liability for application assistance or the design of Purchasers’ products.

No license, express or implied, to any intellectual property right is granted by ST herein.

Resale of ST products with provisions different from the information set forth herein shall void any warranty granted by ST for such product.

ST and the ST logo are trademarks of ST. All other product or service names are the property of their respective owners.

Information in this document supersedes and replaces information previously supplied in any prior versions of this document.

© 2016 STMicroelectronics – All rights reserved