AN4777
Application note
Implications of memory interface configurations on STM32L1 and
STM32L0 Series microcontrollers
Introduction
The low-power STM32L1 and STM32L0 Series devices have a rich variety of configuration
options regarding the Flash memory interface.
This application note showcases the different settings under various test conditions,
providing guidelines for the optimization of the application power consumption.
Reference documents
The reference documents are available on the STMicroelectronics website, www.st.com:
• Ultra-low-power STM32L0x3 advanced ARM®-based 32-bit MCUs Reference Manual (RM0367)
• STM32L100xx, STM32L151xx, STM32L152xx and STM32L162xx advanced ARM®-based 32-bit MCUs Reference Manual (RM0038)
January 2016
DocID028482 Rev 1
Contents
1 Definitions
2 System architecture
3 Operation modes
3.1 STM32L1 Series device options
3.2 STM32L0 Series device options
3.3 Execution from a volatile memory
4 Power consumption and performance comparison using STM32L1 Series devices
4.1 Dhrystone benchmark
4.2 32-bit instruction code
4.3 Memory read stress test
5 Power consumption and performance comparison using STM32L0 Series devices
5.1 Dhrystone benchmark
5.2 Memory read stress test
6 Conclusion
7 Revision history
List of tables
Table 1. List of acronyms
Table 2. Configurations available on STM32L1 Series devices with regulator range 1
Table 3. Configurations available on STM32L0 Series devices with regulator range 1
Table 4. Dhrystone results with no background transfer
Table 5. Dhrystone results with DMA simultaneously reading data from the Flash memory
Table 6. 32-bit code result with no background transfer
Table 7. 32-bit code result with DMA simultaneously reading data from the Flash memory
Table 8. Literal pool with no additional data read from the Flash memory
Table 9. Literal pool reading with DMA simultaneously reading the Flash memory
Table 10. Dhrystone with no additional data read from the Flash memory
Table 11. Dhrystone with DMA simultaneously reading data from the Flash memory
Table 12. Literal pool with no additional data read from the Flash memory
Table 13. Literal pool with DMA simultaneously reading data from the Flash memory
Table 14. Document revision history
List of figures
Figure 1. Dhrystone results with no background transfer
Figure 2. Dhrystone results with DMA simultaneously reading data from the Flash memory
Figure 3. 32-bit code result with no background transfer
Figure 4. 32-bit code result with DMA simultaneously reading data from the Flash memory
Figure 5. Literal pool reading with no additional data read from the Flash memory
Figure 6. Literal pool reading with DMA simultaneously reading data from the Flash memory
Figure 7. Dhrystone with no additional data read from the Flash memory
Figure 8. Dhrystone with DMA simultaneously reading data from the Flash memory
Figure 9. Literal pool with no additional data read from the Flash memory
Figure 10. Literal pool with DMA simultaneously reading data from the Flash memory
1 Definitions
Table 1. List of acronyms
| Term | Description |
|---|---|
| NV | Non-volatile (memory), also referred to as Flash memory |
| HSI | High-speed internal clock |
| SPI | Serial peripheral interface bus |
| MCU | Microcontroller |
| CPU | Central processing unit (part of the MCU) |
| NVIC | Nested vectored interrupt controller |
| DMA | Direct memory access |
| RM | Reference manual |
| SWD | Serial wire debug interface |
2 System architecture
The memory interface manages the read and write accesses from the core and bus matrix to the non-volatile memory (NVM). This applies to both instruction and data accesses.
The read access to the non-volatile memory during program execution is configured through flags in the access control register.
The latency setting reduces the rate at which the NVM is read. In the highest voltage regulator range, an extra wait cycle must be enabled when the system clock is higher than 16 MHz. For lower core voltages, this threshold frequency is lower.
To compensate for this bandwidth deficiency, a prefetch can be configured. The memory controller then attempts to have the next instruction ready before the core requests it.
The STM32L1 memory interface can also use a 64-bit read access internally, so that it can serve the core with data and instructions at a rate close to the core's own pace. The extra 32 bits can be used by the prefetch to load the next instruction and provide it to the core immediately when needed.
The STM32L0 memory interface does not have the 64-bit wide bus, but the memory controller is capable of data pre-read. This simple buffer is similar to prefetch, but it also works for data.
All the performance improvements resulting from the memory interface settings come at the cost of increased power consumption. A 32-bit access with no latency, no pre-read and no prefetch is considered the low-power configuration. The following sections shed light on the tradeoffs these settings represent.
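The latency rule described above can be sketched as a small helper. This is a minimal illustration in Python, not target code: the function name and the `zero_wait_limit_hz` parameter are hypothetical, and only the 16 MHz limit for regulator range 1 comes from this document. The limits for the lower ranges must be taken from the reference manuals.

```python
# Limit stated in the text for the highest voltage regulator range (range 1).
RANGE1_ZERO_WAIT_LIMIT_HZ = 16_000_000


def required_latency(sysclk_hz: int,
                     zero_wait_limit_hz: int = RANGE1_ZERO_WAIT_LIMIT_HZ) -> int:
    """Return the minimum latency setting (0 or 1) for a given system clock.

    zero_wait_limit_hz is the highest clock that can run without the extra
    wait cycle. The default is the range 1 value; for other regulator ranges
    the caller has to supply the limit from the reference manual.
    """
    if sysclk_hz <= 0:
        raise ValueError("sysclk_hz must be positive")
    return 1 if sysclk_hz > zero_wait_limit_hz else 0


if __name__ == "__main__":
    for clock in (16_000_000, 32_000_000):
        print(f"{clock // 1_000_000} MHz -> latency {required_latency(clock)}")
```

The helper only answers whether the wait cycle is mandatory. Enabling the latency at or below the limit remains a valid choice, and it is one of the measured configurations later in this document.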
3 Operation modes
The following operation modes are used to assess the impact of the memory interface settings on performance and power consumption. All the measurements were done with VCC = 3.3 V and voltage regulator range 1. With lower regulator ranges both the speed and the consumption are lower, and they scale approximately linearly relative to the range 1 measurements. For example, with voltage regulator range 3 and a 2 MHz system clock (from MSI), both the power consumption and the performance would be roughly 10 times lower for all the measured configurations. The measurements were therefore not repeated for all the configuration combinations.
3.1 STM32L1 Series device options
Table 2 gives a short summary of the device options. For a detailed description, refer to the read interface section of the RM0038 reference manual.
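Table 2. Configurations available on STM32L1 Series devices with regulator range 1
| Frequency | <16 MHz | <16 MHz | <16 MHz | <16 MHz | <16 MHz | >16 MHz | >16 MHz |
|---|---|---|---|---|---|---|---|
| Latency | 0 | 0 | 0 | 1 | 1 | 1 | 1 |
| 64-bit | 0 | 1 | 1 | 1 | 1 | 1 | 1 |
| Prefetch | 0 | 0 | 1 | 0 | 1 | 0 | 1 |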
The table of valid configurations demonstrates the following simple rules:
• The latency cannot be turned off at clock speeds exceeding 16 MHz.
• When the latency is set to 1, the 64-bit access is mandatory.
• Prefetch is impossible without the 64-bit access.
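The three rules above can be expressed as a small validity check. The sketch below is illustrative only: the function names are hypothetical, and the three flags are modelled as plain 0/1 values rather than as register bits.

```python
from itertools import product


def l1_config_is_valid(sysclk_mhz: float, latency: int, acc64: int,
                       prefetch: int) -> bool:
    """Apply the three STM32L1 rules (regulator range 1) to one configuration."""
    if sysclk_mhz > 16 and latency == 0:
        return False  # latency cannot be turned off above 16 MHz
    if latency == 1 and acc64 == 0:
        return False  # latency 1 requires the 64-bit access
    if prefetch == 1 and acc64 == 0:
        return False  # prefetch requires the 64-bit access
    return True


def l1_valid_configs(sysclk_mhz: float) -> list:
    """List all valid (latency, 64-bit, prefetch) combinations for a clock."""
    return [cfg for cfg in product((0, 1), repeat=3)
            if l1_config_is_valid(sysclk_mhz, *cfg)]


if __name__ == "__main__":
    for clock in (16, 32):
        print(f"{clock} MHz:", l1_valid_configs(clock))
```

Enumerating all eight flag combinations this way yields five valid configurations at 16 MHz and two at 32 MHz, which matches the number of columns in Table 2.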
3.2 STM32L0 Series device options
Table 3 gives a short summary of the device options. For a detailed description, refer to the section on reading the NVM in the RM0367 reference manual.
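Table 3. Configurations available on STM32L0 Series devices with regulator range 1
| Frequency | <16 MHz | <16 MHz | <16 MHz | <16 MHz | <16 MHz | <16 MHz | <16 MHz | <16 MHz | <16 MHz | <16 MHz | >16 MHz | >16 MHz | >16 MHz | >16 MHz | >16 MHz |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Latency | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 1 |
| Pre-read | 0 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | X | X | 0 | 1 | 1 | 0 | X |
| Prefetch | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | X | X | 0 | 0 | 1 | 1 | X |
| Buffer disable | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 1 |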
The table of valid configurations demonstrates the following simple rules:
• The latency cannot be turned off at clock speeds exceeding 16 MHz.
• When the buffer is disabled, prefetch and pre-read cannot be configured (marked X in Table 3).
• Prefetch and pre-read configure the usage of the 6 words in the internal buffer, not their total amount.
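The STM32L0 rules can be sketched the same way. The function below is a hypothetical helper that enumerates the distinct settings; it uses `None` for the "X" (don't care) entries of Table 3, on the assumption that pre-read and prefetch have no effect once the buffer is disabled.

```python
from itertools import product


def l0_effective_configs(sysclk_mhz: float) -> list:
    """Enumerate distinct (latency, pre_read, prefetch, buffer_disable) settings.

    None stands for the "X" entries in Table 3: with the buffer disabled,
    pre-read and prefetch are treated as don't-care values.
    """
    # Above 16 MHz (regulator range 1) the latency cannot be turned off.
    latencies = (1,) if sysclk_mhz > 16 else (0, 1)
    configs = []
    for latency in latencies:
        # Buffer enabled: all four pre-read/prefetch combinations are distinct.
        for pre_read, prefetch in product((0, 1), repeat=2):
            configs.append((latency, pre_read, prefetch, 0))
        # Buffer disabled: a single effective configuration.
        configs.append((latency, None, None, 1))
    return configs


if __name__ == "__main__":
    for clock in (16, 32):
        print(f"{clock} MHz: {len(l0_effective_configs(clock))} configurations")
```

The enumeration gives ten settings when the latency is optional and five when it is mandatory, which is the same split as the columns of Table 3.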
3.3 Execution from a volatile memory
The intuitive way to avoid the Flash memory speed issues would be to execute selected portions of code from RAM. There are several reasons not to do that:
1. RAM is a scarce resource on small devices.
2. Most of the data are likely to be placed in RAM, so executing code from RAM eliminates the advantage of the Harvard architecture approach in the STM32L1.
3. To switch off the Flash memory and save more energy, the interrupt table and the interrupt handlers also need to be in RAM.
In a typical microcontroller application, the overall energy budget of RAM execution is roughly the same as that of execution from the Flash memory at a 32 MHz system clock with the Flash memory latency set. This means that if the Flash memory can run without the latency enabled, it is the better option most of the time. In other words, RAM execution tends to be about 30% slower than the execution of the same code from the Flash memory, and the current consumption is not expected to decrease by more than roughly the same 30%.
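The trade-off between a longer run time and a lower current can be made concrete with a short calculation. The sketch below assumes a fixed supply voltage, so that energy is proportional to current multiplied by time. The function names are hypothetical, and the 30% figures are the rough values quoted above, not measurements.

```python
def break_even_current_ratio(slowdown: float) -> float:
    """Current ratio (slow run / fast run) at which both runs use equal energy.

    With a fixed supply voltage, E = V * I * t, so a run that takes
    `slowdown` times longer breaks even when its current is 1 / slowdown
    of the faster run's current.
    """
    if slowdown <= 0:
        raise ValueError("slowdown must be positive")
    return 1.0 / slowdown


def relative_energy(slowdown: float, current_ratio: float) -> float:
    """Energy of the slower run relative to the faster one (1.0 means equal)."""
    return slowdown * current_ratio


if __name__ == "__main__":
    ratio = break_even_current_ratio(1.30)
    print(f"break-even current ratio: {ratio:.3f}")
    print(f"relative energy at 30% lower current: {relative_energy(1.30, 0.70):.2f}")
```

For a 30% slowdown the break-even point is a current of about 77% of the original. The current therefore has to drop by about 23% before the slower execution saves any energy at all.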
4 Power consumption and performance comparison using STM32L1 Series devices
To assess the performance of the MCU with different memory controller settings, several benchmark tests were used. All the tests were executed on an STM32L152RC Discovery board using all the available memory interface settings listed in Section 3.1. All the tests were executed both standalone and in parallel with a DMA transfer constantly reading from the program NV memory. The DMA channel was directed to the SPI output, configured for the highest available speed (fPCLK/2) and low priority.
Three clock configurations were used in the measurements: the plain 16 MHz HSI clock as the system clock with no latency set; the same clock with the Flash latency configured (the Flash memory effectively running on a lower clock); and the PLL set to produce a 32 MHz system clock.
All the measurements were taken on a single sample of the Discovery board at ambient temperature. The values provided are the arithmetic mean of several measurements.
4.1 Dhrystone benchmark
Although the Dhrystone benchmark is often deemed outdated, it is still somewhat representative of many microcontroller applications.
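Table 4. Dhrystone results with no background transfer
| Frequency | 16 MHz | 16 MHz | 16 MHz | 16 MHz | 16 MHz | 32 MHz | 32 MHz |
|---|---|---|---|---|---|---|---|
| Latency | 0 | 0 | 0 | 1 | 1 | 1 | 1 |
| 64-bit | 0 | 1 | 1 | 1 | 1 | 1 | 1 |
| Prefetch | 0 | 0 | 1 | 0 | 1 | 0 | 1 |
| Timing for 50000 cycles [s] | 2.57 | 2.57 | 2.57 | 3.05 | 2.86 | 1.52 | 1.46 |
| Average current [mA] | 5.75 | 5.78 | 6.11 | 5.13 | 5.62 | 10.42 | 11.08 |
| Energy [mJ] | 48.77 | 49.02 | 51.82 | 51.63 | 53.04 | 52.27 | 53.38 |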
Figure 1. Dhrystone results with no background transfer
[Plot: average current I [mA] versus time [s] for the configurations of Table 4]
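Table 5. Dhrystone results with DMA simultaneously reading data from the Flash memory
| Frequency | 16 MHz | 16 MHz | 16 MHz | 16 MHz | 16 MHz | 32 MHz | 32 MHz |
|---|---|---|---|---|---|---|---|
| Latency | 0 | 0 | 0 | 1 | 1 | 1 | 1 |
| 64-bit | 0 | 1 | 1 | 1 | 1 | 1 | 1 |
| Prefetch | 0 | 0 | 1 | 0 | 1 | 0 | 1 |
| Timing for 50000 cycles [s] | 2.72 | 2.68 | 2.68 | 3.28 | 3.09 | 1.64 | 1.55 |
| Average current [mA] | 6.17 | 6.25 | 6.58 | 5.50 | 5.99 | 11.24 | 11.68 |
| Energy [mJ] | 55.38 | 55.28 | 58.19 | 59.53 | 61.08 | 60.83 | 59.74 |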
Figure 2. Dhrystone results with DMA simultaneously reading data from the Flash memory
[Plot: average current I [mA] versus time [s] for the configurations of Table 5]
Configuring 64-bit access or prefetch makes very little difference at a low clock speed, where the latency can be avoided. On the other hand, setting the latency may lead to a lower power consumption in situations where the speed is not critical. At higher speeds the efficiency of prefetch is situational: it gives the highest performance, but the gain in speed may be smaller than the increase in consumption.
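The energy figures in the tables appear to be consistent with E = V × I × t at the 3.3 V supply. This is an inference from the published numbers, not a formula stated in this document. Under that assumption, the following sketch recomputes the energy per Dhrystone run from the timing and current values of Table 4. The function and variable names are hypothetical.

```python
SUPPLY_V = 3.3  # supply voltage used for all measurements in this document


def energy_mj(current_ma: float, time_s: float,
              supply_v: float = SUPPLY_V) -> float:
    """Energy in millijoules: volts * milliamperes * seconds."""
    return supply_v * current_ma * time_s


# (configuration, time [s], average current [mA]) taken from Table 4.
TABLE4 = [
    ("16 MHz, latency 0, 64-bit 0, prefetch 0", 2.57, 5.75),
    ("16 MHz, latency 0, 64-bit 1, prefetch 0", 2.57, 5.78),
    ("16 MHz, latency 0, 64-bit 1, prefetch 1", 2.57, 6.11),
    ("16 MHz, latency 1, 64-bit 1, prefetch 0", 3.05, 5.13),
    ("16 MHz, latency 1, 64-bit 1, prefetch 1", 2.86, 5.62),
    ("32 MHz, latency 1, 64-bit 1, prefetch 0", 1.52, 10.42),
    ("32 MHz, latency 1, 64-bit 1, prefetch 1", 1.46, 11.08),
]

if __name__ == "__main__":
    for label, time_s, current_ma in TABLE4:
        print(f"{label}: {energy_mj(current_ma, time_s):.2f} mJ")
```

Comparing configurations by energy per task rather than by current alone shows why a lower average current does not automatically mean a lower energy cost. The configurations with latency at 16 MHz draw less current but run longer.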
4.2 32-bit instruction code
This stress test consists of executing 12 aligned 32-bit instructions that manipulate data in registers, in a loop of 500000 cycles. Code with a higher ratio of 32-bit instructions is more likely to hit a bottleneck in the memory interface than typical Thumb code with predominantly 16-bit instructions.
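Table 6. 32-bit code result with no background transfer
| Frequency | 16 MHz | 16 MHz | 16 MHz | 16 MHz | 16 MHz | 32 MHz | 32 MHz |
|---|---|---|---|---|---|---|---|
| Latency | 0 | 0 | 0 | 1 | 1 | 1 | 1 |
| 64-bit | 0 | 1 | 1 | 1 | 1 | 1 | 1 |
| Prefetch | 0 | 0 | 1 | 0 | 1 | 0 | 1 |
| Timing for 500000 cycles [s] | 0.9 | 0.9 | 0.9 | 1.06 | 0.964 | 0.59 | 0.497 |
| Average current [mA] | 5.25 | 5.41 | 5.63 | 4.82 | 5.11 | 9.09 | 9.78 |
| Energy [mJ] | 15.59 | 16.07 | 16.72 | 16.86 | 16.26 | 17.70 | 16.04 |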
Figure 3. 32-bit code result with no background transfer
[Plot: average current I [mA] versus time [s] for the configurations of Table 6]
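Table 7. 32-bit code result with DMA simultaneously reading data from the Flash memory
| Frequency | 16 MHz | 16 MHz | 16 MHz | 16 MHz | 16 MHz | 32 MHz | 32 MHz |
|---|---|---|---|---|---|---|---|
| Latency | 0 | 0 | 0 | 1 | 1 | 1 | 1 |
| 64-bit | 0 | 1 | 1 | 1 | 1 | 1 | 1 |
| Prefetch | 0 | 0 | 1 | 0 | 1 | 0 | 1 |
| Timing for 500000 cycles [s] | 0.956 | 0.921 | 0.916 | 1.22 | 1.02 | 0.64 | 0.54 |
| Average current [mA] | 5.85 | 5.96 | 6.18 | 5.20 | 5.67 | 9.83 | 10.66 |
| Energy [mJ] | 18.46 | 18.11 | 18.68 | 20.94 | 19.09 | 20.76 | 19.00 |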
Figure 4. 32-bit code result with DMA simultaneously reading data from the Flash memory
[Plot: average current I [mA] versus time [s] for the configurations of Table 7]
The findings are in line with expectations: code with a high share of 32-bit instructions benefits considerably from prefetch once the memory latency is in place. With zero latency, however, the extra bandwidth is likely to be useless.
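The effect of prefetch can be quantified as relative changes in time, current and energy. The sketch below uses values from Table 6 and assumes that energy is proportional to current multiplied by time at a constant 3.3 V supply. The helper names are hypothetical.

```python
def percent_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return 100.0 * (after - before) / before


def prefetch_tradeoff(without_prefetch: tuple, with_prefetch: tuple,
                      supply_v: float = 3.3) -> tuple:
    """Compare two (time [s], current [mA]) measurements.

    Returns the percent changes in time, current and energy when moving
    from the first measurement to the second.
    """
    t0, i0 = without_prefetch
    t1, i1 = with_prefetch
    e0 = supply_v * i0 * t0
    e1 = supply_v * i1 * t1
    return (percent_change(t0, t1), percent_change(i0, i1),
            percent_change(e0, e1))


if __name__ == "__main__":
    # Table 6 values: 32 MHz with latency, then 16 MHz with zero latency
    # (64-bit access enabled in both pairs).
    cases = {
        "32 MHz, latency 1": ((0.59, 9.09), (0.497, 9.78)),
        "16 MHz, latency 0": ((0.9, 5.41), (0.9, 5.63)),
    }
    for label, (off, on) in cases.items():
        dt, di, de = prefetch_tradeoff(off, on)
        print(f"{label}: time {dt:+.1f}%, current {di:+.1f}%, energy {de:+.1f}%")
```

With the latency in place, the shorter run time outweighs the higher current, so the energy per task drops. With zero latency the run time is unchanged and the extra current is pure cost.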
4.3 Memory read stress test
This stress test consists of executing 20 LDR instructions that fetch data from the program NV memory into the CPU core registers, in a loop of 500000 cycles. In this way, not only are the instructions fetched from the memory, but another read access is generated during the instruction execution, again creating a choke point at the memory interface. The fetch of the subsequent instruction is then likely to be delayed. The code simulates a case where a heavy load of literal pools (constants such as predefined messages) is read from the non-volatile memory very often.
Note: Memory reading by LDM instructions was not used, as it does not demonstrate the limits of the memory interface, only those of the memory itself.
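Table 8. Literal pool with no additional data read from the Flash memory
| Frequency | 16 MHz | 16 MHz | 16 MHz | 16 MHz | 16 MHz | 32 MHz | 32 MHz |
|---|---|---|---|---|---|---|---|
| Latency | 0 | 0 | 0 | 1 | 1 | 1 | 1 |
| 64-bit | 0 | 1 | 1 | 1 | 1 | 1 | 1 |
| Prefetch | 0 | 0 | 1 | 0 | 1 | 0 | 1 |
| Timing for 500000 cycles [s] | 3.66 | 2.73 | 2.72 | 3.38 | 3.32 | 1.69 | 1.66 |
| Average current [mA] | 5.44 | 5.58 | 6.12 | 4.85 | 5.33 | 9.78 | 10.73 |
| Energy [mJ] | 65.70 | 50.27 | 54.93 | 54.10 | 58.40 | 54.54 | 58.78 |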
Figure 5. Literal pool reading with no additional data read from the Flash memory
[Plot: average current I [mA] versus time [s] for the configurations of Table 8]
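Table 9. Literal pool reading with DMA simultaneously reading the Flash memory
| Frequency | 16 MHz | 16 MHz | 16 MHz | 16 MHz | 16 MHz | 32 MHz | 32 MHz |
|---|---|---|---|---|---|---|---|
| Latency | 0 | 0 | 0 | 1 | 1 | 1 | 1 |
| 64-bit | 0 | 1 | 1 | 1 | 1 | 1 | 1 |
| Prefetch | 0 | 0 | 1 | 0 | 1 | 0 | 1 |
| Timing for 500000 cycles [s] | 3.98 | 2.94 | 2.94 | 3.92 | 3.88 | 1.97 | 1.96 |
| Average current [mA] | 6.04 | 6.26 | 6.73 | 5.40 | 5.72 | 10.62 | 11.59 |
| Energy [mJ] | 79.33 | 60.73 | 65.29 | 69.85 | 73.24 | 69.04 | 74.96 |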
Figure 6. Literal pool reading with DMA simultaneously reading data from the Flash memory
[Plot: average current I [mA] versus time [s] for the configurations of Table 9]
As expected, when the transfers are mostly data reads, the effect of prefetch is lower, but the 64-bit memory access makes a significant difference even with zero memory latency.
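One way to use such measurements is to pick the configuration with the lowest energy per task that still meets a timing requirement. The sketch below does this for the timing and current values of Table 8. The `Config` type, the function names and the deadlines are hypothetical, and the energy is assumed to be V × I × t at 3.3 V.

```python
from typing import NamedTuple, Optional, Sequence


class Config(NamedTuple):
    label: str
    time_s: float
    current_ma: float


def energy_mj(cfg: Config, supply_v: float = 3.3) -> float:
    """Energy per task in millijoules, assuming a constant supply voltage."""
    return supply_v * cfg.current_ma * cfg.time_s


def best_config(configs: Sequence[Config], deadline_s: float) -> Optional[Config]:
    """Return the lowest-energy configuration finishing within the deadline."""
    candidates = [cfg for cfg in configs if cfg.time_s <= deadline_s]
    return min(candidates, key=energy_mj) if candidates else None


# Timing and current values from Table 8 (literal pool read, no DMA).
TABLE8 = [
    Config("16 MHz, latency 0, 32-bit", 3.66, 5.44),
    Config("16 MHz, latency 0, 64-bit", 2.73, 5.58),
    Config("16 MHz, latency 0, 64-bit + prefetch", 2.72, 6.12),
    Config("16 MHz, latency 1, 64-bit", 3.38, 4.85),
    Config("16 MHz, latency 1, 64-bit + prefetch", 3.32, 5.33),
    Config("32 MHz, latency 1, 64-bit", 1.69, 9.78),
    Config("32 MHz, latency 1, 64-bit + prefetch", 1.66, 10.73),
]

if __name__ == "__main__":
    for deadline in (4.0, 2.0):
        choice = best_config(TABLE8, deadline)
        print(f"deadline {deadline} s ->", choice.label if choice else "none")
```

For this data-heavy workload the selection favours the 64-bit access without prefetch in both cases, in line with the observation above.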
5 Power consumption and performance comparison using STM32L0 Series devices
The Cortex-M0+ core is much simpler than the Cortex-M3 used in the STM32L1 Series. The 32-bit instruction benchmark is dropped, as the Thumb-2 instruction set support in the M0+ core is very limited and extensive use of 32-bit code is not realistic in code compiled for the STM32L0 Series.
The remaining tests were executed on an STM32L073-EVAL board using all the available memory interface settings listed in Section 3.2. All the tests were executed both standalone and in parallel with a DMA transfer constantly reading from the program NV memory. The DMA channel was directed to the SPI output, configured for the highest available speed (fPCLK/2) and low priority.
Two clock configurations were used in the measurements: the plain 16 MHz HSI clock as the system clock with no latency set, and the PLL set to produce a 32 MHz system clock with the Flash memory latency set to 1.
All the measurements were taken on a single board sample at ambient temperature. The values provided are the arithmetic mean of several measurements.
5.1 Dhrystone benchmark
The Dhrystone code is executed, and the task consists of processing 50000 cycles of the test code.
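Table 10. Dhrystone with no additional data read from the Flash memory
| Frequency | 16 MHz | 16 MHz | 16 MHz | 16 MHz | 16 MHz | 32 MHz | 32 MHz | 32 MHz | 32 MHz | 32 MHz |
|---|---|---|---|---|---|---|---|---|---|---|
| Latency | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 |
| Prefetch | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 0 |
| Pre-read | 1 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 |
| Disabled buffer | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| Time [ms] | 3769 | 3766 | 3771 | 3769 | 3769 | 2139 | 2667 | 2720 | 2130 | 2667 |
| Average current [mA] | 4.32 | 4.42 | 4.54 | 4.40 | 4.39 | 8.14 | 7.52 | 7.52 | 8.04 | 7.43 |
| Energy [mJ] | 53.73 | 54.93 | 56.49 | 54.72 | 54.60 | 57.46 | 66.20 | 67.49 | 56.51 | 65.40 |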
Figure 7. Dhrystone with no additional data read from the Flash memory
[Plot: average current I [mA] versus time [ms] for the configurations of Table 10]
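Table 11. Dhrystone with DMA simultaneously reading data from the Flash memory
| Frequency | 16 MHz | 16 MHz | 16 MHz | 16 MHz | 16 MHz | 32 MHz | 32 MHz | 32 MHz | 32 MHz | 32 MHz |
|---|---|---|---|---|---|---|---|---|---|---|
| Latency | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 |
| Prefetch | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 0 |
| Pre-read | 1 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 |
| Disabled buffer | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| Time [ms] | 3903 | 3901 | 3906 | 3906 | 3904 | 2377 | 2853 | 2956 | 2334 | 2843 |
| Average current [mA] | 4.69 | 4.77 | 4.87 | 4.68 | 4.59 | 8.58 | 8.21 | 8.15 | 8.66 | 7.80 |
| Energy [mJ] | 69.40 | 61.41 | 62.77 | 60.32 | 59.13 | 67.29 | 77.31 | 79.31 | 66.70 | 73.17 |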
Figure 8. Dhrystone with DMA simultaneously reading data from the Flash memory
[Plot: average current I [mA] versus time [ms] for the configurations of Table 11]
The results show that the internal 6-word buffer improves the energy efficiency even when it is not well utilized, as in the zero-latency case. The best option is to keep the buffer on, but to disable prefetch and pre-read.
In the configuration with the latency enabled, prefetch is probably worth using. Pre-read is not used by the DMA channel and does not represent an improvement.
5.2 Memory read stress test
This stress test consists of executing 20 LDR instructions that fetch data from the program NV memory into the CPU core registers, in a loop of 500000 cycles. In this way, not only are the instructions fetched from the memory, but another read access is generated during the instruction execution, again creating a choke point at the memory interface. The fetch of the subsequent instruction is then likely to be delayed. The code simulates a case where a heavy load of literal pools, such as predefined messages, is read from the non-volatile memory very often.
Note: Memory reading by LDM instructions was not used, as it does not demonstrate the limits of the memory interface, only those of the memory itself.
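Table 12. Literal pool with no additional data read from the Flash memory
| Frequency | 16 MHz | 16 MHz | 16 MHz | 16 MHz | 16 MHz | 32 MHz | 32 MHz | 32 MHz | 32 MHz | 32 MHz |
|---|---|---|---|---|---|---|---|---|---|---|
| Latency | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 |
| Prefetch | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 0 |
| Pre-read | 1 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 |
| Disabled buffer | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| Time [ms] | 2402.5 | 2401.5 | 2403 | 2403 | 2399.5 | 2009 | 2058.5 | 2091 | 1817 | 1819 |
| Average current [mA] | 3.4 | 3.42 | 3.36 | 3.14 | 3.19 | 6.03 | 6.05 | 5.94 | 5.83 | 5.73 |
| Energy [mJ] | 26.95 | 27.10 | 26.64 | 24.89 | 25.25 | 39.97 | 41.09 | 40.98 | 34.95 | 34.39 |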
Figure 9. Literal pool with no additional data read from the Flash memory
[Plot: average current I [mA] versus time [ms] for the configurations of Table 12]
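Table 13. Literal pool with DMA simultaneously reading data from the Flash memory
| Frequency | 16 MHz | 16 MHz | 16 MHz | 16 MHz | 16 MHz | 32 MHz | 32 MHz | 32 MHz | 32 MHz | 32 MHz |
|---|---|---|---|---|---|---|---|---|---|---|
| Latency | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 |
| Prefetch | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 0 |
| Pre-read | 1 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 |
| Disabled buffer | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| Time [ms] | 2533.5 | 2533.5 | 4854.5 | 4587 | 4591 | 2292.5 | 2301 | 2420 | 2299 | 2302.5 |
| Average current [mA] | 3.86 | 3.86 | 3.38 | 3.32 | 3.29 | 7.42 | 7.39 | 7.34 | 7.25 | 7.18 |
| Energy [mJ] | 32.27 | 32.27 | 54.15 | 50.26 | 49.84 | 56.13 | 56.11 | 58.62 | 55.00 | 54.56 |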
Figure 10. Literal pool with DMA simultaneously reading data from the Flash memory
[Plot: average current I [mA] versus time [ms] for the configurations of Table 13]
This example finally demonstrates the advantage of the pre-read setting. It can greatly improve the efficiency when more than one stream of data is read from the Flash memory and there is no latency. Unsurprisingly, prefetch is not useful when dealing mostly with data. Again, it is a good idea to keep the buffer enabled. The only reason to disable the buffer is when the timing needs to be more deterministic, whatever the efficiency cost.
6 Conclusion
The measured results provide guidance for deciding whether or not to enable the different memory interface settings. The features that improve the benchmark results also lead to a higher power consumption, and the overall efficiency depends on the task processed by the microcontroller.
There is no significant benefit in tweaking the settings when the Flash memory latency is not in place. Doing so makes sense only if the Flash memory contains frequently used literal pools (predefined data constants).
With the Flash memory latency equal to one, the Flash interface should be set up carefully, as the performance difference between the optimal and the default configuration may be significant. It is also possible to activate some Flash interface settings only temporarily for particular operations and to disable them afterwards.
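The idea of enabling a setting only for the duration of one operation can be sketched as follows. This is a host-side model in Python, not target code: `FlashAcr` is a hypothetical stand-in for the memory-mapped access control register, and the bit positions are assumptions for the STM32L0 that should be verified against RM0367. On the target, the same pattern would be a read-modify-write of the register in C.

```python
from contextlib import contextmanager

# Assumed bit positions in the STM32L0 access control register (FLASH_ACR).
# Verify against RM0367 before reusing them.
LATENCY = 1 << 0
PRFTEN = 1 << 1
DISAB_BUF = 1 << 5
PRE_READ = 1 << 6


class FlashAcr:
    """Stand-in for the memory-mapped register so the sketch runs on a PC."""

    def __init__(self, value: int = 0) -> None:
        self.value = value


@contextmanager
def temporary_bits(reg: FlashAcr, bits: int):
    """Set `bits` for the duration of a block, then restore their old state."""
    saved = reg.value & bits
    reg.value |= bits
    try:
        yield reg
    finally:
        # Restore only the bits that were touched; leave the others as they are.
        reg.value = (reg.value & ~bits) | saved


if __name__ == "__main__":
    acr = FlashAcr()
    with temporary_bits(acr, PRE_READ):
        print(f"inside:  0x{acr.value:02X}")  # pre-read enabled for this block
    print(f"outside: 0x{acr.value:02X}")      # previous setting restored
```

Restoring only the affected bits keeps the helper safe to nest and avoids undoing unrelated changes made to the register while the block was running.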
7 Revision history
Table 14. Document revision history
| Date | Revision | Changes |
|---|---|---|
| 19-Jan-2016 | 1 | Initial release. |
IMPORTANT NOTICE – PLEASE READ CAREFULLY
STMicroelectronics NV and its subsidiaries (“ST”) reserve the right to make changes, corrections, enhancements, modifications, and
improvements to ST products and/or to this document at any time without notice. Purchasers should obtain the latest relevant information on
ST products before placing orders. ST products are sold pursuant to ST’s terms and conditions of sale in place at the time of order
acknowledgement.
Purchasers are solely responsible for the choice, selection, and use of ST products and ST assumes no liability for application assistance or
the design of Purchasers’ products.
No license, express or implied, to any intellectual property right is granted by ST herein.
Resale of ST products with provisions different from the information set forth herein shall void any warranty granted by ST for such product.
ST and the ST logo are trademarks of ST. All other product or service names are the property of their respective owners.
Information in this document supersedes and replaces information previously supplied in any prior versions of this document.
© 2016 STMicroelectronics – All rights reserved