
AN60630
PSoC® 3 - 8051 Code And Memory Optimization
Author: Mark Ainsworth
Associated Project: No
Associated Part Family: All PSoC 3 parts
Software Version: N/A
Related Application Notes: None
To get the latest version of this application note, or the associated project file, please visit
http://www.cypress.com/go/AN60630.
AN60630 shows how to increase the efficiency of 8051 code in PSoC 3 by making greater use of the 8051 core internal
features. This can result in smaller code size in flash memory, as well as faster code. The efficiency gains can be
realized without writing any 8051 assembler code. Instead, keywords for the Keil 8051 C compiler are used. Several
coding techniques are shown.
Contents
Introduction
The 8051 “Inner Space”
    Direct and Indirect Access
    SFR Space
Keil 8051 Memory Models
The Guidelines
    Guideline #1: Use Bit Variables
    Guideline #2: Do Not Call Functions from ISRs
    Guideline #3: Place Your Variables in the Correct Memory Spaces
    Guideline #4: Decrement Loop Variables
    Guideline #5: Use Bits for Bitwise Operations
    Guideline #6: Use the B Register for Temporary Storage
Advanced Topics
    Topic #1: Variable Overlay
    Topic #2: Pointers
    Topic #3: Constants and Flash
    Topic #4: Passing Arguments to Functions
    Topic #5: Passing Structures
    Topic #6: Switch Statements
    Topic #7: Large Arrays and Structures
    Topic #8: Compact Data Space
    Topic #9: Use All of the Resources in Your PSoC
Summary
Worldwide Sales and Design Support
www.cypress.com
Document No. 001-60630 Rev. *G
Introduction
One common misconception when programming the PSoC 3 8051 is that the only way to get optimal code is to use 8051
assembly language. This is not true, mainly because of the high performance capabilities of the Keil 8051 C compiler. This
compiler is included free with PSoC Creator™, the development tool for PSoC 3, PSoC 4, and PSoC 5LP.
Because of the compiler's capabilities, most if not all PSoC 3 8051 code can be written in C, and it can be made to be small,
fast, and efficient. The cost is that you must use Keil-specific keywords, and C code containing these keywords may not be
easily portable to other processors, such as the Cortex CPUs in PSoC 4 and PSoC 5LP. However, PSoC Creator offers
equivalent macros that make porting easier.
In any case, by using these keywords or macros, and with knowledge of some code architecture issues, you can make your
8051 code faster and smaller, and avoid using the PSoC 3 8051 in its slowest and least efficient mode.
It is assumed that the reader has a basic knowledge of C programming. Knowledge of 8051 assembler is recommended but
not required.
Note: All of the code shown in this application note was compiled using Keil optimization for size, level 3 (size, level 2 is the
PSoC Creator default). Level 3 deletes redundant MOV operations, which can have a significant impact on code size and
speed.
The 8051 “Inner Space”
The 8051 core is a 256-byte address space that contains 256 bytes of SRAM plus a large set of registers called Special
Function Registers (SFRs), as Figure 1 shows. A lot of functionality is packed into this “internal space” and the 8051 is most
efficient when it works in this space.
Figure 1. 8051 Internal Space Map
    Addresses 80–FF: SRAM (indirect access only) and SFRs (direct access only)
    Addresses 00–7F: SRAM (direct and indirect access)
The lower 128 bytes of this space are all SRAM, accessible both directly and indirectly (more on these terms later).
The upper 128 bytes contain another 128 bytes of SRAM that can only be accessed indirectly. The same upper address space
also contains a set of SFRs that can only be accessed directly.
In addition to normal SRAM access, some of the bytes in the lower address space can be accessed in other modes, as
Table 1 shows.
Table 1. 8051 Lower Internal Address Space Functions
    Addresses   Function
    20–2F       Bit-addressable space
    18–1F       Register bank 3 (R0–R7)
    10–17       Register bank 2 (R0–R7)
    08–0F       Register bank 1 (R0–R7)
    00–07       Register bank 0 (R0–R7)
The 8 “registers” R0–R7 are a useful set of auxiliary registers that can be accessed quickly with single-byte, single-cycle
8051 assembler instructions such as:
    ADD A,Rn
Only one register bank can be active at a time; usually it is register bank 0.
Each of the 128 bits in the bit-addressable space 20–2F can be accessed individually with bit-level assembler instructions
such as:
    SETB nn
where nn is the bit number. If nn is 00, then bit 0 of address 20 is accessed; if nn is 01, then bit 1 of address 20 is
accessed, and so on.
Direct and Indirect Access
With direct access, the address is part of the assembler instruction; for example:
    INC nn
where nn is the address of either a byte in the first 128 bytes of internal SRAM or an SFR.
For indirect access, register R0 or R1 is used as a pointer; for example:
    DEC @Ri
where i is 0 or 1. Using indirect access, the full 256 bytes of internal SRAM are accessible.
The 8-bit stack pointer register SP is also a pointer to all 256 bytes of SRAM; pushing and popping the stack are considered
indirect accesses. The stack grows upward. Because the stack size is always less than 256 bytes, stack operations
must be managed carefully.
SFR Space
As noted previously, direct accesses to addresses 80–FF reach the SFRs. Almost all registers in the 8051, including the
accumulator (ACC), program status word (PSW), and stack pointer (SP), are actually SFRs. Also, some PSoC 3 I/O port
registers can be accessed as SFRs. Check the PSoC 3 datasheet and Technical Reference Manual for details on these SFRs;
see also Table 2. Note that many of the SFRs are unpopulated; reading or writing to them yields unpredictable results.
Table 2. PSoC 3 8051 SFR Map (a dash marks an unpopulated address)
    Address  0/8         1/9         2/A          3/B   4/C   5/D   6/E  7/F
    0xF8     SFRPRT15DR  SFRPRT15PS  SFRPRT15SEL  -     -     -     -    -
    0xF0     B           -           SFRPRT12SEL  -     -     -     -    -
    0xE8     SFRPRT12DR  SFRPRT12PS  MXAX         -     -     -     -    -
    0xE0     ACC         -           -            -     -     -     -    -
    0xD8     SFRPRT6DR   SFRPRT6PS   SFRPRT6SEL   -     -     -     -    -
    0xD0     PSW         -           -            -     -     -     -    -
    0xC8     SFRPRT5DR   SFRPRT5PS   SFRPRT5SEL   -     -     -     -    -
    0xC0     SFRPRT4DR   SFRPRT4PS   SFRPRT4SEL   -     -     -     -    -
    0xB8     -           -           -            -     -     -     -    -
    0xB0     SFRPRT3DR   SFRPRT3PS   SFRPRT3SEL   -     -     -     -    -
    0xA8     IE          -           -            -     -     -     -    -
    0xA0     P2AX        -           SFRPRT1SEL   -     -     -     -    -
    0x98     SFRPRT2DR   SFRPRT2PS   SFRPRT2SEL   -     -     -     -    -
    0x90     SFRPRT1DR   SFRPRT1PS   -            DPX0  -     DPX1  -    -
    0x88     -           SFRPRT0PS   SFRPRT0SEL   -     -     -     -    -
    0x80     SFRPRT0DR   SP          DPL0         DPH0  DPL1  DPH1  DPS  -
As noted previously, bits 00–7F access a region in lower SRAM. Bits 80–FF access some of the SFRs, in the following
manner: Bits 80–87 access the individual bits in SFR 80, SFRPRT0DR. Bits 88–8F access the individual bits in SFR 88, which
is unpopulated, and so on. So individual bits can be accessed in SFRs at addresses 80, 88, 90, 98, …, F0, F8. The most
frequently used PSoC 3 / 8051 registers are located at these SFR addresses.
Keil 8051 Memory Models
Based on the 8051 architecture and instruction set, the Keil C compiler defines three memory models: small, compact, and
large. These memory models control the addressing mode in the 8051 assembly language output, and thus allow precise
control of code size and execution speed, as Code 1 shows.
Code 1. C Code with Keil Keywords and Corresponding 8051 Assembler, for Different Keil Memory Models
/* C variable definitions, in different memory spaces */
data char small_direct_var;
idata char small_indirect_var;
pdata char compact_var;
char large_var; /* large memory model default */
/* usage of the variables: simple increment operations in C */
small_direct_var++;
small_indirect_var++;
compact_var++;
large_var++;
; assembler equivalents of the above lines of C code
; small_direct_var++;
0500      INC   small_direct_var            ; 2 bytes, 3 cycles
; small_indirect_var++;
7800      MOV   R0,#LOW small_indirect_var
06        INC   @R0                         ; 3 bytes, 5 cycles
; compact_var++;
7800      MOV   R0,#LOW compact_var
E2        MOVX  A,@R0
04        INC   A
F2        MOVX  @R0,A                       ; 5 bytes, 8 cycles
; large_var++;
900000    MOV   DPTR,#large_var
E0        MOVX  A,@DPTR
04        INC   A
F0        MOVX  @DPTR,A                     ; 6 bytes, 9 cycles
In Code 1, you can see that successively larger memory models require more flash bytes and more CPU cycles. The default
model for PSoC Creator is large (to maintain compatibility with PSoC 5LP), but that default can be overridden for individual
variables, functions, and even entire modules.
The keywords 'data' and 'idata' are used to designate small model variables in direct and indirect modes, respectively. The
keyword 'pdata' is used to designate the compact model, and 'xdata' (or default) is used for the large model. For details see
Table 5.
The small model accesses the 8051 internal space described previously.
The compact and large models access the “external” space, which is “external” to the 8051 core but of course is “internal” to
the PSoC 3 device. All of the PSoC 3 SRAM, registers, EMIF space, and so on are in this “external” space. The size of this
space is 16 Mbytes, so three address bytes are required to access this space. For more information see Topic #2: Pointers.
You can also see that in the compact (pdata) model, the “external” space is accessed using R0 or R1. The other two bytes
come from the SFRs MXAX and P2AX, so that the three-byte address, formed from the three registers, is:
[ MXAX : P2AX : Ri ]
So before accessing pdata variables the SFRs MXAX and P2AX must be loaded with appropriate values. For more information
see Topic #8: Compact Data Space.
And, finally, you can see that in the large (xdata or default) model, the “external” space is accessed using the 16-bit DPTR
register (which is composed of the SFRs DPH and DPL). The third byte comes from the SFR DPX, so that the three-byte
address, formed from the three registers, is:
[ DPX : DPTR ]
Before accessing xdata variables, the SFR DPX must be loaded with an appropriate value. Note that because the PSoC 3
8051 has two DPTR registers, there are actually six SFRs: DPX0, DPH0, DPL0, DPX1, DPH1, and DPL1. The SFR DPS controls
which DPTR is currently active.
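The three-byte addressing described above comes down to simple arithmetic, which the following sketch illustrates. It is written in plain C so that it can be checked with any host compiler; the function name xspace_address is hypothetical and is not part of PSoC Creator or the Keil library.

```c
/* Hypothetical helper, for illustration only: combine the three bytes
 * [ext : high : low] into the 24-bit address that they select.
 *   xdata access:  ext = DPX,  high = DPH,  low = DPL
 *   pdata access:  ext = MXAX, high = P2AX, low = R0 or R1
 */
unsigned long xspace_address(unsigned char ext,
                             unsigned char high,
                             unsigned char low)
{
    return ((unsigned long)ext << 16) |
           ((unsigned long)high << 8) |
           (unsigned long)low;
}
```

For example, with DPX = 0x00 and DPTR = 0x6465 the selected address is 0x006465. The largest possible value, 0xFFFFFF, is the top of the 16-Mbyte space.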
The Guidelines
Now that you understand the PSoC 3 8051 memory layout and the Keil memory models, consider some guidelines to optimize
your C code. Using these guidelines will, in most cases, yield improvements in both code size and execution time. The
guidelines are in prioritized order of most effective first.
Guideline #1: Use Bit Variables
The simplest and best way to get dramatic improvements in efficiency is to look for all variables that will have only
binary values (0 and not 0), and define them as type 'bit':
    bit myBitVar;
Variables of type bit are treated similarly to standard C variables:
    /* assign a value to the variable */
    myBitVar = 1;
    myBitVar = 0;
    /* toggle the variable */
    myBitVar = ~myBitVar;
    /* do bit-level operations */
    bitVar1 |= bitVar2;
    /* test the variable */
    if (myBitVar)
    {
        . . .
    }
Function parameters and return values can be of type bit:
    bit myFunction(bit x, bit y);
There are some limitations: you cannot have arrays of type bit, and you cannot have pointers to variables of type bit:
    /* illegal statements */
    bit myBitArray[10];
    bit *myBitPointer;
You are limited to a total of 128 bit variables in your code; this is the number of bits in the 8051 bit-addressable
space. (You get a linker error if you overflow the bit space.)
With bit variables, the extensive set of 8051 bit-level assembler instructions can be used to generate very fast
and compact code. For example, the following C code:
    myVar = ~myVar;
    if (!myVar)
    {
        ...
    }
generates only two assembler instructions:
    B200      CPL   myVar
    200006    JB    myVar,?C0002
When you use bit variables, you can frequently implement a nontrivial C statement with just a single assembler
instruction.
The above example uses only 5 flash bytes and 8 CPU cycles. Compare it to the assembler code that is
generated if you change the variable type 'bit' to type 'char':
    900000    MOV   DPTR,#myVar
    E0        MOVX  A,@DPTR
    F4        CPL   A
    F0        MOVX  @DPTR,A
    E0        MOVX  A,@DPTR
    7002      JNZ   ?C0001
The code now uses 9 flash bytes and 15 CPU cycles, almost a 2x increase.
Finally, it is easy to port this code to PSoC 5LP by using the CYBIT macro provided in PSoC Creator, instead of the
'bit' keyword:
    CYBIT myBitVar;
PSoC Creator has a complete set of macros to ease portability of PSoC 3 C code to PSoC 5LP. For details,
see the auto-generated file cytypes.h, in the cy_boot folder.
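One possible way to keep bit-variable code buildable on both compilers is sketched below. APP_BIT is a hypothetical macro that illustrates the idea behind CYBIT; it is not the actual cytypes.h definition. The sketch relies on the Keil compiler predefining __C51__. The toggle is written with the logical NOT operator because assigning ~1 to an 8-bit variable stores 0xFE, which still tests as true, so the ~ form toggles correctly only when the variable is a real bit.

```c
#ifdef __C51__
#define APP_BIT bit              /* Keil C51: one bit in the 20-2F space */
#else
#define APP_BIT unsigned char    /* other compilers: fall back to a byte */
#endif

/* File-scope flag; static storage starts at 0. */
static APP_BIT ledOn;

/* Toggle the flag and return its new value (0 or 1). */
unsigned char toggle_led_flag(void)
{
    ledOn = !ledOn;              /* stays 0/1 for both bit and byte types */
    return ledOn ? 1u : 0u;
}
```

Successive calls return 1, 0, 1, and so on, regardless of which type the macro expands to.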
Guideline #2: Do Not Call Functions from ISRs
When compiling C code for an interrupt service routine (ISR), the Keil compiler attempts to push onto the stack
only those registers that it thinks will be changed by the ISR code. However, if the ISR code includes a function
call, the compiler cannot tell which registers are modified by the called function, and therefore pushes everything
onto the stack. For this reason, the C code in a very simple ISR:
    CY_ISR(myISR)
    {
        UART_1_ReadRxStatus();
    }
generates a massive amount of push/pop overhead in the corresponding assembler code:
    C0F0      PUSH   B
    C083      PUSH   DPH
    C082      PUSH   DPL
    C085      PUSH   DPH1
    C084      PUSH   DPL1
    C086      PUSH   DPS
    758600    MOV    DPS,#00H
    C000      PUSH   ?C?XPAGE1SFR
    750000    MOV    ?C?XPAGE1SFR,#?C?XPAGE1RST
    C0D0      PUSH   PSW
    75D000    MOV    PSW,#00H
    C000      PUSH   AR0
    C001      PUSH   AR1
    C002      PUSH   AR2
    C003      PUSH   AR3
    C004      PUSH   AR4
    C005      PUSH   AR5
    C006      PUSH   AR6
    C007      PUSH   AR7
    120000    LCALL  UART_1_ReadRxStatus
    D007      POP    AR7
    D006      POP    AR6
    D005      POP    AR5
    D004      POP    AR4
    D003      POP    AR3
    D002      POP    AR2
    D001      POP    AR1
    D000      POP    AR0
    D0D0      POP    PSW
    D000      POP    ?C?XPAGE1SFR
    D086      POP    DPS
    D084      POP    DPL1
    D085      POP    DPH1
    D082      POP    DPL
    D083      POP    DPH
    D0F0      POP    B
    D0E0      POP    ACC
    32        RETI
This code uses 79 flash bytes and 101 CPU cycles just to make one function call.
Now, this particular function call just reads a register, so we can modify the C code to read the register directly:
    CY_ISR(myISR)
    {
        /* copied from UART_1.c */
        UART_1_RXSTATUS;
    }
This yields some reduction in push/pop overhead, as the following assembler code shows:
    C0E0      PUSH   ACC
    C083      PUSH   DPH
    C082      PUSH   DPL
    C085      PUSH   DPH1
    C084      PUSH   DPL1
    C086      PUSH   DPS
    758600    MOV    DPS,#00H
    C000      PUSH   ?C?XPAGE1SFR
    750000    MOV    ?C?XPAGE1SFR,#?C?XPAGE1RST
    C0D0      PUSH   PSW
    75D000    MOV    PSW,#00H
    C007      PUSH   AR7
    906465    MOV    DPTR,#06465H
    E0        MOVX   A,@DPTR
    FF        MOV    R7,A
    D007      POP    AR7
    D0D0      POP    PSW
    D000      POP    ?C?XPAGE1SFR
    D086      POP    DPS
    D084      POP    DPL1
    D085      POP    DPH1
    D082      POP    DPL
    D083      POP    DPH
    D0E0      POP    ACC
    32        RETI
This code uses 51 bytes and 65 cycles, a reduction of 36% in the number of cycles, and the code is still easily
portable to PSoC 5LP. This is good, but you can get even more improvement by using flags.
A flag is a global variable that is used to signal state changes between multiple independent functions.
It is simple to implement a flag: in the ISR, set a global variable (of type bit), and then have the background code
read the register when the variable is set:
    CYBIT flag;
    CY_ISR(myISR)
    {
        flag = 1;
    }
    void main()
    {
        /* Wait for the ISR to set the
         * flag, then reset it before
         * taking any action.
         */
        if (flag)
        {
            flag = 0;
            UART_1_ReadRxStatus();
            . . .
        }
    }
The ISR portion of this C code generates the following assembler code:
    D200      SETB   flag
    32        RETI
which uses 3 bytes and 7 cycles, for a 93% reduction in number of cycles from the original ISR code.
The cost of having a flag-based design is that you need to make sure that the status register is read by the
background code in a timely fashion, which may be difficult in some cases.
Making the flag type uint8 instead of bit does not yield any similar reductions, because the variable is in the same
default xdata space as the register. However, this can be solved by placing the variable in an 8051 internal memory
space, as explained in the next section.
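As a rough sketch of the byte-sized variant just mentioned, the flag below is requested in the internal data space when built with Keil, and is an ordinary variable elsewhere. EVT_DATA, rx_isr_body, and rx_event_pending are hypothetical names chosen for this illustration, and the sketch relies on the Keil compiler predefining __C51__. In a real project the assignment in rx_isr_body would sit directly inside the CY_ISR body, so that the ISR itself calls no function.

```c
#ifdef __C51__
#define EVT_DATA data            /* Keil C51: 8051 internal, direct access */
#else
#define EVT_DATA                 /* other compilers: no memory-space keyword */
#endif

/* Byte-sized event flag shared between the ISR and the background code. */
static volatile unsigned char EVT_DATA rxEvent;

/* Statement that belongs in the interrupt handler: only record the event. */
void rx_isr_body(void)
{
    rxEvent = 1;
}

/* Background code: returns 1 once per recorded event, otherwise 0.
 * The flag is cleared before the caller acts on the event.
 */
unsigned char rx_event_pending(void)
{
    if (rxEvent)
    {
        rxEvent = 0;
        return 1;
    }
    return 0;
}
```

The background loop would call rx_event_pending() and read the status register only when it returns 1.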
Guideline #3: Place Your Variables in the Correct Memory Spaces
As shown previously, significant efficiencies can be gained when a variable is placed in one of the 8051 internal memory
spaces. Therefore, in order of frequency of access, variables should be of type 'data', then 'idata', 'pdata', and lastly 'xdata'
(xdata is the PSoC Creator default). See Table 5 for details.
Also, because of limited stack space, the Keil compiler does not save local variables on the stack as is normally done in C.
Instead, it uses fixed memory locations to store local variables and shares those locations among local variables in functions
that don't call each other. See Topic #1: Variable Overlay for details.
So this guideline is actually twofold:
1. As much as possible, make variables local within functions. Not only is it good programming practice to have as few
   global variables as possible, but the Keil compiler can try to store locals in auxiliary registers R0–R7, which further
   improves efficiency.
2. Make as many local variables as possible of type 'data' or 'idata'. (You get a linker error if you overflow the data space.)
   Check your function / ISR calling depth to make sure that you don't run out of stack space, which is shared with the
   data and idata spaces in internal SRAM.
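The two parts of this guideline can be combined as in the following sketch, in which the most frequently accessed locals of a small function are requested in the data space. LOCAL_DATA and checksum8 are hypothetical names used only for illustration (PSoC Creator offers its own CYDATA macro for this purpose), and the sketch relies on the Keil compiler predefining __C51__.

```c
#ifdef __C51__
#define LOCAL_DATA data          /* Keil C51: 8051 internal data space */
#else
#define LOCAL_DATA               /* other compilers: ordinary automatic */
#endif

/* Sum the bytes of a buffer, modulo 256. The loop counter and the running
 * sum are accessed on every pass, so they are the variables requested in
 * the internal data space.
 */
unsigned char checksum8(const unsigned char *buf, unsigned char len)
{
    LOCAL_DATA unsigned char i;
    LOCAL_DATA unsigned char sum = 0;

    for (i = 0; i < len; i++)
    {
        sum = (unsigned char)(sum + buf[i]);
    }
    return sum;
}
```

Because the macro expands to nothing on other compilers, the same source should build unchanged outside the Keil toolchain.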
Guideline #4: Decrement Loop Variables
Try to make your loop variables decrement instead of increment, because it's faster to test for equality to zero
than for less than a constant. For example, the following C code:
    void main()
    {
        data uint8 i;
        /* loop 10 times */
        for (i = 10; i != 0; i--)
        {
            ...
        }
    }
generates the following small amount of assembler:
    75000A        MOV   i,#0AH      ; i = 10
    ?C0002:
    E500          MOV   A,i
    6006          JZ    ?C0003      ; i != 0
                  . . .
    1500          DEC   i           ; i--
    80EF          SJMP  ?C0002
    ?C0003:
If you write the C code such that the loop variable increments instead of decrements:
    void main()
    {
        data uint8 i;
        /* loop 10 times */
        for (i = 0; i < 10; i++)
        {
            ...
        }
    }
you get a larger amount of assembler:
    E4            CLR   A
    F500          MOV   i,A         ; i = 0
    ?C0002:
    E500          MOV   A,i
    C3            CLR   C
    940A          SUBB  A,#0AH
    5006          JNC   ?C0003      ; i < 10
                  . . .
    0500          INC   i           ; i++
    80EF          SJMP  ?C0002
    ?C0003:
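Counting down is easiest when the loop body does not depend on the counter's value, as in the following sketch, which visits the 8 bits of a byte. The function name count_ones is hypothetical. If the body does need an ascending index, it can usually be derived from the counter (for example, i - 1 when i runs from the length down to 1).

```c
/* Count the 1 bits in a byte, using a loop counter that runs 8, 7, ... 1
 * so that the loop test is a comparison against zero.
 */
unsigned char count_ones(unsigned char value)
{
    unsigned char i;
    unsigned char ones = 0;

    for (i = 8; i != 0; i--)     /* loop 8 times */
    {
        ones += (unsigned char)(value & 1u);
        value >>= 1;
    }
    return ones;
}
```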
Guideline #5: Use Bits for Bitwise Operations
As Guideline #1 shows, defining bit variables can greatly
increase code efficiency by generating bit-level assembler
instructions. Bit-level assembler instructions can also be
used to implement C bitwise operations. Consider a
variable with a bit that you want to set or test. In C, you
would write the following:
    uint8 x;
    x |= 0x10;        /* set bit 4 */
    x &= ~0x10;       /* clear bit 4 */
    x ^= 0x10;        /* toggle bit 4 */
    if (x & 0x10)     /* test bit 4 */
    {
        . . .
    }
To implement C bitwise operations using 8051 bit-level
assembler instructions, you must use the 'sbit' keyword
and the special operator '^' (which in this case does NOT
do a C exclusive-or operation).
There are two ways to do this. The first is to place the
variable in the internal bit-addressable space 20–2F, using
the 'bdata' keyword. Then, define the bit of interest using
sbit and ^:
/* This places myVar in the 8051
* internal data space, in 20–2F.
*/
bdata uint8 myVar;
/* this is bit 4 of myVar */
sbit mybit4 = myVar^4;
/* set bit 4 of myVar */
mybit4 = 1;
/* clear bit 4 of myVar */
mybit4 = 0;
/* toggle bit 4 of myVar */
mybit4 = ~mybit4;
/* test bit 4 of myVar */
if (mybit4)
{
. . .
}
This technique yields all of the efficiencies noted in
Guideline #1. It even works for variables larger than 8 bits;
for example, uint16, int32, and so on. Note that the bdata
and sbit definitions must be done on global or static
variables, not locally within a function.
The second method, which is useful only for testing bits, is
to temporarily place the value of interest in one of the
bit-addressable SFRs. As an example, the bits in the SFR
PSW are defined in the PSoC Creator generated source file
PSoC3_8051.h, in the cy_boot folder:
    sfr  PSW = 0xD0;
    sbit P   = PSW^0;
    sbit F1  = PSW^1;
    sbit OV  = PSW^2;
    sbit RS0 = PSW^3;
    sbit RS1 = PSW^4;
    sbit F0  = PSW^5;
    sbit AC  = PSW^6;
    sbit CY  = PSW^7;
Because the Program Status Word (PSW) is in SFR D0, its
bits are directly accessible. Each of the bits in the PSW is
defined using the sbit keyword. For this reason, each bit
can be accessed in the same manner as Guideline #1
shows. For example:
    F0 = ~F0;
Note: PSW bits F0 and F1 are flag bits that are conveniently
available for general-purpose use.
Two SFRs are usually available for temporary use in
bitwise operations – the accumulator (ACC) and an
auxiliary register called B. However, only the SFRs
themselves are defined in PSoC3_8051.h; you must
define the bits within those SFRs yourself:
/* bit 4 of ACC SFR */
sbit A4 = ACC^4;
/* bit 3 of B SFR */
sbit B3 = B^3;
Then, for faster testing of a bit you can do the following:
/* assume return value is 8 bits */
ACC = UART_1_ReadRxStatus();
if (A4) /* test bit 4 */
{
. . .
}
You can also test multiple bits quickly. Try to rewrite the
following using traditional C bitwise instructions:
/* assume return value is 8 bits */
ACC = UART_1_ReadRxStatus();
/* test if bit 4 == 1 AND
 *         bit 3 == 0
 * (assumes that sbit A3 = ACC^3 has also been defined)
 */
if (A4 && !A3)
{
. . .
}
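For comparison, one way to write the same two-bit test with traditional C bitwise operators is sketched below. The function name is hypothetical; the point is only that both bits can be checked with a single mask and compare.

```c
/* Returns 1 when bit 4 of status is set AND bit 3 is clear, else 0.
 * Masking with 0x18 keeps only bits 4 and 3; the result equals 0x10
 * exactly when bit 4 is 1 and bit 3 is 0.
 */
unsigned char bit4_set_bit3_clear(unsigned char status)
{
    return (unsigned char)((status & 0x18u) == 0x10u);
}
```

This form is portable, but the compiler has to implement it with byte-wide mask and compare operations rather than with the bit-level instructions that the sbit version allows.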
This second method is useful only for testing a bit (or bits); however, it has the advantage that you don't have to use
any SRAM in the bit-addressable space.
IO Port Control SFRs
Also, as Table 2 shows, certain registers for every PSoC 3 IO port are available as SFRs. You can
read the input port pins' states by reading the corresponding SFRPRTxPS SFR, and you can then test
individual pin states using the bit-level techniques described above.
You can also control the output port pins by writing to the corresponding SFRPRTxDR SFR. Because the
SFRPRTxDR registers are located in bit-addressable SFRs, pin outputs can be quickly changed using bit-level
assembler instructions.
All of the port SFR definitions are available in PSoC3_8051.h, but, again, you must create your own sbit
definitions for individual pins. You must also set the SFR SFRPRTxSEL to control whether the pin is changed by
CPU / DMA register access as is normally done, or by SFR access.
The following example shows a very fast toggle of GPIO pin P1.6 using SFR bit-level access:
    /* port 1 pin 6 DR */
    sbit P1_6 = SFRPRT1DR^6;
    void main()
    {
        /* P1.6 to be changed by SFR
         * access
         */
        SFRPRT1SEL = 0x40;
        for(;;) /* do forever */
        {
            /* toggle P1.6 by SFR/sbit
             * access
             */
            P1_6 = ~P1_6;
        }
    }
The for loop is implemented using only two assembler instructions:
    ?C0001:
    B296      CPL   P1_6
    80D3      SJMP  ?C0001
Guideline #6: Use the B Register for Temporary Storage
In the 8051 architecture, the B register (at SFR address F0) is used by the assembler instructions MUL and DIV. At all other
times, it's just an auxiliary register and is usually not used. But as an auxiliary register, it can be handy, for example, when
swapping two 8-bit variables:
uint8 x, y;
B = x;
x = y;
y = B;
The B register can also be used for rapid bit-level testing, as noted previously.
Advanced Topics
The guidelines shown previously introduced some of the Keil C keywords and showed some simple C coding techniques that,
using the keywords, yield increased efficiency.
The following topics build on the guidelines, but are more on an architecture level. They show how to design C code for the
8051 to get further reductions in code size and CPU cycles.
Topic #1: Variable Overlay
As seen previously in Code 1 and Guideline #3, you get the greatest amount of code efficiency by using the 8051 internal data
spaces (data, idata, bit, bdata, SFRs).
Also, because of limited stack space, the Keil compiler does not save local variables on the stack as is normally done in C.
Instead, it uses fixed memory locations to store local variables and shares those locations among local variables in functions
that don't call each other. Keil calls this “data overlaying”.
The following example has two functions, olTest1() and olTest2(), that are called only from main(). Each function, plus main(),
manipulates two automatic 32-bit variables.
    void olTest1()
    {
        uint32 x = 1;
        uint32 y = x + 2;
        x = y - 1;
    }
    void olTest2()
    {
        uint32 a = 3;
        uint32 b = a + 5;
        a = b - 1;
    }
    void main()
    {
        uint32 m = 10;
        uint32 n = m + 20;
        m = n - 7;
        olTest1();
        olTest2();
    }
An 8-bit processor requires a lot of code to handle 32-bit variables. To increase the efficiency of that code, you could move the
automatic variables from the (default) xdata space to the data space, but that would use up a lot of valuable bytes in the data
space. To solve this problem, Keil automatically shares storage for the variables x, y, a, and b in the two test functions.
(Variables m and n in main() must have dedicated storage.)
In PSoC Creator, Workspace Explorer window, click the Results tab and find the .map file for your project. (PSoC Creator
projects have map file creation enabled by default. If you don't see a .map file, check your project build settings, under Linker.)
In the .map file, find a line like this:
    START     STOP      LENGTH    ALIGN  RELOC  MEMORY CLASS  SEGMENT NAME
    =========================================================================
    . . .
    000051H   000060H   000010H   BYTE   UNIT   XDATA         _XDATA_GROUP_
The segment _XDATA_GROUP_ includes the space that is shared by all overlaid variables. The segment occupies 16 bytes: 8
for the overlaid variables in the test functions and 8 for the non-overlaid variables in main(). Build a PSoC Creator project with
the code shown above, run the debugger, and bring up a memory window to monitor this segment. Step into both test
functions and see that their automatic variables share the same memory.
Now exit the debugger and look in the .map file for a line like this:
    START     STOP      LENGTH    ALIGN  RELOC  MEMORY CLASS  SEGMENT NAME
    =========================================================================
    . . .
    000008H   00000FH   000008H   BYTE   UNIT   DATA          _DATA_GROUP_
This line shows that 8 bytes in the data space are already being used by some other functions. You should be able to reuse
the same space for the overlaid automatic variables. Do this by adding the 'small' keyword (or the PSoC Creator macro
CYSMALL) to the two test functions:
void olTest1() small
void olTest2() small
When applied to a function, the 'small' keyword causes that function's arguments and automatic variables to be placed in the
data space. You can instead add the 'data' (or CYDATA) keyword to the automatic variable declarations, but this does not
affect storage of function arguments.
Now rebuild and check the .map file: there is no increase in the use of data space bytes. The debugger also shows that these
bytes are being shared by the test functions. You have gained the efficiencies of the 8051 internal data space without using
any additional bytes within that space.
Fewer than 128 data space bytes are available; if you run out, you can add the 'idata' keyword to automatic variables. This
allows you to use some of the other 128 bytes in the 8051 internal space (leave room for stack growth), and creates an overlay
segment _IDATA_GROUP_ for the idata space. Similarly, automatic variables of type 'bit' can be placed in an overlay segment
_BIT_GROUP_.
The significance of this topic is that your code should be constructed such that the further up in the calling tree a function is,
the fewer local variables and arguments it should have. Ideally, main() should have none. Your code's calling tree depth
should be as small as possible; this also reduces stack usage. Functions at the bottom of the calling tree can be declared
'small' to maximize efficient use of their local variables. Finally, there should be as few global and static variables as possible,
as these cannot be overlaid.
This brings up another issue, C library functions. For example, try adding to one of the test functions a call to memset() to clear
one of the variables:
memset((void *)&a, (char)0, sizeof(a));
Examine the .map file before and after adding the call. You should see no difference in the amount of data or xdata memory
being used (the code size increases, of course). Keil does not supply the source for most library functions. However, because
no additional SRAM is used, and from a review of the assembler code (in the PSoC Creator debugger's Disassembler
window), you can infer that the function is using registers, variable overlay, or both. This is generally true of Keil library
functions, although some may behave differently.
Another point from this topic is the need to check the .map file to understand how your code is being implemented in the 8051
architecture. The .map file provides a wealth of information on usage of the different memory spaces, in addition to many other
subjects. For more information, see the Keil LX51 Linker User's Guide.
Topic #2: Pointers
Most CPUs have a single linear address space, and so the size of C pointer variables for these CPUs is determined by the
size of the address space. For example, a CPU with a 64 K address space has 2-byte pointer variables, while a 32-bit CPU
(such as the ARM Cortex in PSoC 4 and PSoC 5LP) has 4-byte pointer variables.
The 8051 CPU is different in that it has multiple address spaces, ranging from 128 bytes to 64 K bytes in size. To handle this,
the Keil C compiler defines two types of pointers – generic and memory-specific.
A generic pointer accesses data regardless of the memory in which it is stored. It uses 3 bytes – the first is the memory type,
the second is the high-order byte of the address, and the third is the low-order byte of the address. A memory-specific pointer
uses only one or two bytes depending on the specified memory type. The following example demonstrates each type:
char idata *ip = &ival;   /* memory-specific pointer to idata space */
char xdata *xp = &xval;   /* memory-specific pointer to xdata space */
char *p = &xval;          /* generic pointer (to xdata space) */

char val = *ip;           /* read a value from the idata space */
val = *xp;                /* read a value from the xdata space */
val = *p;                 /* read a value using the generic pointer */

; Corresponding assembler
750000    MOV   ip,#LOW ival         ; 1-byte ptr to idata space
750000    MOV   xp,#HIGH xval        ; 2-byte ptr to xdata space
750000    MOV   xp+01H,#LOW xval
750001    MOV   p,#01H               ; 3-byte generic ptr (01H = xdata)
750000    MOV   p+01H,#HIGH xval
750000    MOV   p+02H,#LOW xval

A800      MOV   R0,ip                ; read from the idata space
E6        MOV   A,@R0
F500      MOV   val,A                ; 5 bytes, 7 cycles

850082    MOV   DPL,xp+01H           ; read from the xdata space
850083    MOV   DPH,xp
E0        MOVX  A,@DPTR
F500      MOV   val,A                ; 9 bytes, 11 cycles

AB00      MOV   R3,p                 ; read using the generic ptr
AA00      MOV   R2,p+01H
A900      MOV   R1,p+02H
120000    LCALL ?C?CLDPTR            ; Keil library function
F500      MOV   val,A                ; 11 bytes, 19+ cycles
The main point of this topic is that memory-specific pointers are more efficient. Generic pointers should be used only when the
memory type is unknown. Note that most Keil library functions take generic pointers as arguments; memory-specific pointers
are automatically cast to generic pointers.
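As a minimal sketch of this guideline, the function below takes a memory-specific pointer to a buffer that is assumed to live in the xdata space. The names MEM_XDATA and sum_bytes are hypothetical and are introduced here only for illustration. The macro expands to the Keil 'xdata' keyword when the Keil compiler is detected through its predefined __C51__ symbol, and to nothing otherwise, so the same source can also be built with a desktop compiler.

```c
/* Hypothetical portability macro (not part of PSoC Creator or Keil):
   memory-specific under Keil Cx51, an ordinary pointer elsewhere. */
#ifdef __C51__
#define MEM_XDATA xdata
#else
#define MEM_XDATA
#endif

/* Sum 'n' bytes of a buffer that is known to be in the xdata space.
   Under Keil the pointer is memory-specific (2 bytes), so the compiler
   does not need the generic-pointer library helper to read through it. */
unsigned int sum_bytes(const unsigned char MEM_XDATA *buf, unsigned char n)
{
    unsigned int total = 0;

    while (n--)
    {
        total += *buf++;
    }
    return total;
}
```

If the buffer might be in any memory space, the MEM_XDATA qualifier would have to be dropped, which turns the argument back into a 3-byte generic pointer.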
Function Pointers
As Topic #4 describes, the Keil compiler does not pass C function arguments on the stack. Instead, it uses either registers or
fixed memory locations. This can cause problems with function pointers, because the linker cannot predict where the pointed-to
function resides in the calling tree and therefore may not put the parameters in a safe location in memory.
To address this issue, Keil provides an OVERLAY linker directive. Keil also provides a detailed application note on this topic;
see http://www.keil.com/appnotes/files/apnt_129.pdf for details.
Topic #3: Constants and Flash
C has a qualifier keyword 'const' that can be added to a variable or array declaration. The keyword tells the compiler that the
variable may not be changed; code that tries to change the variable gives a compile error. However, the 'const' qualifier says
nothing about where the variable is stored, as this example shows:
const char testvar = 37;
void main()
{
    char testvar2 = testvar;
The corresponding 8051 assembler shows that the const variable 'testvar' is stored in SRAM.
900000    MOV   DPTR,#testvar
E0        MOVX  A,@DPTR              ; MOVX accesses xdata space
900000    MOV   DPTR,#testvar2
F0        MOVX  @DPTR,A
To initialize the SRAM, the value is also stored in flash and copied to the correct SRAM location by the C startup code, so the
variable occupies space in both memories.
PSoC 3 has 8 times more flash than SRAM. If SRAM is being used up, it may make sense to keep some const variables,
strings, or arrays in flash. In the Keil C compiler, to force storage of a variable or array in flash, you must use the keyword
'code' (or CYCODE) in the declaration:
code const char testvar = 37;
void main()
{
    char testvar2 = testvar;
The corresponding 8051 assembler shows that the const variable 'testvar' is now stored in flash:
900000    MOV   DPTR,#testvar
E4        CLR   A
93        MOVC  A,@A+DPTR            ; MOVC accesses code space
900000    MOV   DPTR,#testvar2
F0        MOVX  @DPTR,A
Because of the nature of the MOVC instruction, this actually costs at least one extra byte to set up the index in the
accumulator. For this reason, use this method only when truly necessary.
Be careful about the syntax. The 'const' is not necessary but may be needed for portability, and one declaration results in a
compile error:
code const char testvar = 37;    /* stores in flash */
code char testvar = 37;          /* stores in flash */
const char code testvar = 37;    /* stores in flash */
char code testvar = 37;          /* stores in flash */
const char testvar = 37;         /* stores in SRAM  */
const code char testvar = 37;    /* compile error   */
The syntax for keeping arrays and strings in flash is similar:
const float code array[512] = { . . . };
code const char hello[] = "Hello World";
Finally, by forcing location of a variable or array in flash, you can use memory-specific pointers, which increases code
efficiency (see Topic #2: Pointers). Of course, the same is true if you force a variable into any other memory space.
Also, with regard to the differing syntax in the previous examples, the following is from Keil's documentation on declaring
memory space:
/* older method, may not be supported in future versions of the compiler
[Memory space] [Qualifier and Data type] variable_name */
code const int testvar; // example
/* preferred method
[Qualifier and Data type] [Memory space] variable_name */
int idata testvar; // example
/* preferred method for declaring pointer variables
[Qualifier and Data type] [Data type memory space]
* [Variable memory space] var_name */
/* pointer stored in xdata points to an integer stored in data */
int data * xdata p;
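The following sketch combines the points of this topic: a constant table kept in flash, declared with the preferred [data type] [memory space] order, and read through a memory-specific pointer. MEM_CODE, hex_chars, and nibble_to_hex are hypothetical names used only for this illustration. The macro expands to the Keil 'code' keyword when the predefined __C51__ symbol is present and to nothing on other compilers, which is an assumption made so that the example also builds on a desktop compiler.

```c
/* Hypothetical portability macro (not part of PSoC Creator or Keil). */
#ifdef __C51__
#define MEM_CODE code
#else
#define MEM_CODE
#endif

/* Constant lookup table; under Keil it is placed in flash and takes
   no SRAM. The string terminator makes the array 17 bytes long. */
const char MEM_CODE hex_chars[] = "0123456789ABCDEF";

/* Convert the low nibble of 'value' to its ASCII hex character.
   'p' is a memory-specific pointer into the code space. */
char nibble_to_hex(unsigned char value)
{
    const char MEM_CODE *p = hex_chars;

    return p[value & 0x0F];
}
```

For example, nibble_to_hex(10) returns 'A'. The table costs flash bytes only, at the price of the extra accumulator setup that MOVC requires.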
Topic #4: Passing Arguments to Functions
C function arguments are usually passed on a CPU's hardware stack. But because the 8051 hardware stack size is limited to
less than 256 bytes, the Keil compiler does not pass arguments on the stack. Instead, it uses either registers R1–R7 or fixed
memory locations. The preferred method is to use registers because it's faster and uses fewer code bytes. However, this
method has some limitations, as Table 3 shows.
Table 3. Keil Scheme for Passing Function Arguments in Registers
Argument Number   char, 1-byte ptr   int, 2-byte ptr   long, float    generic ptr
1                 R7                 R7, R6 (MSB)      R7–R4 (MSB)    R3 (mem type), R2 (MSB), R1
2                 R5                 R5, R4 (MSB)      R7–R4 (MSB)    R3 (mem type), R2 (MSB), R1
3                 R3                 R3, R2 (MSB)      -              R3 (mem type), R2 (MSB), R1
If an argument does not fit into the scheme in Table 3 then it is passed in a fixed memory location. So as much as possible,
functions should be limited to three arguments, but even then the compiler may not pass all three arguments in registers. For
example, the following C code might be written to search an array:
/* search function, with three arguments */
int search(char *addr, int nbytes, char c);
char array[300];
void main()
{
    search(array, sizeof(array), 'X');
And the following is the corresponding 8051 assembler:
7B01      MOV   R3,#01H                    ; first argument in regs,
7A00      MOV   R2,#HIGH array             ; generic pointer
7900      MOV   R1,#LOW array
900000    MOV   DPTR,#?_search?BYTE+05H    ; third argument,
7458      MOV   A,#058H                    ; in memory
F0        MOVX  @DPTR,A
7D2C      MOV   R5,#02CH                   ; second argument in regs
7C01      MOV   R4,#01H
120000    LCALL _search                    ; 19 bytes, 23 cycles
According to Table 3, the third argument must be passed in a fixed memory location even though it's just a char and R6 and
R7 are not being used. The code can be more efficient if all arguments are passed in registers. Two methods are available for
achieving that.
First, note that in Table 3, the third argument and generic pointer arguments both have only one placement option, that is, R1–
R3. (See Topic #2: Pointers for a discussion of generic pointers.) If you simply change the order of the arguments, to make the
generic pointer the third argument, then all of the arguments can be passed in registers:
/* search function, with three arguments */
int search(int nbytes, char c, char *addr);
7B01      MOV   R3,#01H                    ; third argument
7A00      MOV   R2,#HIGH array
7900      MOV   R1,#LOW array
7D58      MOV   R5,#058H                   ; second argument
7F2C      MOV   R7,#02CH                   ; first argument
7E01      MOV   R6,#01H
120000    LCALL _search                    ; 15 bytes, 16 cycles
Another solution is to use a memory-specific pointer (see Topic #2: Pointers), which requires only two bytes instead of three:
/* search function to be called,
   with three arguments */
int search(char xdata *addr, int nbytes, char c);
7E00      MOV   R6,#HIGH array             ; first argument
7F00      MOV   R7,#LOW array
7B58      MOV   R3,#058H                   ; third argument
7D2C      MOV   R5,#02CH                   ; second argument
7C01      MOV   R4,#01H
120000    LCALL _search                    ; 13 bytes, 14 cycles
In this case, the array must be in external SRAM. If you move it to flash or internal SRAM, the function must be changed.
In this example, by using one of the two techniques, you can save up to about 31% of the bytes (19 reduced to 13) and about
39% of the cycles (23 reduced to 14) on a function call, depending on the function's arguments.
Note that arguments of type 'bit' cannot be passed in a register; they are always passed in a fixed memory location in the bit
space in the 8051 internal memory. This is generally acceptable because very little code is needed to access a bit variable.
However, bit variables should be declared at the end of a function's argument list, to keep the other arguments within the
Table 3 scheme.
A similar concept applies to function return values, as Table 4 shows.
Table 4. Keil Scheme for Passing Function Return Values in Registers
Return Type             Register
bit                     Carry flag
char, 1-byte pointer    R7
int, 2-byte pointer     R7, R6 (MSB)
long, float             R7–R4 (MSB)
generic pointer         R3 (mem type), R2 (MSB), R1
Function return values, including those of type 'bit', are always passed in registers. This can, in turn, affect the order of
function arguments. For example, suppose you want to use the search function's return value as an argument for another
search. The following code finds in an array the last instance of a character before the first instance of another character:
int search(char xdata *addr, int nbytes, char c);     /* search forward */
int searchb(char xdata *addr, int nbytes, char c);    /* search backward */
char array[300];
void main()
{
    searchb(array, search(array, sizeof(array), 'X'), 'A');
Here's the corresponding assembler:
7E00      MOV   R6,#HIGH array
7F00      MOV   R7,#LOW array
7B58      MOV   R3,#058H
7D2C      MOV   R5,#02CH
7C01      MOV   R4,#01H
120000    LCALL _search
AC06      MOV   R4,AR6                     ; move return value to argument 2
AD07      MOV   R5,AR7
7E00      MOV   R6,#HIGH array
7F00      MOV   R7,#LOW array
7B41      MOV   R3,#041H
120000    LCALL _searchb
It costs an extra 4 bytes and 6 cycles to move the return value, which you can avoid if you reorder the arguments:
int search(int nbytes, char xdata *addr, char c);     /* search forward */
int searchb(int nbytes, char xdata *addr, char c);    /* search backward */
char array[300];
void main()
{
    searchb(search(sizeof(array), array, 'X'), array, 'A');
7C00      MOV   R4,#HIGH array
7D00      MOV   R5,#LOW array
7B58      MOV   R3,#058H
7F2C      MOV   R7,#02CH
7E01      MOV   R6,#01H
120000    LCALL _search
7C00      MOV   R4,#HIGH array             ; return value is already in R6, R7
7D00      MOV   R5,#LOW array
7B41      MOV   R3,#041H
120000    LCALL _searchb
The main lesson from this example is that if a function argument may be the return value of another function, put that
argument first in the argument list whenever possible.
When you write a C function, you usually don't need to care about the order of the function's arguments. With Keil 8051 C, if
you pay attention to the argument order, you may gain significant reductions in code size.
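The application note gives only prototypes for search() and searchb(). The sketch below fills in one possible implementation with the count argument first, so that a nested call can reuse the register pair that holds the return value. The function bodies, the return convention (index of the match, or -1 if there is none), the added 'const' qualifier, and the MEM_XDATA macro are all assumptions made for illustration. The macro expands to 'xdata' under Keil (detected through the predefined __C51__ symbol) and to nothing on other compilers.

```c
/* Hypothetical portability macro (not part of PSoC Creator or Keil). */
#ifdef __C51__
#define MEM_XDATA xdata
#else
#define MEM_XDATA
#endif

/* Search forward: index of the first 'c' in addr[0..nbytes-1], or -1. */
int search(int nbytes, const char MEM_XDATA *addr, char c)
{
    int i;

    for (i = 0; i < nbytes; i++)
    {
        if (addr[i] == c)
        {
            return i;
        }
    }
    return -1;
}

/* Search backward: index of the last 'c' in addr[0..nbytes-1], or -1.
   A negative nbytes (a failed forward search) also yields -1. */
int searchb(int nbytes, const char MEM_XDATA *addr, char c)
{
    int i;

    for (i = nbytes - 1; i >= 0; i--)
    {
        if (addr[i] == c)
        {
            return i;
        }
    }
    return -1;
}
```

With this convention, searchb(search(n, buf, 'X'), buf, 'A') returns the index of the last 'A' that precedes the first 'X'. Because nbytes is the first argument of both functions, the return value of search() should already be in the registers that searchb() expects for it.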
Topic #5: Passing Structures
In C, it is possible to pass a structure to a function. You can either pass the structure directly (if it is small) or pass a pointer to
a structure. The following is a simple example of passing a structure directly:
/* structure definition and instance */
struct myStruct
{
char x, y;
} testvar = {1, 2};
/* function with structure passed in directly,
returns the sum of the structure’s members
*/
char doAdd(struct myStruct str)
{
return str.x + str.y;
}
void main()
{ /* test call for the above function */
char testvar2 = doAdd(testvar); /* structure passed directly */
Note that the resulting assembler is large:
; doAdd:
900000    MOV   DPTR,#str+01H              ; get the structure members from
E0        MOVX  A,@DPTR                    ; the fixed memory location,
FF        MOV   R7,A                       ; and add them
900000    MOV   DPTR,#str
E0        MOVX  A,@DPTR
2F        ADD   A,R7
FF        MOV   R7,A
22        RET

; main:
7B01      MOV   R3,#01H                    ; pass structure members in memory
7A00      MOV   R2,#HIGH testvar
7900      MOV   R1,#LOW testvar
7800      MOV   R0,#LOW ?doAdd?BYTE
7C00      MOV   R4,#HIGH ?doAdd?BYTE
7D01      MOV   R5,#01H
7E00      MOV   R6,#00H
7F02      MOV   R7,#02H
120000    LCALL ?C?COPYAMD                 ; Keil library function
120000    LCALL doAdd
900000    MOV   DPTR,#testvar2
EF        MOV   A,R7
F0        MOVX  @DPTR,A
The alternative, passing by reference (passing a pointer to the structure), results in less code in main() but more code in
doAdd(), which makes the two methods approximately equal in this case:
/* function with structure passed in by reference,
returns the sum of the structure’s members
*/
char doAdd(struct myStruct *str ) {
return str->x + str->y ;
}
void main()
{ /* test call for the above function */
char testvar2 = doAdd( &testvar ); /* structure passed by reference */
; doAdd:
900000    MOV   DPTR,#str                  ; store the pointer in memory
120000    LCALL ?C?PSTXDATA
900000    MOV   DPTR,#str                  ; get the structure members from memory
120000    LCALL ?C?PLDXDATA
E9        MOV   A,R1
2401      ADD   A,#01H
F9        MOV   R1,A
E4        CLR   A
3A        ADDC  A,R2
FA        MOV   R2,A
120000    LCALL ?C?CLDPTR
FF        MOV   R7,A
900000    MOV   DPTR,#str
120000    LCALL ?C?PLDXDATA
120000    LCALL ?C?CLDPTR
2F        ADD   A,R7                       ; and add them
FF        MOV   R7,A
22        RET

; main:
7B01      MOV   R3,#01H                    ; pass structure pointer in registers
7A00      MOV   R2,#HIGH testvar
7900      MOV   R1,#LOW testvar
120000    LCALL doAdd
900000    MOV   DPTR,#testvar2
EF        MOV   A,R7
F0        MOVX  @DPTR,A
The lesson from this topic is simple: try to avoid passing structures to functions, regardless of method. Consider passing
structure members instead, or just make the structures static or even global. Also, be careful when using the C operator '->'; it
is costly to implement.
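A minimal sketch of the "pass structure members instead" suggestion follows. The function name doAddMembers is hypothetical; the structure is the one used in the examples above. Under the Table 3 scheme, two char arguments would be expected to travel in R7 and R5, so no structure copy and no pointer dereference should be needed.

```c
/* Same structure as in the previous examples. */
struct myStruct
{
    char x, y;
};

/* Hypothetical variant of doAdd() that takes the two members directly
   instead of the structure or a pointer to it. */
char doAddMembers(char x, char y)
{
    return (char)(x + y);
}
```

The caller then writes doAddMembers(testvar.x, testvar.y) instead of doAdd(testvar) or doAdd(&testvar). This works well only while the number of members stays within the three-argument limit of Table 3.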
Topic #6: Switch Statements
When making a multipath decision based on the state of a variable (for example, when implementing a state
machine) you can use either a series of if-else if-else statements or a switch-case construct. To test which
method is less costly, first examine the switch option:
/* basic state machine
   note that C coding best practices
   require having:
   - a break statement at the end of
     each case, and
   - a default case
*/
char state = 0;
switch (state)
{
    case 0:
        state++;
        break;
    case 1:
        state--;
        break;
    default:
        state = 0;
        break;
}
The resulting assembler uses a sequentially scanned jump table with a library function:
900000    MOV   DPTR,#state
E0        MOVX  A,@DPTR
120000    LCALL ?C?CCASE
0000      DW    ?C0002
00        DB    00H
0000      DW    ?C0003
01        DB    01H
0000      DW    00H
0000      DW    ?C0004
Now, examine the if-else if-else construct:
/* basic state machine */
if (state == 0)
{
    state++;
}
else if (state == 1)
{
    state--;
}
else
{
    state = 0;
}
The resulting assembler has a series of compares and jumps:
900000    MOV   DPTR,#state                ; if state == 0
E0        MOVX  A,@DPTR
7005      JNZ   ?C0001
E0        MOVX  A,@DPTR                    ; state++
04        INC   A
F0        MOVX  @DPTR,A
8010      SJMP  ?C0005
?C0001:
900000    MOV   DPTR,#state                ; else if state == 1
E0        MOVX  A,@DPTR
B40104    CJNE  A,#01H,?C0003
14        DEC   A                          ; state--
F0        MOVX  @DPTR,A
8005      SJMP  ?C0005
?C0003:
E4        CLR   A                          ; else state = 0
900000    MOV   DPTR,#state
F0        MOVX  @DPTR,A
?C0005:
This code is smaller for a state machine of this size, but for larger state machines it grows at a faster rate than the jump table
in the previous code. The general rule for code of any complexity is to use the switch statement.
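As an illustration of the general rule, here is a small sketch of a state machine with more states, written with a switch statement, a break at the end of every case, and a default case. The state names, the 'input' argument, and the transitions are hypothetical and serve only to show the structure.

```c
/* Hypothetical states for illustration only. */
#define ST_IDLE   0
#define ST_START  1
#define ST_RUN    2
#define ST_STOP   3

/* Return the next state, given the current state and a 1-bit input. */
unsigned char next_state(unsigned char state, unsigned char input)
{
    switch (state)
    {
        case ST_IDLE:
            state = input ? ST_START : ST_IDLE;
            break;
        case ST_START:
            state = ST_RUN;
            break;
        case ST_RUN:
            state = input ? ST_RUN : ST_STOP;
            break;
        case ST_STOP:
            state = ST_IDLE;
            break;
        default:            /* recover from an invalid state */
            state = ST_IDLE;
            break;
    }
    return state;
}
```

Each added state costs one more jump table entry plus its case body, whereas the if-else form would add another load, compare, and jump sequence per state.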
This is also a perfect example for using a more efficient memory space (see Guideline #3). Moving the variable 'state' to the
data space results in large reductions in code size:
data char state;
; switch code
E500      MOV   A,state                    ; switch state
120000    LCALL ?C?CCASE
0000      DW    ?C0002                     ; jump table
00        DB    00H
0000      DW    ?C0003
01        DB    01H
0000      DW    00H
0000      DW    ?C0004
?C0002:
0500      INC   state                      ; case 0: state++
8007      SJMP  ?C0005
?C0003:
1500      DEC   state                      ; case 1: state--
8003      SJMP  ?C0005
?C0004:
E4        CLR   A                          ; default: state = 0
F500      MOV   state,A
?C0005:

; if – else code
E500      MOV   A,state                    ; if state == 0
7004      JNZ   ?C0001
0500      INC   state                      ; state++
80DE      SJMP  ?C0005
?C0001:
E500      MOV   A,state                    ; else if state == 1
B40104    CJNE  A,#01H,?C0003
1500      DEC   state                      ; state--
80D5      SJMP  ?C0005
?C0003:
E4        CLR   A                          ; else state = 0
F500      MOV   state,A
?C0005:
Further optimizations are available; for example, Keil compiler optimization level 4 optimizes switch / case statements. The
previous example has a sequentially scanned table to decide where to jump to, which means that it may take longer to reach
the case statement for some values than for others. Keil compiler optimization for speed as opposed to size may change that
to a true jump table, where the time to reach a case statement is the same regardless of switch value.
Topic #7: Large Arrays and Structures
Large arrays and structures are handled efficiently by the Keil compiler. If you need to access a structure member or an array
element directly, the corresponding address is simply accessed, as the following example shows:
/* complex structure with multiple members */
struct myStruct
{
    char  m1;
    int   m2;
    long  m3;
    float m4;
    long  m5[256]; /* including an array member */
} testvar;
/* access one element of the array member */
testvar.m5[3] = 20;
E4        CLR   A                          ; create and store a 32-bit value
7F14      MOV   R7,#014H
FE        MOV   R6,A
FD        MOV   R5,A
FC        MOV   R4,A
900000    MOV   DPTR,#testvar+017H
120000    LCALL ?C?LSTXDATA                ; library function
The code gets more complicated, however, when array indices are calculated:
int i = 3;
testvar.m5[i] = 20;
900000    MOV   DPTR,#i                    ; set i = 3
E4        CLR   A
F0        MOVX  @DPTR,A
A3        INC   DPTR
7403      MOV   A,#03H
F0        MOVX  @DPTR,A
E4        CLR   A                          ; create and save a 32-bit value
7F14      MOV   R7,#014H
FE        MOV   R6,A
FD        MOV   R5,A
FC        MOV   R4,A
C004      PUSH  AR4
C005      PUSH  AR5
C006      PUSH  AR6
C007      PUSH  AR7
900000    MOV   DPTR,#i                    ; calculate offset based on i,
E0        MOVX  A,@DPTR                    ; offset should be i * 4
FE        MOV   R6,A
A3        INC   DPTR
E0        MOVX  A,@DPTR
7802      MOV   R0,#02H
?C0004:
C3        CLR   C
33        RLC   A
CE        XCH   A,R6
33        RLC   A
CE        XCH   A,R6
D8F9      DJNZ  R0,?C0004
2400      ADD   A,#LOW testvar+0BH         ; load value to address + offset
F582      MOV   DPL,A
7400      MOV   A,#HIGH testvar+0BH
3E        ADDC  A,R6
F583      MOV   DPH,A
D007      POP   AR7
D006      POP   AR6
D005      POP   AR5
D004      POP   AR4
120000    LCALL ?C?LSTXDATA                ; library function
A lot of code is required to calculate the offset. You can reduce the amount of code by making three changes, to only the index
variable:
• Size index variables appropriately. If the number of elements in the array is 256 or less, you need only a 1-byte index.
  Don't use a 2-byte index variable unless absolutely necessary.
• Make sure that index variables are unsigned. The previous example highlights a common problem in C for the 8051,
  which is the use of the 'int' type:
  int i;
  for (i = 0; i < 100; i++)
  { /* do something with testvar[i] */
  Although using 'int' is common practice, it causes variables to be 16-bit with the Keil compiler, which reduces code
  efficiency. A better method is to use one of the macros supplied by PSoC Creator, and explicitly define the size of the
  variable and whether it is signed: int8, uint8, int16, uint16, int32, uint32.
• To make offset calculations more efficient, keep index variables in the data space. Index variables are usually automatic
  and can be overlaid (see Topic #1: Variable Overlay).
uint8 data i = 3;
testvar.m5[i] = 20;
750003    MOV   i,#03H                     ; set i = 3
E4        CLR   A                          ; load 32-bit value into registers
7F14      MOV   R7,#014H
FE        MOV   R6,A
FD        MOV   R5,A
FC        MOV   R4,A
75F004    MOV   B,#04H                     ; calculate offset and store the value
E500      MOV   A,i                        ; using library functions
900000    MOV   DPTR,#testvar+0BH
120000    LCALL ?C?OFFXADD
120000    LCALL ?C?LSTXDATA
Proper declaration and placement of index variables can greatly reduce the amount of code needed to process large
structures and arrays.
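The three changes can be sketched together in a loop. The names index_t, MEM_DATA, samples, NUM_ELEMENTS, and fill_samples are hypothetical and exist only for this illustration; in a PSoC Creator project the supplied uint8 type would normally be used in place of index_t. MEM_DATA expands to the Keil 'data' keyword when the predefined __C51__ symbol is present and to nothing on other compilers.

```c
/* Hypothetical portability macro (not part of PSoC Creator or Keil). */
#ifdef __C51__
#define MEM_DATA data
#else
#define MEM_DATA
#endif

/* 1-byte unsigned index type; stands in for PSoC Creator's uint8. */
typedef unsigned char index_t;

#define NUM_ELEMENTS 100            /* must stay below 256 for index_t */

long samples[NUM_ELEMENTS];

/* Fill the array using an unsigned 1-byte index kept in the data space. */
void fill_samples(long value)
{
    index_t MEM_DATA i;

    for (i = 0; i < NUM_ELEMENTS; i++)
    {
        samples[i] = value;
    }
}
```

Note that an 8-bit index cannot serve as the loop counter for an array of exactly 256 elements with this loop form, because the condition i < 256 would always be true; in that case a do-while loop or a 2-byte index is needed.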
Topic #8: Compact Data Space
The previous guidelines and topics have shown that the best place to store variables is in the 8051 internal data space or idata
space. But if you run out of room in these spaces (even with data overlaying), you don't have to settle for the external (xdata)
space / large memory model. There is a more efficient way to use the xdata space: the 'pdata' space or 'compact' memory
model. To understand how it works, consider some 8051 assembler, specifically the two forms of the MOVX instruction. The
compact form uses R0 or R1 as a pointer into the external data space, and the large form uses DPTR:
; compact form
E2        MOVX  A,@R0                      ; 3 cycles
F2        MOVX  @R0,A                      ; 4 cycles
E3        MOVX  A,@R1                      ; 3 cycles
F3        MOVX  @R1,A                      ; 4 cycles
; large form
E0        MOVX  A,@DPTR                    ; 2 cycles
F0        MOVX  @DPTR,A                    ; 3 cycles
Although the compact form uses one more cycle than the large form, when you include the instruction that loads the pointer
register, the number of cycles is the same and one less byte is used:
; compact form
7800      MOV   R0,#testvar
E2        MOVX  A,@R0                      ; 3 bytes, 5 cycles
7800      MOV   R0,#testvar2
F2        MOVX  @R0,A                      ; 3 bytes, 6 cycles
; large form
900000    MOV   DPTR,#testvar
E0        MOVX  A,@DPTR                    ; 4 bytes, 5 cycles
900000    MOV   DPTR,#testvar2
F0        MOVX  @DPTR,A                    ; 4 bytes, 6 cycles
Note that DPTR is a 16-bit “register” (formed from the DPL and DPH registers) and so the large form can address 64 K bytes in
the xdata space. R0 and R1 are 8-bit registers and can access only 256 bytes, so how is 64 K in the xdata space accessed
using the compact form?
(PSoC 3 actually has a 16 Mbyte xdata space; 3 bytes are used to address this space. The MS bytes, stored in SFRs DPX for
large model and MXAX for compact model, have default reset values of zero, so the first 64 K is always available as a default.
All PSoC 3 SRAM and most registers are addressed within the first 64 K of the xdata space. See the xdata memory map and
discussion in the device datasheet for details.)
For the compact form, the upper byte of the 16-bit address is stored in the P2AX register, SFR #A0H. In the compact memory
model, the external space is split into 256-byte pages, where P2AX is the page register and R0 or R1 is the index into the page.
Although you can, in theory, access the entire 64 K of xdata space using the compact form, usually just the first 256 bytes are
used for accessing data in a more efficient mode.
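The page and index arithmetic can be written out as a short sketch. The function names pdata_page and pdata_index are hypothetical; the code only illustrates how a 16-bit xdata address splits into the value that would go in P2AX and the value that would go in R0 or R1. It does not touch any register.

```c
/* Page number: the upper byte of a 16-bit xdata address
   (the value P2AX would hold for a compact-form access). */
unsigned char pdata_page(unsigned int addr)
{
    return (unsigned char)((addr >> 8) & 0xFFu);
}

/* Index within the 256-byte page: the lower byte of the address
   (the value R0 or R1 would hold). */
unsigned char pdata_index(unsigned int addr)
{
    return (unsigned char)(addr & 0xFFu);
}
```

For example, address 0x1234 is index 0x34 in page 0x12, and every address from 0x0000 to 0x00FF is in page 0, which is the page normally used for pdata variables.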
With the Keil C compiler, you can define a global, static or automatic variable, structure or array to be in the compact space by
using the Keil keyword 'pdata':
char pdata testvar[5];
void main()
{
    char pdata testvar2 = testvar[3];
    testvar[1] = 44;
And, similar to Topic #1: Variable Overlay, an overlay space exists for the compact data space, called _PDATA_GROUP_. Test
the example in Topic #1 with the 'small' keywords changed to 'compact' (or CYCOMPACT), and observe the shared usage of
the pdata space.
The main takeaway from this topic is the same as in Guideline #3 – for maximum efficiency, place your data in the appropriate
memory space. Although your code may vary, you should generally follow the recommendations in Table 5.
Table 5. 8051 Memory Spaces and Recommended Usage
8051 memory space        Keil compiler keywords                Usage
Internal                 data, idata, bit, bdata, small        Bit variables. Automatic variables, especially those used for complex calculations.
External, compact mode   pdata, compact                        Frequently accessed variables (global, static or automatic, depending on program).
External, large mode     xdata, large (PSoC Creator default)   Large arrays or structures. Variables accessed less often.
Note that accessing multiple pages in compact mode, by changing P2AX, is possible but is not recommended. One reason is
that an interrupt handler might use a different page than the background thread. Unless P2AX is carefully managed (for
example through push / pop operations), your code may end up accessing a different page than intended and a hard-to-find
defect may result.
Topic #9: Use All of the Resources in Your PSoC
There is one final method available for reducing code size. It is based on the fact that PSoC is designed to be a flexible device
that enables you to build custom functions in programmable analog and digital blocks. For example, in PSoC 3 you have the
following peripherals that can act as “co-processors”:
• DMA Controller. Note that the most common CPU assembler instructions are MOV and MOVX, which implies that the
  CPU spends a lot of cycles just moving bytes around. Let the DMA controller do that instead.
• Digital Filter Block (DFB), a sophisticated 24-bit sum of products calculator.
• Universal Digital Blocks (UDBs). There are as many as 24 UDBs, and each UDB has an 8-bit datapath that can add,
  subtract, and do bitwise operations, shifts, and cyclic redundancy check (CRC). The datapaths can be chained for
  word-wide calculations. Consider offloading CPU calculations to the datapaths.
• The UDBs also have programmable logic devices (PLDs) which can be used to build state machines; see the Lookup
  Table (LUT) Component datasheet. LUTs can be an effective alternative to programming state machines in the CPU using
  C switch / case statements.
• Analog components including ADCs, DACs, comparators, opamps, as well as programmable switched capacitor /
  continuous time (SC/CT) blocks from which you can create programmable gain amplifiers (PGAs), transimpedance
  amplifiers (TIAs), and mixers. Consider doing your processing in the analog domain instead of the digital domain.
PSoC Creator offers a large number of Components to implement various functions in these peripherals. This allows you to
develop an effective multiprocessing system in a single chip, significantly offloading functionality from the CPU. This in turn
can reduce code size and, because the CPU has fewer tasks to perform, can also allow you to lower the CPU speed and
thereby reduce power.
For example, with PSoC 3 a digital system can be designed to control multiplexed ADC inputs, and interface with DMA to save
the data in SRAM, to create an advanced analog data collection system with zero usage of the CPU.
Cypress offers extensive application note support for PSoC peripherals, as well as detailed data in the device datasheets and
technical reference manuals (TRMs). For more information see the PSoC 3 home page at www.cypress.com.
Summary
This application note has demonstrated that:
• The 8051 CPU can be made to work very efficiently when its core internal features are used. These resources must be
  used carefully because they are limited.
• The efficiency gains can be realized without writing any 8051 assembler code. Keywords for the Keil 8051 C compiler
  must be used; portability issues can be mitigated by the use of macros provided by PSoC Creator.
• The Keil C compiler provides a number of ways to make a C program work efficiently on the 8051.
After you compile your C code, you should review the resultant assembler and understand why the particular instructions are
there. There are two ways to do that in PSoC Creator.
1. Bring up the list file corresponding to the compiled C file (filename.lst). The default PSoC Creator project build setting is to
   create a list file. To find it, in the PSoC Creator Workspace Explorer window click the Results tab.
2. Use the disassembly window in the debugger. That window shows mixed source and assembler, which helps in
   debugging. However, the disadvantage is that you must have working target hardware and a project that builds correctly
   before you can use the debugger.
Note that all of the techniques described in this application note were done using compiler optimization level 3. Further gains
may be achievable by using higher levels of compiler optimization, at the cost of possible difficulties in debugging. For details,
see the PSoC Creator Help topic “Compiler Build Settings” and the Keil help topic “OPTIMIZE Compiler Directive”.
The best way to learn more about coding for the 8051 is to review the Keil C keywords, which can be found in PSoC Creator
menu Help, Documentation, Keil, Cx51 Compiler User's Guide, and Language Extensions.
About the Author
Name: Mark Ainsworth
Title: Applications Engineer Principal
Background: Mark Ainsworth has a BS in Computer Engineering from Syracuse University and an MSEE from University of
Washington, as well as many years of experience designing and building embedded systems.
Document History
Document Title: PSoC® 3 - 8051 Code and Memory Optimization - AN60630
Document Number: 001-60630
Revision   ECN       Orig. of Change   Submission Date   Description of Change
**         2901594   MKEA              03/30/10          New application note.
*A         3209904   MKEA              03/30/11          Changed title according new standards. Added clarifications for CY macros and compiler optimization, and other text and code.
*B         3248324   MKEA              05/04/11          Added Advanced Topics and updated all sections.
*C         3259272   MKEA              05/17/11          Fixed PDF
*D         3275139   MKEA              06/06/11          Template Fix
*E         3535835   MKEA              02/28/2012        Updated template. Modified Title and Abstract.
*F         3946665   MKEA              03/27/2013        Added text to the Keil 8051 Memory Models section. Added a section on function pointers. Added keywords. Other minor text and formatting updates.
*G         4282039   MKEA              02/14/2014        Added examples of bit variable usage. Broke out separate Guideline #4 and added Topic #9. Updated to *L template. Miscellaneous minor edits.
PSoC is a registered trademark and PSoC Creator is a trademark of Cypress Semiconductor Corp. All other trademarks or registered trademarks
referenced herein are the property of their respective owners.
Cypress Semiconductor
198 Champion Court
San Jose, CA 95134-1709
Phone: 408-943-2600
Fax: 408-943-4730
Website: www.cypress.com
© Cypress Semiconductor Corporation, 2010-2014. The information contained herein is subject to change without notice. Cypress Semiconductor
Corporation assumes no responsibility for the use of any circuitry other than circuitry embodied in a Cypress product. Nor does it convey or imply any
license under patent or other rights. Cypress products are not warranted nor intended to be used for medical, life support, life saving, critical control or
safety applications, unless pursuant to an express written agreement with Cypress. Furthermore, Cypress does not authorize its products for use as
critical components in life-support systems where a malfunction or failure may reasonably be expected to result in significant injury to the user. The
inclusion of Cypress products in life-support systems application implies that the manufacturer assumes all risk of such use and in doing so indemnifies
Cypress against all charges.
This Source Code (software and/or firmware) is owned by Cypress Semiconductor Corporation (Cypress) and is protected by and subject to worldwide
patent protection (United States and foreign), United States copyright laws and international treaty provisions. Cypress hereby grants to licensee a
personal, non-exclusive, non-transferable license to copy, use, modify, create derivative works of, and compile the Cypress Source Code and derivative
works for the sole purpose of creating custom software and or firmware in support of licensee product to be used only in conjunction with a Cypress
integrated circuit as specified in the applicable agreement. Any reproduction, modification, translation, compilation, or representation of this Source
Code except as specified above is prohibited without the express written permission of Cypress.
Disclaimer: CYPRESS MAKES NO WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, WITH REGARD TO THIS MATERIAL, INCLUDING, BUT
NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. Cypress reserves the
right to make changes without further notice to the materials described herein. Cypress does not assume any liability arising out of the application or
use of any product or circuit described herein. Cypress does not authorize its products for use as critical components in life-support systems where a
malfunction or failure may reasonably be expected to result in significant injury to the user. The inclusion of Cypress' product in a life-support systems
application implies that the manufacturer assumes all risk of such use and in doing so indemnifies Cypress against all charges.
Use may be limited by and subject to the applicable Cypress software license agreement.