
Instructor’s Manual
MOTOROLA PowerPC
Excimer Laboratory Manual
Jose I. Quiñones, Noel Serrano, Walter Guiot, Luis Narváez, Eisen Montalvo
Department of Electrical and Computer Engineering
University of Puerto Rico-Mayagüez
Chuck Corley
PowerPC Applications Engineering
Motorola
Editor: José L. Cruz Rivera
Department of Electrical and Computer Engineering
University of Puerto Rico-Mayagüez
VOLUME I
DISCLAIMERS
© Motorola Inc. 1999
Portions hereof © International Business Machines Corp. 1991–1995. All rights reserved.
This document contains information on a new product under development by Motorola and IBM. Motorola and IBM reserve the right to change or discontinue this product without notice. Information in this document is provided solely to
enable system and software implementers to use PowerPC microprocessors. There are no express or implied copyright or
patent licenses granted hereunder by Motorola or IBM to design, modify the design of, or fabricate circuits based on the
information in this document. The PowerPC 60x microprocessors embody the intellectual property of Motorola and of
IBM. However, neither Motorola nor IBM assumes any responsibility or liability as to any aspects of the performance,
operation, or other attributes of the microprocessor as marketed by the other party or by any third party. Neither Motorola
nor IBM is to be considered an agent or representative of the other, and neither has assumed, created, or granted hereby any
right or authority to the other, or to any third party, to assume or create any express or implied obligations on its behalf.
Information such as data sheets, as well as sales terms and conditions such as prices, schedules, and support, for the product may vary as between parties selling the product. Accordingly, customers wishing to learn more information about the
products as marketed by a given party should contact that party. Both Motorola and IBM reserve the right to modify this
manual and/or any of the products as described herein without further notice.
NOTHING IN THIS MANUAL, NOR IN ANY OF THE ERRATA SHEETS, DATA SHEETS, AND OTHER SUPPORTING
DOCUMENTATION, SHALL BE INTERPRETED AS THE CONVEYANCE BY MOTOROLA OR IBM OF AN EXPRESS
WARRANTY OF ANY KIND OR IMPLIED WARRANTY, REPRESENTATION, OR GUARANTEE REGARDING THE
MERCHANTABILITY OR FITNESS OF THE PRODUCTS FOR ANY PARTICULAR PURPOSE.
Neither Motorola nor IBM assumes any liability or obligation for damages of any kind arising out of the application or
use of these materials. Any warranty or other obligations as to the products described herein shall be undertaken solely by
the marketing party to the customer, under a separate sale agreement between the marketing party and the customer. In the
absence of such an agreement, no liability is assumed by Motorola, IBM, or the marketing party for any damages, actual or
otherwise. “Typical” parameters can and do vary in different applications. All operating parameters, including “Typicals,”
must be validated for each customer application by customer’s technical experts. Neither Motorola nor IBM convey any
license under their respective intellectual property rights nor the rights of others. Neither Motorola nor IBM makes any
claim, warranty, or representation, express or implied, that the products described in this manual are designed, intended, or
authorized for use as components in systems intended for surgical implant into the body, or other applications intended
to support or sustain life, or for any other application in which the failure of the product could create a situation where
personal injury or death may occur. Should customer purchase or use the products for any such unintended or unauthorized application, customer shall indemnify and hold Motorola and IBM and their respective officers, employees, subsidiaries, affiliates, and distributors harmless against all claims, costs, damages, and expenses, and reasonable attorney fees
arising out of, directly or indirectly, any claim of personal injury or death associated with such unintended or unauthorized use, even if such claim alleges that Motorola or IBM was negligent regarding the design or manufacture of the part.
Motorola and the Motorola logo are registered trademarks of Motorola, Inc. Motorola, Inc. is an Equal Opportunity/Affirmative Action Employer. IBM and IBM logo are registered trademarks, and IBM Microelectronics is a trademark of International Business
Machines Corp. The PowerPC name, PowerPC logotype, and PowerPC 601 are trademarks of International Business Machines Corp. used by Motorola under license from International Business Machines Corp. International Business Machines Corp. is an Equal Opportunity/Affirmative Action Employer.
INTRODUCTION
This laboratory manual contains 13 lab experiments for the PowerPC Excimer Board presented in
an increasing order of complexity. The experiments range from memory mapping problems and
system benchmarking to integer to floating point number representation conversion. It is assumed that the student has a basic understanding of C and assembly languages. There is a natural
progression in the lab experiments leading up to the Dhrystone and Linpack benchmarking of the
PowerPC603e that forms the basis of the Excimer board. Specifically, the experiments guide the
student through the following topics: code compilation, code download, DINK functions (resident monitor program), keyboard input, assembly language programming, and linking assembly
language to C code. There are also experiments on memory mapping and Flash ROM programming.
Each lab experiment is structured as follows: Problem Statement, Objectives, Background Information, Procedure, Questions, and References. The Problem Statement provides a brief indication as to the tasks that will be performed. The Objectives section presents the specific educational objectives that will be met upon successful completion of the lab experiment. The Background Information section presents a brief description of the theory behind the devices, instructions, functional units, and/or methods to be followed in conducting the experiment. The
Procedure section presents a step-by-step guide to the experiment. The Questions section seeks
to guide the student through a meaningful analysis of what he/she has performed as part of the
experiment. Finally, the References section presents additional references with material that is
useful for the experiment at hand. In addition to these sections, the Instructor’s Manual contains
a Results and a Troubleshooting section.
This laboratory manual contains experiments designed to familiarize students with the PowerPC
architecture via the Excimer Laboratory Board. The lab manual is not meant to serve as a standalone textbook on the PowerPC instruction set architecture (ISA), but rather is designed as a
companion to any PowerPC book or technical reference. Each experiment is designed so that
students will end up with a significant number of useful subroutines that can be used in other
more complex programming problems. Additional references to the PowerPC architecture and
the Excimer board may be found at http://www.motorola.com/SPS/PowerPC/teksupport.
CONTENTS
Experiment #1: Metaware Tutorial
Write and compile a simple C program.
Experiment #2: DINK Tutorial
Download the program to Excimer and use some utilities.
Experiment #3: Useful DINK Functions
Write a program that will get input from KB and echo to display. Discuss various utilities of interest.
Experiment #4: Excimer Memory Map
Compile, download, and execute a C program which blinks the on-board LEDs
Experiment #5: LED Control from PC Keyboard
Write and debug a C program to turn the on-board LEDs on and off for varying integer counts.
Experiment #6: A Simple Scanf Function for Excimer
Develop a C function for taking character input from the terminal emulator’s keyboard attached to Excimer
through the serial port and converting number characters to decimal values used in other programs.
Experiment #7: Introduction to Assembly Language Programming
Write a simple assembly language program.
Experiment #8: Linking Assembly Language and C code
Link previous code fragments.
Experiment #9: Converting Integers to Floating Point
Develop an assembly language subroutine to convert the 64 bit integer value read from the PowerPC time
base facility to a 64 bit (double) floating point number representing seconds. (Contributed by Chuck Corley, Motorola)
Experiment #10: Dhrystone Benchmarking
Write and debug a C program to count the integer number of cycles required to execute the Dhrystone
benchmark.
Experiment #11: Linpack Benchmarking
Write and debug a C program to time in microseconds (floating point) the execution of the Linpack benchmark.
Experiment #12: Cache Impact on Benchmark Metrics
Write a single program to time the performance of Dhrystone and Linpack with the caches enabled and disabled.
Experiment #13: Flash ROM
Write a program that copies itself into Flash ROM and begins executing from there.
Experiment #1: Metaware Tutorial
Problem Statement:
• In this experiment the student will develop and compile a C program that calculates the first 12 Fibonacci numbers using the Metaware PowerPC compiler. (Contributed by Noel Serrano.)
Objectives:
Upon completion of this laboratory experience, students will be able to:
• write, debug, and compile a C program using the Metaware and Code Warrior compilers
• write a recursive function that generates the first 12 Fibonacci numbers
Background Information:
This experiment is designed to take you through the major steps required to implement a simple
algorithm for the generation of the first 12 Fibonacci numbers using the Metaware compilers for
the Excimer board. The Metaware compiler facilitates code writing, debugging, and optimization.
More information on the compiler may be obtained from www.metaware.com.
The Fibonacci sequence is a series whose first two elements are 0 and 1. Each remaining element is obtained by adding the previous two. For example, the first 12 Fibonacci numbers (the first element in the sequence, 0, is not included) are:
1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144
The Fibonacci numbers arose from the solution to the following problem, posed in the year 1202: Suppose we have one pair of rabbits that can produce another pair of productive offspring when they reach the age of 1 month, and that each successive pair of offspring can do the same. Furthermore, assume the rabbits never die. How many pairs of rabbits will there be after n months? The solution is as follows: if after n months there are k_n pairs of rabbits, the number of pairs in month n+1 will be k_n plus the number of new pairs born. However, since new pairs are only born to pairs at least 1 month old, there will be k_{n-1} new pairs; that is, k_{n+1} = k_n + k_{n-1}, which is simply the rule for generating the Fibonacci numbers. More information on the fascinating world of Fibonacci numbers and their applications can be found at http://pass.maths.org.uk/issue3/fibonacci/index.html.
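The recurrence above can also be evaluated with a loop instead of recursion. The following is a minimal sketch, not the manual's suggested solution; the name fib_iter is hypothetical, and the indexing follows the convention used in this experiment, where index 0 and index 1 both yield 1.

```c
/* Iterative form of the rule "next = sum of the previous two".
   fib_iter is a hypothetical helper name used only in this sketch. */
unsigned long fib_iter(unsigned int n)
{
    unsigned long prev = 1;   /* value at index 0 */
    unsigned long curr = 1;   /* value at index 1 */
    unsigned int i;

    for (i = 1; i < n; i++) {
        unsigned long next = prev + curr;
        prev = curr;
        curr = next;
    }
    return curr;
}
```

The loop needs one addition per index, whereas the recursive version recomputes the same values many times. For the 12 numbers required here either approach is adequate.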
Procedure:
1. Write a C language program that will calculate the first 12 Fibonacci numbers.
Hint: Use a recursive function.
2. To print the numbers in the DINK32 interface (discussed in more detail in later experiments), add the following code to your program to replace the default printf function with the one provided by DINK. Each printf call should contain at most one variable.
#define printf dink_printf
unsigned long (*dink_printf)() = (unsigned long (*)()) 0x6270;
3. Type your program to a text file using notepad or edit and save it in the directory you
have chosen to contain your code.
4. Compile the C code with the hcppc command included with the Metaware C compiler by entering the following at the DOS command prompt:
hcppc -Hppc603 -c file.c
Note: “file.c” stands for the C code file. You may name your C code file as you wish, but remember to use the chosen name in the hcppc command. The result of this command is “file.o”, the object file. For more information about the compiler options, type hcppc -h.
5. Link the object file using the ldppc command, which invokes the linker included with the Metaware C compiler, by entering the following at the DOS command prompt:
ldppc -B start_addr=70000 -xm file.o
Note: “file.o” is the object file generated in the previous step; it has the same base name as your C code file. The result of this step is the file “a.hex”, which will later be downloaded to the Excimer board. The -B start_addr=70000 option specifies where your code will be placed in the memory of the Excimer board. For more information about the linker, type ldppc -h.
References:
• Metaware High C/C++ Compiler – http://www.metaware.com
Suggested Code:
int fibonacci(int x);
#define printf dink_printf
unsigned long (*dink_printf)() = (unsigned long (*)()) 0x6270;

/* Print the first 12 Fibonacci numbers, one variable per printf call. */
int main()
{
    int fib_no = 0, index = 0;
    while (index < 12)
    {
        fib_no = fibonacci(index);
        printf("Fibonacci number for index %d is", index);
        printf(" %d\n", fib_no);
        index++;
    }
    return 0;
}

/* Recursive definition: indexes 0 and 1 both yield 1. */
int fibonacci(int number)
{
    switch (number)
    {
    case 0 :
        return 1;
    case 1 :
        return 1;
    default :
        return (fibonacci(number-1) + fibonacci(number-2));
    }
}
Troubleshooting:
If the student is not able to:
• Print to the DINK 32 interface: verify that the address assigned to the dink_printf pointer matches that of the DINK version in use. Run the st command in DINK and check that the address listed for dink_printf matches the one provided in this manual.
Experiment #2: DINK Tutorial
Problem Statement:
• This experiment is designed to introduce the student to the DINK interface. A tutorial on how to download code to the Excimer board and some useful DINK debugging utilities are also presented. (Contributed by Noel Serrano.)
Objectives:
Upon completion of this laboratory experience, students will be able to:
• download their programs to the Excimer board using the DINK interface.
• debug the programs using the DINK built-in debugging tools.
Background Information:
The Excimer board contains a debugging interface called DINK. This interface enables you to
connect to the evaluation board through a serial cable using a terminal program. This enables the
developer to have continuous communication with the evaluation board, allowing insight into the
board’s state at all times. The terminal screen of your program should look like this.
Figure 1. DINK32 on terminal client
More information on DINK can be found at www.mot.com/SPS/PowerPC/teksupport/teklibrary/index.html.
Procedure:
1. First make sure you have your evaluation board connected using the serial cable provided to
serial port 1 (COM1). Open your terminal client and configure it to connect through COM1
using the following parameters.
Parameter     Value
Protocol      Serial
Port          COM1
Baud Rate     9600
Data Bits     8
Parity        N
Stop Bits     1
RTS/CTS       Enabled
Turn on the evaluation board and press Connect on your terminal client. You should be able
to see the initialization window with the DINK32_603e >> prompt as presented in the figure
shown below (see step 4).
2. Compile the program you created in Experiment #1 using the Metaware compiler.
3. Now you are ready to download your program to the Excimer board for execution. First go to the terminal client running the DINK32 interface and type dl -k. DINK will then expect to receive data through the keyboard serial port (COM1). Next, send the file from the terminal client by selecting a command such as Send Text File or Send ASCII (the name varies from one terminal client to another), and browse for the a.hex file created in the directory where you compiled your program.
4. Run your program by typing go 70000 in the DINK32 program. If your code is correct
and if you have been successful in downloading the code you should get an output like the
following.
Hint: The table below presents some useful commands in case you need to debug your program, view memory or register contents, and/or set breakpoints for program tracing. For more information type help <command> at the DINK prompt.
Command            Format       Description
Memory Display     md <addr>    Displays the memory area specified by the hex address.
Register Display   regdisp rx   Displays the register specified by rx.
Disassemble        ds <addr>    Disassembles the code starting at the specified address.
Trace              tr <addr>    Begins tracing a program at the specified address. To continue tracing, type tr +.
Breakpoint         br <addr>    Sets a breakpoint at the specified address.
Assemble           as <addr>    Lets you change part of the assembly code from the DINK interface, selecting it by the address of the code line.
References:
[1] Motorola, Designing a Minimal PowerPC System, PowerPC Application Note: AN1769/D,
1998.
Conclusions:
Students should be able to note that:
• DINK works similarly to other evaluation board environments
• DINK provides functionality that enables the user to modify the memory, registers, and assembly code
• DINK provides breakpoint and trace capabilities for debugging purposes
Troubleshooting:
If the student is not able to communicate with DINK:
• verify the connections to the COM1 port and the board
• check for correct settings on the terminal client
Experiment #3: Useful DINK Functions
Problem Statement:
• In this experiment the student is introduced to a set of useful functions that are contained within the DINK 32 interface. (Contributed by Noel Serrano.)
Objectives:
Upon completion of this laboratory experience, students will be able to:
• work with more advanced DINK functions and use them in future laboratories.
Background Information:
The DINK 32 interface provides a set of functions that facilitate the development of programs
for the Excimer board. Among the functions included in DINK are some that allow the programmer to capture data from the keyboard and to print to the screen. Other functions control parts
of the Excimer board configuration like enabling the timer, cache, etc.
This laboratory will give you an overview of a basic set of these functions and will teach you how to access them in your C programs. DINK includes a command that displays a list of all these functions with their branch labels and corresponding addresses. The command is st, and the corresponding output will look like this.
DINK32_603e >>st
Current list of DINK branch labels:
KEYBOARD:         0x0
get_char:         0x1e4c4
write_char:       0x5eb4
TBaseInit:        0x39e0
TBaseReadLower:   0x3a04
TBaseReadUpper:   0x3a20
CacheInhibit:     0x3a3c
InvEnL1Dcache:    0x3a5c
DisL1Dcache:      0x3aa4
InvEnL1Icache:    0x3ac8
DisL1Icache:      0x3b00
BurstMode:        0x3bfc
RamInCBk:         0x3c3c
RamInWThru:       0x3c7c
dink_loop:        0x55e8
dink_printf:      0x6270
Current list of USER branch labels:
DINK32_603e >>
All these functions can be accessed from your C code by declaring a function pointer and casting the function's DINK address to that pointer type. The code that defines the function would look like the following example for the printf function.
#define printf dink_printf
unsigned long (*dink_printf)() = (unsigned long (*)()) 0x6270;
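The cast in the example above can be wrapped in a typedef so that the pointer type is written only once. This is a sketch under stated assumptions: the names dink_fn and dink_entry are hypothetical, and the empty parameter list is kept because the manual does not give the real signatures of the DINK routines.

```c
/* Hypothetical typedef for "pointer to a DINK routine". The old-style
   empty parameter list mirrors the declarations used in this manual. */
typedef unsigned long (*dink_fn)();

/* Turn a numeric address, such as one shown by the st command, into a
   callable pointer. Calling the result is only meaningful on a board
   whose DINK version really places a routine at that address. */
dink_fn dink_entry(unsigned long addr)
{
    return (dink_fn)addr;
}
```

On the board, a declaration such as dink_fn my_printf = dink_entry(0x6270); would then play the same role as the dink_printf pointer, assuming the address matches the DINK version in use.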
In the following section we present examples of three DINK functions.
a) get_char – This function enables the programmer to capture characters from the keyboard through the DINK interface. The get_char function can be accessed by using the following code:
#define getchar dink_get_char
unsigned long (*dink_get_char)() = (unsigned long (*)()) 0x1e4c4;
This will enable you to capture characters from the keyboard. The syntax for reading a character from the keyboard would be:
char LED;
LED = getchar();
b) write_char – This function enables the programmer to display characters on the terminal screen that is running DINK. The write_char function can be accessed by using the following code:
#define writechar dink_write_char
unsigned long (*dink_write_char)() = (unsigned long (*)()) 0x5eb4;
This will enable you to output characters to the screen. The syntax for displaying a single character on the screen would be:
char LED = 'N';
LED = writechar(LED);
c) dink_printf – This function lets the programmer display a string of characters on the DINK interface and include one runtime variable, either char or integer, in the string. It uses the same syntax as printf in C.
#define printf dink_printf
unsigned long (*dink_printf)() = (unsigned long (*)()) 0x6270;
This will enable you to print any message on the DINK interface and include a variable from your code. Unlike the standard C printf, which accepts any number of variables, the DINK printf function accepts only one variable per statement.
printf("Fibonacci number for index %d is", index);
There are other important functions that can be used to control many aspects of the Excimer board. These are briefly described in the table below and explained in detail in the included DINK manuals.
Function         Address   Description
TBaseInit        0x39e0    Initializes the time base register.
TBaseReadLower   0x3a04    Reads the lower half of the time base register.
TBaseReadUpper   0x3a20    Reads the upper half of the time base register.
CacheInhibit     0x3a3c    Turns off the caches.
InvEnL1Dcache    0x3a5c    Invalidates and enables the L1 data cache.
DisL1Dcache      0x3aa4    Disables the L1 data cache.
InvEnL1Icache    0x3ac8    Invalidates and enables the L1 instruction cache.
DisL1Icache      0x3b00    Disables the L1 instruction cache.
BurstMode        0x3bfc    Sets up burst mode.
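Because the time base is read as two halves, the lower half can wrap between the two reads. A common remedy is to read the upper half again and repeat if it changed. The sketch below illustrates this under stated assumptions: the names tb_reader, tb_combine, and read_time_base are hypothetical, each reader is assumed to return one 32-bit half, and the compiler is assumed to support unsigned long long.

```c
/* Hypothetical signature for TBaseReadUpper / TBaseReadLower: each is
   assumed to return one 32-bit half of the time base. */
typedef unsigned long (*tb_reader)(void);

/* Join an upper and a lower 32-bit half into one 64-bit count. */
unsigned long long tb_combine(unsigned long hi, unsigned long lo)
{
    return ((unsigned long long)(hi & 0xFFFFFFFFUL) << 32)
           | (unsigned long long)(lo & 0xFFFFFFFFUL);
}

/* Read upper, lower, then upper again. If the upper half changed, the
   lower half wrapped in between, so the whole read is repeated. */
unsigned long long read_time_base(tb_reader read_upper, tb_reader read_lower)
{
    unsigned long hi, lo, hi_again;

    do {
        hi = read_upper();
        lo = read_lower();
        hi_again = read_upper();
    } while ((hi & 0xFFFFFFFFUL) != (hi_again & 0xFFFFFFFFUL));

    return tb_combine(hi, lo);
}
```

On the board, the two readers would be pointers cast from the addresses in the table above, assuming those addresses match the DINK version in use.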
References:
[1] Motorola, Designing a Minimal PowerPC System, PowerPC Application Note: AN1769/D,
1998.
Suggested Code:
/* This section of code can be used to define any of the DINK functions in a C language program.
The user only needs to modify the address and function name. */
#define function_name dink_function_name
unsigned long (*dink_function_name)() = (unsigned long (*)()) hex_addr;
Troubleshooting:
If the student is not able to access the DINK functions:
• verify the casting is correct.
• verify that he/she is using the correct hex address.
Experiment #4: Excimer Memory Map
Problem Statement:
• This experiment requires the compilation, downloading, and execution of a C language program which blinks the Excimer Board’s STATUS and ERROR Light Emitting Diodes (LEDs). (Contributed by Noel Serrano, José I. Quiñones, Luis Narváez, Walter Guiot, and Gunther Costas.)
Objectives:
Upon completion of this laboratory experience, students will be able to:
• write and compile a C program
• download and execute PowerPC Assembly object code
• locate the LEDs within the Excimer’s memory map
• apply the methodology needed to turn on and off the LEDs
Background Information:
The PowerPC family of microprocessors is based on a memory mapped input/output scheme.
Under this scheme, an input port can be thought of as a read-only memory location, while an output port can be treated like a write-only memory location. The microprocessor’s address bus is
used to select the peripheral device (port location), the data bus is used to transmit or receive
data to/from the device, and the Transfer Type signals are used to convey the directionality of
the information transfer.
The memory map for the Excimer Board is shown in Figure 1. The memory map indicates that out of a total of 2^32 = 4 GB addressable locations, the Excimer Board allocates 2^30 = 1 GB each to Static RAM, Fast I/O devices, Slow I/O devices, and Flash ROM [1]. Of course, the Excimer board only uses a fraction of the memory locations allocated for each type of memory and device. The Excimer Board is configured with 512 KB of SRAM, 4 MB of Flash ROM, and some LED indicators. For example, there is a STATUS LED located at 0x40200000, while an ERROR LED is located at 0x40600000.
Region        Address range
STATIC RAM    0x0000_0000 → 0x3FFF_FFFF
FAST I/O      0x4000_0000 → 0x7FFF_FFFF
              ⇒ STATUS LED: 0x4020_0000
              ⇒ ERROR LED: 0x4060_0000
SLOW I/O      0x8000_0000 → 0xBFFF_FFFF
FLASH ROM     0xC000_0000 → 0xFFFF_FFFF
Figure 1: Excimer's Memory Map.
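Because each region in the memory map spans exactly 1 GB, the two most significant bits of a 32-bit address identify the region. The sketch below illustrates this. The enum and the function name region_of are hypothetical names chosen for this example and are not part of DINK or the Excimer documentation.

```c
/* Region codes follow the order of the memory map: the two most
   significant bits of a 32-bit address select one of four 1 GB regions. */
enum excimer_region {
    REGION_SRAM    = 0,   /* 0x0000_0000 to 0x3FFF_FFFF */
    REGION_FAST_IO = 1,   /* 0x4000_0000 to 0x7FFF_FFFF */
    REGION_SLOW_IO = 2,   /* 0x8000_0000 to 0xBFFF_FFFF */
    REGION_FLASH   = 3    /* 0xC000_0000 to 0xFFFF_FFFF */
};

/* Hypothetical helper: classify a 32-bit address by its top two bits. */
enum excimer_region region_of(unsigned long addr)
{
    return (enum excimer_region)((addr >> 30) & 0x3UL);
}
```

For example, region_of(0x40200000UL) yields REGION_FAST_IO, which agrees with the STATUS LED entry in the map.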
In this experiment you are required to write a C program that will blink (repeatedly turn on and off) the STATUS and ERROR LEDs alternately. The LEDs are turned on/off by clearing/setting bit 3 (the fourth least significant bit) of these locations. The reason for this negative logic is that the LEDs are connected in a common anode configuration, as shown in Figure 2 for the case of a seven-segment LED display.
Figure 2: Common Anode LED configuration. LEDs will turn ON when the cathode is at ground level (Excimer output asserted low).
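The negative logic can be captured in a small helper that writes the whole byte, since an output port is treated as write-only. This is a minimal sketch: the macro names and the function led_write are hypothetical, and the volatile qualifier is an addition not found in the manual's code. It tells the compiler that every write to the port must actually be performed.

```c
/* Hypothetical names for the values used in this experiment. */
#define STATUS_LED_ADDR 0x40200000UL
#define ERROR_LED_ADDR  0x40600000UL
#define LED_ON_VALUE    0x00   /* bit 3 cleared: cathode low, LED lit */
#define LED_OFF_VALUE   0x08   /* bit 3 set: cathode high, LED dark   */

/* Write the on or off value to an LED port. The port is written as a
   whole byte and never read back. */
void led_write(volatile unsigned char *port, int on)
{
    *port = (unsigned char)(on ? LED_ON_VALUE : LED_OFF_VALUE);
}
```

On the board, a call such as led_write((volatile unsigned char *)STATUS_LED_ADDR, 1); would light the STATUS LED. On a host machine the function can be exercised by passing the address of an ordinary variable.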
To successfully blink an LED, you must carefully select the delay timing. Remember that the microprocessor may turn the LED on and off so quickly that you will not see the blinking effect. Since your program will be written in C, a simple “for” loop may do the job.
for (counter = 0; counter <= parameter; counter++);
Note: counter must be declared as unsigned long in the program. The value of parameter defines the delay time.
There are other ways to accomplish a time delay, for example using the PowerPC’s internal timer register. These techniques will be demonstrated in later experiments.
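An optimizing compiler may remove an empty loop whose result is never used, which would eliminate the delay. One common precaution, shown here as a sketch rather than as the manual's method, is to declare the counter volatile. The name delay_loop is hypothetical, and the function returns the number of iterations only so that its behavior can be checked.

```c
/* Busy-wait delay. The volatile counter forces the compiler to perform
   every increment and comparison, so the loop is not optimized away.
   Returns the number of iterations executed (parameter + 1). */
unsigned long delay_loop(unsigned long parameter)
{
    volatile unsigned long counter;
    unsigned long iterations = 0;

    for (counter = 0; counter <= parameter; counter++)
        iterations++;

    return iterations;
}
```

The real delay still depends on the processor clock and on whether the caches are enabled, so a suitable value of parameter has to be found by experiment.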
Procedure:
1. Write a simple C program that alternately blinks the Status and Error LEDs ten times.
2. Compile the C code with the hcppc command included with the Metaware C compiler by entering the following at the DOS command prompt:
hcppc -Hppc603 -c file.c
Note: “file.c” stands for the C code file. You may name your C code file as you wish, but remember to use the chosen name in the hcppc command. The result of this command is “file.o”, the object file.
3. Link the object file using the ldppc command, which invokes the linker included with the Metaware C compiler, by entering the following at the DOS command prompt:
ldppc -B start_addr=70000 -xm file.o
Note: “file.o” is the object file generated in the previous step; it has the same base name as your C code file. The result of this step is the file “a.hex”.
4. Run the DINK32 application on your Windows 95 or NT terminal. Download the “a.hex” file produced in the previous step. To do so, type dl -k at the DINK prompt. The terminal will display “Set to Keyboard Port”. Select Transfer->Send Text File from the communication terminal’s menu, then find your “a.hex” file and select it. The file will be downloaded to the Excimer board.
5. Execute the program by typing “go 70000” on the terminal.
6. Observe the behavior of the on-board LEDs. What happens if you decrease/increase the value of parameter in your for loop statement?
References:
[1] Motorola, Designing a Minimal PowerPC System, PowerPC Application Note: AN1769/D,
1998.
Suggested Code:
/* This program blinks the Status and Error LEDs alternately
ten times. After that, both LEDs are turned off. A count of 0xfffff
causes a visible delay on a 300 MHz PowerPC. */
int main()
{
    unsigned long count;
    int loop;
    for (loop = 0; loop < 10; loop++)   /* ten blinks */
    {
        *(volatile char *) (0x40200000) = 0x00; /* turn on status */
        *(volatile char *) (0x40600000) = 0x08; /* turn off error */
        for (count = 0; count <= 0xfffff; count++);
        *(volatile char *) (0x40200000) = 0x08; /* turn off status */
        *(volatile char *) (0x40600000) = 0x00; /* turn on error */
        for (count = 0; count <= 0x1fffff; count++);
    }
    *(volatile char *) (0x40600000) = 0x08;     /* leave error off */
    return 0;
}
Conclusions:
Students should be able to note that:
• The PowerPC microprocessor runs so fast that, without a delay, the blinking effect might not be perceived.
• For different loop parameters, the LED will remain ON or OFF for a different time period.
• The LEDs are configured as common anode (positive terminals connected together).
Troubleshooting:
If the student is not able to turn the LED ON or OFF, check that:
• the address being written to is either 0x40600000 or 0x40200000.
• a suitable value for the time delay loop has been defined.
• the student has compiled, linked, and downloaded the program correctly.
Experiment #5: LED Control from Keyboard
Problem Statement:
• This experiment requires the compilation, downloading, and execution of a C language program which blinks the Excimer Board’s ERROR Light Emitting Diode (LED) the number of times specified by the user input. (Contributed by Noel Serrano and José I. Quiñones.)
Objectives:
Upon completion of this laboratory experience, students will be able to:
• use the DINK functions presented in Experiment #3
• print to the DINK 32 interface
• capture single characters from the keyboard and echo them to the DINK 32 interface
Procedure:
1. Write a C program that will blink the on board LED’s based on user input. The program
should ask the user which LED he wants to blink and how many times.
Hint: To create this program use the program you created in the previous experiment and the
Useful DINK Functions.
References:
[1] Motorola, Designing a Minimal PowerPC System, PowerPC Application Note: AN1769/D,
1998.
Suggested Code:
#include <stdio.h>
#define getchar dink_get_char
#define putchar dink_write_char
#define printf dink_printf
void blink_leds(int addr, int i);
unsigned long (*dink_get_char)() = (unsigned long (*)()) 0x1e4c4;
unsigned long (*dink_write_char)(char) = (unsigned long (*)(char)) 0x5eb4;
unsigned long (*dink_printf)() = (unsigned long (*)()) 0x6270;

int main()
{
    int decimal_no;
    char LED;
    char number;
    do
    {
        printf ("\nSelect the LED you want to blink:\n");
        printf ("\tS - Press S for the Status LED\n");
        printf ("\tE - Press E for the Error LED\n");
        printf ("\tQ - Press Q to Quit\n");
        LED = getchar();                 /* read typed character */
        if (LED == 'E' || LED == 'e')
        {
            printf ("\nEnter the number of times (1-9) to blink the Error LED: ");
            do {                         /* wait until a digit is typed */
                number = getchar();
            } while ( !((number >= '0') && (number <= '9')) );
            putchar(number);             /* echo typed character */
            decimal_no = number - '0';   /* ASCII digit to integer */
            blink_leds(0x40600000, decimal_no);
        }
        else if (LED == 'S' || LED == 's')
        {
            printf ("\nEnter the number of times (1-9) to blink the Status LED: ");
            do {
                number = getchar();
            } while ( !((number >= '0') && (number <= '9')) );
            putchar(number);
            decimal_no = number - '0';
            blink_leds(0x40200000, decimal_no);
        }
    } while ( LED != 'Q' && LED != 'q' );  /* Q or q quits */
    return 0;
}

/* Blink the LED at address addr i times. */
void blink_leds(int addr, int i)
{
    unsigned long count;
    int loop;
    for (loop = 0; loop < i; loop++)
    {
        *(char *) (addr) = 0x00;         /* turn on the selected LED */
        for (count = 0; count <= 0xfff00; count++);
        *(char *) (addr) = 0x08;         /* turn off the selected LED */
        for (count = 0; count <= 0xfff00; count++);
    }
    *(char *) (0x40600000) = 0x08;       /* make sure the Error LED ends up off */
}
Conclusions:
Students should be able to note that:
• a PowerPC Excimer Board program can obtain data from a user via the keyboard.
• the getchar function is not useful when you need more than one character as input, so an implementation of a scanf function would be useful.
Troubleshooting:
If the student is not able to:
• Access DINK 32 interface functions: use the st command to verify that the addresses for the DINK functions match the ones provided in this manual.
• Blink the LEDs: verify the memory mapping for each of the LEDs.
Experiment #6: A simple scanf function for Excimer
Problem Statement:
• In this experiment the student will develop a C function for taking character input from the terminal emulator’s keyboard attached to Excimer through the serial port and converting number characters to decimal values used in other programs. (Contributed by Chuck Corley, Motorola)
Objectives:
Upon completion of this laboratory experience, students will be able to:
• substitute the getchar and putchar equivalent functions available in DINK for the same functions normally found in <stdio.h>
• recognize the ASCII character values returned from getchar() and echo them back via putchar()
• convert digit characters input through the keyboard into decimal integer values for use in other programs
• utilize DINK’s print output function to display the resulting decimal value
Background Information:
Texts on programming describe how to get input for a program. For example, The Waite Group’s New
C Primer Plus [1] says:
The C library contains several input functions, and scanf() is the most general of them,
for it can read a variety of formats. Of course, input for the keyboard is text because the
keys generate text characters: letters, digits, and punctuation. When you desire to enter,
say, the integer 2002, you type the characters 2 0 0 and 2. If you want to store that as a
numerical value rather than as a string, your program has to convert the string character-by-character to a numerical value. And that is what scanf() does! It converts a string input into various forms: integers, floating-point numbers, characters, and C strings.
It is the inverse of printf(), which converts integers, floating-point numbers, characters,
and C strings to text that is to be displayed on the screen. Like printf(), scanf() uses a
control string followed by a list of arguments. The control string indicates into which
formats the input is to be converted.
The DINK software on Excimer provides input and output functions that save the programmer from
having to interact directly with the DUART that receives input and sends output to the terminal. However, these functions are not at the level of a complex function like scanf(). Nevertheless, many of the
C programs that we desire to run on Excimer call the scanf() function because of its widespread use.
In this experiment, you will write your own function my_scanf() and substitute it (by a #define directive) for any scanf() function that the compiler may encounter in programs intended for Excimer.
Likewise you will define dink_printf() to substitute for printf() and link dink_printf() into your programs. Then you will have input and output functions for use in other programs.
To keep my_scanf() simple we will assume that the only control string for converting inputs is the %d
or decimal format. Your my_scanf() function should accept a control string as an argument but then
ignore it and return a decimal value to the second (and last) argument in the function call. Later experiments may require more sophisticated substitute functions for scanf(), but this simple decimal input routine will be widely applicable.
Excimer’s dink_printf() does accept a control string but it ignores floating-point and character formats.
It will only print decimal numbers (%d), hexadecimal numbers (%x), and strings (%s), and then only one
such format per printf statement.
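The character-by-character conversion described in the quotation can be tried on a host computer before any DINK routine is involved. The following is a minimal sketch, not the manual's solution: digits_to_int() is a hypothetical helper name, and it reads from a string instead of the keyboard so that it can be checked without the board. It assumes backspace (0x08) and delete (0x7f) should drop the last digit entered.

```c
#include <stddef.h>

/* Hypothetical helper (not part of DINK): convert a sequence of typed
 * characters to a non-negative integer. Non-digit characters are ignored;
 * backspace (0x08) or delete (0x7f) removes the last digit accumulated.
 * Conversion stops at a carriage return or at the end of the string. */
int digits_to_int(const char *keys)
{
    int value = 0;
    size_t i;

    for (i = 0; keys[i] != '\0' && keys[i] != '\r'; i++) {
        char ch = keys[i];
        if (ch == 0x08 || ch == 0x7f) {
            value = value / 10;               /* integer division drops the last digit */
        } else if (ch >= '0' && ch <= '9') {
            value = value * 10 + (ch - '0');  /* shift one decimal place, add the digit */
        }
        /* any other character is ignored */
    }
    return value;
}
```

Typing 2, 0, 0, 2 and a carriage return yields the integer 2002; typing 1, 2, backspace, 3 yields 13.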
Procedure:
1. Write a C language program which asks the user to input a number through the keyboard and then
outputs the number input as a positive decimal number.
2. In a separate file write a C function my_scanf(char *, int *) which reads characters from the keyboard,
echoes those that are digits, and at the carriage return assigns a decimal value to the second argument
of my_scanf().
Hint: While ignoring non-digit characters may be an acceptable simplification, you may want to check
for backspace or delete characters and take the appropriate action if the user attempts to correct his
numerical input.
3. Write a header file which equates the function name scanf to my_scanf and printf to dink_printf. In
the header file equate dink_printf to the address where it is stored in RAM as revealed by DINK’s
symtab (symbol table) command. Include this header file in your test program.
Example:
/*
 * File - support.h
 * Equates functions used in Excimer Exercise to equivalent
 * functions defined in DINK or in my_scanf.c
 * NOTE: If DINK function addresses change because DINK changes,
 * addresses here must be changed accordingly.
 * Modification history:
 * 19Jan99,CJC Original
 */
#define printf dink_printf
#define scanf my_scanf
extern void my_scanf(const char *, ...);
unsigned long (*dink_printf)() = (unsigned long (*)()) 0x6368;
4. Your my_scanf() function will use getchar() and putchar(). Write a header file that equates these to
DINK’s get_char and write_char. The same header file should equate dink_get_char() and
dink_write_char() to the addresses where they are stored in RAM as revealed by DINK’s symtab
(symbol table) command. Include this header file in the my_scanf.c program.
Example:
/*
 * File - excimer.h
 * Provides the addresses of functions defined in DINK on the Excimer
 * board and used by programs. The addresses of the functions are
 * taken from the xref.txt file generated by the linker.
 * When a new version of DINK is downloaded to the target, make sure
 * the functions' addresses are changed accordingly to match with the
 * new addresses being generated.
 * Modification history:
 * 21Oct98,My Created for ExcDemo
 * 19Jan99,CJC Modified to run with my_scanf code.
 */
#define getchar dink_get_char
#define putchar dink_write_char
/* Addresses of DINK functions. */
unsigned long (*dink_get_char)() = (unsigned long (*)()) 0x1e5e4;
unsigned long (*dink_write_char)(char) = (unsigned long (*)(char)) 0x5fac;
5. Link your input/output test program and my_scanf program.
6. Download the resulting S-record file to Excimer, execute it, and confirm that it echoes only digit characters and returns the correct decimal value to your program at the carriage return.
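The header files in steps 3 and 4 call DINK through function pointers that hold fixed RAM addresses. The same calling pattern can be exercised on a host computer by pointing the pointers at ordinary functions instead. The sketch below is only an illustration under that assumption: host_get_char(), host_write_char(), and echo_char() are hypothetical names, and the stand-ins simply wrap the host's own getchar() and putchar().

```c
#include <stdio.h>

/* Host-side stand-ins for the DINK routines (assumptions for illustration). */
static unsigned long host_get_char(void)
{
    return (unsigned long) getchar();
}

static unsigned long host_write_char(char c)
{
    return (unsigned long) putchar(c);
}

/* On Excimer these pointers would be initialized with the addresses
 * reported by DINK's symtab command; here they point at the stand-ins. */
unsigned long (*dink_get_char)(void) = host_get_char;
unsigned long (*dink_write_char)(char) = host_write_char;

/* Echo one character through the pointer, as my_scanf() does for digits. */
unsigned long echo_char(char c)
{
    return dink_write_char(c);
}
```

Because the code under test only ever calls through dink_get_char and dink_write_char, switching between the host and the board is a matter of changing the two initializers.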
References:
[1] The Waite Group’s New C Primer Plus (1990: Howard W. Sams & Co, Carmel, IN)
Suggested Code:
/*
 * file "testscanf.c"
 * A test harness for Excimer Experiment to prove out
 * my_scanf() function.
 * Modification History:
 * 990121 CJC Original
 */
#include "support.h"
void main(void)
{
    int decimal_no;
    printf ("Enter a decimal number: ");
    scanf("%d", &decimal_no);
    printf ("\nDecimal number is: %d \n", decimal_no);
}
/* file "my_scanf.c"
 * Defines an alternative to the scanf function provided by
 * stdio.h for use when running the Dhrystone benchmarks on DINK.
 * Created: 990119 CJC
 * Modified:
 */
#include "excimer.h"
void my_scanf(char *fmt, int *v)
{
    char ch;
    int no_runs = 0;
    while ((ch = getchar()) != 0xd)                /* Carriage return? */
    {
        if ( (ch == 0x7f) || (ch == 0x8))          /* Delete or backspace? */
        {
            putchar(0x8);                          /* Backspace */
            putchar(0x20);                         /* Overprint a space. */
            putchar(0x8);                          /* Backspace */
            /* Integer division by 10 discards the last digit added. */
            no_runs = no_runs / 10;
        } else
        if ( (ch >= '0') && (ch <= '9'))           /* A digit? */
        {
            putchar(ch);                           /* Echo it and */
            /* Accumulate the value. */
            no_runs = (no_runs * 10) + (ch - 48);
            /* ASCII character - 48 equals the digit. */
        }
    }
    *v = no_runs;                                  /* Assign second Arg the value. */
}
# file "makefile"
SUPPORT =
SUPPORTOPT =
OPTLEV = -O1
CPU = 603
TARGFLAGS = -Hppc$(CPU)
CC = c:\sw\metaware\hcppc\bin\hcppc -Ic:\sw\metaware\hcppc\inc \
	-Hnocopyr -c -nofsoft $(OPTLEV) $(TARGFLAGS)
AS = c:\sw\metaware\hcppc\bin\asppc -c -big_si
LKOPT = -Bbase=0x70000 -xm -e main -Bnoheader -Bhardalign \
	-xo=$(@) -q -Qn -Cglobals -Csections -Csymbols -Ccrossref \
	> $(@D)\xref.txt
LINK = c:\sw\metaware\hcppc\bin\ldppc $(LKOPT)
testscanf.src: testscanf.o my_scanf.o
	$(LINK) testscanf.o my_scanf.o \
	c:\sw\metaware\hcppc\lib\be\fp\libmw.a
testscanf.o: testscanf.c
	$(CC) testscanf.c -o testscanf.o $(SUPPORTOPT)
my_scanf.o: my_scanf.c
	$(CC) my_scanf.c -o my_scanf.o $(SUPPORTOPT)
Conclusions:
Students should be able to note that:
•
Characters are received from the keyboard as bytes of ASCII encoded information.
•
Input/Output functions normally available in standard C libraries for a given computer may not be
available or may exist in different, simpler forms on a small, embedded evaluation system like Excimer.
•
Programmers can write their own input/output routines or link in routines that are provided in the
embedded system.
•
Hard-coding addresses of embedded routines is a dangerous way of linking code if the routines are
relocated by DINK revisions.
Troubleshooting:
If the student is not able to:
•
Get started. Suggest that the student develop and debug the C program on the host computer by
including <stdio.h> before substituting the DINK routines and downloading to Excimer. This
should clarify the ASCII encoding of digits and conversion to a decimal number.
•
Recognize the carriage return character. Excimer will be in a continuous loop of accepting and echoing
input. Additional printf() statements which output each character as it is read in will reveal the
value provided by the DUART for the carriage return character.
Experiment 7
Introduction to Assembly Language Programming
Problem Statement:
•
In this experiment the student is introduced to the PowerPC instruction set architecture through the
development of an assembly language routine. (Contributed by Eisen Montalvo-Ruiz)
Objectives:
Upon completion of this laboratory experience, students will be able to:
•
write and compile an assembly language subroutine
•
use Metaware Assembler directives
•
understand the instruction set and the register set of the PowerPC
Background Information:
•
PowerPC Register Set
The PowerPC architecture has two levels of privileges, the user mode, and the supervisor mode. In
the supervisor mode all registers are available to the programmer, while in the user mode only a
subset of the registers are available. We are going to focus on the user mode for this laboratory.
In the user mode the available PowerPC registers include 32 General Purpose Registers (GPRs),
32 Floating-Point Registers (FPRs), a Condition Register (CR), a Floating-Point Status and
Control Register (FPSCR), the XER register, the Link Register (LR) and the Count Register
(CTR). In addition, there are two read-only registers, associated with the Time Base Facility (TBU
and TBL).
The GPRs are used to manipulate integer data. They come in two sizes, according to the
implementation of the processor: 32 bits for the 32-bit PowerPC and 64 bits for the 64-bit
PowerPC. They are used as source and destination registers in the integer instructions.
The FPRs are used with floating-point instructions. They are 64 bits wide independently of the
implementation, and can manipulate single- and double-precision floating-point data. Related to these
registers is the FPSCR. It contains the floating-point exception signal bits,
exception summary bits, exception enable bits, and rounding control bits.
The CR is a 32-bit register, divided into eight 4-bit fields. This register contains the results of
certain arithmetic operations and provides a way for testing and branching. The XER register
indicates overflow and carry conditions for integer operations. The LR register and the CTR
register are like the GPRs in that their size depends on the implementation. The LR supplies the branch
target address for the Branch Conditional to Link Register instructions. The CTR holds a loop
count that can be decremented during execution of appropriately coded branch instructions.
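The division of the CR into eight 4-bit fields can be sketched in C as follows. This is an illustration only: cr_field() and cr_field_is_eq() are hypothetical helper names operating on a copy of the CR held in an ordinary 32-bit variable. It relies on the PowerPC convention of numbering bits from the most significant end, so field CR0 is the top four bits, and on the bit order LT, GT, EQ, SO that an integer compare leaves in a field.

```c
#include <stdint.h>

/* Return the 4-bit CR field n (0..7) from a 32-bit image of the CR.
 * Bits are numbered from the most significant end, so CR0 is the
 * top nibble and CR7 is the bottom nibble. */
unsigned cr_field(uint32_t cr, unsigned n)
{
    return (unsigned) ((cr >> (28u - 4u * n)) & 0xFu);
}

/* After an integer compare, a field holds, from most to least
 * significant bit: LT (0x8), GT (0x4), EQ (0x2), SO (0x1). */
int cr_field_is_eq(uint32_t cr, unsigned n)
{
    return (cr_field(cr, n) & 0x2u) != 0;
}
```

For example, a CR image of 0x20000000 has only the EQ bit of CR0 set, which is the state a conditional branch such as beq would test after a compare of two equal values.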
The Time Base Facility consists of a 64-bit register, divided in two 32-bit registers, Time Base
Upper (TBU) and Time Base Lower (TBL). These registers will be used in a future laboratory,
where you will learn more about them.
•
PowerPC Instruction Set
The PowerPC Instruction Set is very powerful and extensive. It contains around 200 instructions,
excluding suffixes. We don’t have the space to cover all of them. For now, we are going to work
with the Integer Arithmetic, Load and Store, and Flow Control instructions. A general description
of the format of the instructions will be given. More information can be obtained from the PowerPC
programming references.
Integer Instruction Set
(a) Integer Arithmetic Instructions
You can add, subtract, multiply, and divide integer numbers. You can use immediate values and
registers. Also, register to register instructions are available. A general description of the format
of the instructions follows.
1. Immediate Values
opcode rD, rA, SIMM
- where rD is the destination register, rA is the source register and
SIMM is a Signed Immediate value.
2. Register to Register
opcode rD, rA, rB
- where rD is the destination register and rA and rB are the source
registers.
(b) Integer Compare Instructions
These instructions can be used in conjunction with the branch instructions to control the flow
of a program. They affect the CR, such that the branch instructions can choose their target
address based on what happened in the previous instruction. Of course, they could be used
only for comparing.
1. Immediate Values
opcode rA, SIMM
- where rA is the register you want to compare to a Signed
Immediate value
2. Register to Register
opcode rA, rB
- where rA is the register you want to compare to register rB
Load and Store Instruction Set
Load and Store instructions allow data movement between memory and register locations. They
have three addressing modes. In any one of them, if you use r0 as rA, the address calculation will use
zero instead of the value in the register.
(a) Register Indirect with Immediate Index Addressing
opcode rD, SIMM(rA)
- if loading then rD is the destination register. It will contain
the value that is stored in the memory address that is the sum of
SIMM and the value in the register rA. If storing then the memory
address that is the sum of SIMM with the value in register rA will
contain the value stored in register rD.
(b) Register Indirect with Index Addressing
opcode rD, rA, rB
- if loading then rD is the destination register. It will contain
the value that is stored in the memory address that is the sum of
the value in register rA and the value in the register rB. If storing
then the memory address that is the sum of the value in register
rA with the value in register rB, will contain the value stored in
register rD.
(c) Register Indirect Addressing
opcode rD, rA
- if loading then rD is the destination register. It will contain the
value that is stored in the memory address that is the value in the
register rA. If storing then the memory address that is the value
in register rA will contain the value stored in register rD.
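The address calculation for the first two modes, including the special treatment of r0 in the rA position, can be written out as a small C model. This is a sketch for study only: gpr[] is a hypothetical array standing in for the 32 GPRs of a 32-bit implementation, and the function names are not from any Motorola or Metaware tool.

```c
#include <stdint.h>

/* Base value for the address: the literal 0 when the rA field names r0,
 * otherwise the contents of that register. */
static uint32_t base_or_zero(const uint32_t gpr[32], unsigned ra)
{
    return (ra == 0) ? 0u : gpr[ra];
}

/* (a) Register indirect with immediate index: EA = (rA or 0) + SIMM.
 * SIMM is a signed 16-bit value and is sign-extended before the add. */
uint32_t ea_immediate(const uint32_t gpr[32], unsigned ra, int16_t simm)
{
    return base_or_zero(gpr, ra) + (uint32_t) (int32_t) simm;
}

/* (b) Register indirect with index: EA = (rA or 0) + (rB). */
uint32_t ea_indexed(const uint32_t gpr[32], unsigned ra, unsigned rb)
{
    return base_or_zero(gpr, ra) + gpr[rb];
}
```

With r9 holding 0x600d0, the form 108(r9) addresses 0x6013c, while 16(r0) addresses 16 regardless of what r0 contains.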
Branch Instruction Set
These instructions are commonly used with compare instructions. You place the branch after the compare, using the result of it to make the decision.
opcode label
- where label is the address of the code where you want to
branch to. The assembler takes care of translating the label to
the address.
•
Metaware Assembler Directives
The assembler directives are instructions to the assembler on how to configure data and where to
put the code and data in memory. The most useful are:
(a) .text – identifies where the code section starts.
(b) .data – marks the start of the data section
(c) .word <value> – reserves space for a word in memory
(d) .org <address> – starting address of the following code and/or data
(e) .global <label> – makes this routine a public one.
You can put comments in any line, but they must begin with a “!”. In addition, you can use labels
for branching. They must end with a colon and must be at the beginning of the line, with or
without code in the same line.
•
Metaware Assembly Compilation
For compiling your code using Metaware, you must go through two steps. First, assemble the code
using asppc, the Metaware Assembler.
asppc -o filename.o filename.s
The extension of your file must be *.s. In this way the Assembler recognizes the file. The -o option tells the assembler
the name of the object file. If you don’t use it, the default name is the same as the code file with *.o
as the extension.
The second step is to convert the object file to Motorola S3 records and to set again the addresses of the
code and data sections.
elf2hex -p .text:0x70000,.data:0x70100 -o filename.hex -xm filename.o
The -p option is used to tell where each section of the file starts in memory. In this case, the .text
section will start at address 0x70000 and the .data section at 0x70100. The -o option gives the name of
the output file. The -xm option tells the program to generate Motorola S3 records, and filename.o is the
name of the object file.
Procedure:
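1. Write an assembly language routine that multiplies two 3x3 matrices.
Remember:
\[
\begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1j} \\
a_{21} & a_{22} & \cdots & a_{2j} \\
\vdots & \vdots & \ddots & \vdots \\
a_{i1} & a_{i2} & \cdots & a_{ij}
\end{bmatrix}
\times
\begin{bmatrix}
b_{11} & b_{12} & \cdots & b_{1j} \\
b_{21} & b_{22} & \cdots & b_{2j} \\
\vdots & \vdots & \ddots & \vdots \\
b_{i1} & b_{i2} & \cdots & b_{ij}
\end{bmatrix}
=
\begin{bmatrix}
c_{11} & c_{12} & \cdots & c_{1j} \\
c_{21} & c_{22} & \cdots & c_{2j} \\
\vdots & \vdots & \ddots & \vdots \\
c_{i1} & c_{i2} & \cdots & c_{ij}
\end{bmatrix}
\]
\[
c_{ij} = a_{i1} b_{1j} + a_{i2} b_{2j} + \cdots + a_{in} b_{nj}
\]
where n is the matrix dimension.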
Hint:
Set the start of the matrices in memory, so you know where the code has to look for the data. Also,
make it flexible, so you can change the size of the matrices without making changes in your code.
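Before writing the assembly routine, the algorithm can be proven in C on the host computer. The following is a minimal reference sketch, not the manual's solution: mat_mult_ref() is a hypothetical name, and it assumes the matrices are stored row by row in flat arrays of words, which is also how a data section of consecutive .word directives lays them out. Because n is a parameter, the size can change without changing the code, as the hint suggests.

```c
/* Multiply two n x n matrices stored row by row in flat arrays:
 * c[i][j] = sum over k of a[i][k] * b[k][j].
 * Element (row, col) of a matrix lives at index row * n + col. */
void mat_mult_ref(int n, const int *a, const int *b, int *c)
{
    int i, j, k;

    for (i = 0; i < n; i++) {
        for (j = 0; j < n; j++) {
            int sum = 0;
            for (k = 0; k < n; k++) {
                sum += a[i * n + k] * b[k * n + j];
            }
            c[i * n + j] = sum;
        }
    }
}
```

Multiplying a 3x3 matrix of ones by a 3x3 matrix of twos should give a result in which every element is 6, which makes a convenient first check for the assembly version as well.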
References:
[1] Motorola, PowerPC Microprocessor Family: The Programming Environments for 32-Bit
Microprocessors, MPCFPE32B/AD, Rev 1, 1/97.
Suggested Code:
!file "matmult.s"
! Assembly Language program to multiply two 3x3 matrices
! EMR 990321
!
! Register usage:
! r9  - Pointer to start of data section
! r12 - miscellaneous
! r5  - i
! r10 - j
! r11 - k
! r7  - Pointer to row in matrix
! r8  - Pointer to column in matrix
! r6  - holds temporary result of calculations
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
! CODE Section
!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
        .org 0x60000
        .text
        .global mat_mult
mat_mult:
        addi    r5, r0, 0       !Clear R5 (addi treats r0 as the value 0; addic would not)
        lis     r9, 6           !Load immediate shifted to R9
        addi    r9, r9, 208     !Pointer to data section
        lwz     r12, 108(r9)    !Load n
        cmpw    r5, r12         !If R5>=R12 then ...
        bge     exit            !goto exit, else ...
another_i:
        addi    r10, r0, 0      !Clear R10
        lis     r9, 6           !Load immediate shifted to R9
        addi    r9, r9, 208     !Pointer to data section
        lwz     r12, 108(r9)    !Load n
        cmpwi   r12, 0          !If R12<=0 then ...
        ble     incr_i          !goto incr_i, else ...
another_j:
        addi    r11, r0, 0      !Clear R11
        lis     r9, 6           !Load immediate shifted to R9
        addi    r9, r9, 208     !Pointer to data section
        lwz     r12, 108(r9)    !Load n
        cmpwi   r12, 0          !If R12<=0 then ...
        ble     incr_j          !goto incr_j, else ...
another_k:
        mulli   r7, r5, 12      !Pointer to row of A using i
        slwi    r12, r11, 2     !Pointer to col of A using k
        add     r12, r12, r7    !Pointer to Aik
        lis     r9, 6           !Load immediate shifted to R9
        addi    r9, r9, 208     !Pointer to data section
        lwzx    r6, r12, r9     !Load Aik to R6
        mulli   r12, r11, 12    !Pointer to row of B using k
        slwi    r8, r10, 2      !Pointer to col of B using j
        add     r12, r8, r12    !Pointer to Bkj
        add     r12, r12, r9    !Add start address of data section
        lwz     r12, 36(r12)    !Load Bkj
        mullw   r6, r12, r6     !Aik*Bkj
        add     r12, r8, r7     !R12=i+j
        add     r8, r12, r9     !Pointer Cij
        lwz     r12, 72(r8)     !Load Cij
        add     r12, r12, r6    !Cij+=Aik*Bkj
        stw     r12, 72(r8)     !Store Cij
incr_k:
        addi    r11, r11, 1     !Increment k
        lwz     r12, 108(r9)    !Load n
        cmpw    r12, r11        !If R12>R11 then ...
        bgt     another_k       !goto another_k, else ...
incr_j:
        addi    r10, r10, 1     !Increment j
        lis     r9, 6           !Load immediate shifted to R9
        addi    r9, r9, 208     !Pointer to data section
        lwz     r12, 108(r9)    !Load n
        cmpw    r12, r10        !If R12>R10 then ...
        bgt     another_j       !goto another_j, else ...
incr_i:
        addi    r5, r5, 1       !Increment i
        lis     r9, 6           !Load immediate shifted to R9
        addi    r9, r9, 208     !Pointer to data section
        lwz     r12, 108(r9)    !Load n
        cmpw    r5, r12         !If R5<R12 then ...
        blt     another_i       !goto another_i, else ...
exit:
        blr                     !exit
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!DATA section
!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
.org 0x600d0
.data
matrix_a:
.word 1
.word 1
.word 1
.word 1
.word 1
.word 1
.word 1
.word 1
.word 1
matrix_b:
.word 2
.word 2
.word 2
.word 2
.word 2
.word 2
.word 2
.word 2
.word 2
matrix_c:
.word 0
.word 0
.word 0
.word 0
.word 0
.word 0
.word 0
.word 0
.word 0
n:
.word 3
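The constants 36, 72, and 108 in the code section, and the pair lis r9, 6 / addi r9, r9, 208, all follow from the layout of this data section. The sketch below checks that arithmetic in C. It is an illustration only: struct mat_data and lis_addi() are hypothetical names, and the struct assumes each .word occupies four bytes with no padding between the arrays.

```c
#include <stddef.h>
#include <stdint.h>

/* Mirror of the .data section: three 3x3 matrices of words followed by n. */
struct mat_data {
    int32_t matrix_a[9];   /* byte offset 0   */
    int32_t matrix_b[9];   /* byte offset 36  */
    int32_t matrix_c[9];   /* byte offset 72  */
    int32_t n;             /* byte offset 108 */
};

/* Value produced by "lis rD, hi" followed by "addi rD, rD, lo":
 * hi is placed in the upper 16 bits, then the sign-extended lo is added. */
uint32_t lis_addi(uint16_t hi, int16_t lo)
{
    return ((uint32_t) hi << 16) + (uint32_t) (int32_t) lo;
}
```

Here lis_addi(6, 208) evaluates to 0x600d0, the address given in the .org directive of the data section, and the offset of n is 108, which is why the code loads n with lwz r12, 108(r9).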
Conclusions:
Students should be able to note that:
•
Programming in assembly is a bit complex. However, the increase in performance and the smaller
size of the resulting code make it worthwhile in some cases.
•
This routine alone is not very useful, but in the next laboratory we are going to show a way to
interface an assembly routine to a C/C++ program.
Troubleshooting:
If the student is not able to:
•
Get started. Suggest that the student code the multiplication in a C program until they have proven
their algorithm. If they are still having difficulty, the disassembly of the C program could provide
insight.
Experiment 8
Linking Assembly Language and C Code
Problem Statement:
•
This experiment introduces the student to linking PowerPC assembly language and C code.
(Contributed by Eisen Montalvo-Ruiz)
Objectives:
Upon completion of this laboratory experience, students will be able to:
•
call an assembly routine from a C/C++ program
•
know the PowerPC function-calling sequence
Background Information:
The following information is excerpted directly from Chapters 10 and 11 of the “High C/C++
Programmer’s Guide for PowerPC”. This document can be obtained from Metaware through their
website.
• Making an assembly routine callable from a C program
To be able to call an assembly language routine from a C program you must insert this piece of code
before the assembly routine:
.text
.align 2
.global name
name:
You are going to use “name” to call the routine from a C program.
•
Calling an assembly routine from C
For each assembly function you want to call, you have to declare it external. Then use the pragma
directive Alias for linking the internal name to the external name. The following code should make it
clearer:
extern foobar();
#pragma Alias(foobar,"name");
...
void main()
{
...
foobar();
...
}
• Function-Calling Sequence
One of the most difficult parts of assembly language programming is parameter passing in function
calls. Fortunately, the PowerPC function-calling and parameter-passing convention is among the easiest in the
realm of assembly programming. A brief description of this process follows. If you want more information, read the books in the reference section.
Stack-Frame Layout
Figure 7.1 shows the memory stack frame organization for the PowerPC system. A function does not always need
to establish its own stack frame; the frame is only necessary if the function is going to call another function.
Figure 7.1 Standard Stack Frame
(listed from high address to low address; the stack grows down)
High Address
    Back-Chain Word
    Floating-point register save area
    General register save area
    Condition register save area
    FPSCR save area
    Local variable space (padding allowed here only)
    Parameter list area
    Link register save word                      (stack frame header)
    Back-Chain Word  <- Stack Pointer            (stack frame header)
Low Address
The areas from the floating-point register save area down to the stack pointer make up the
stack frame of the most recently called function.
The stack frame grows downward from high to low memory address, and is 16-byte aligned. It doesn’t
have a maximum size but it has a minimum. The minimum stack frame consists of the stack-frame
header, with padding to a 16-byte alignment. Any padding must occur within the local variable area.
The Stack pointer points to the Back-Chain word of the most recently called function. This forms a
linked list of stack frames.
The stack frame can include the following areas as required by any function:
Floating-point register save area – non-volatile floating-point registers modified
General register save area – non-volatile general registers modified
CR save area – condition register fields modified
FPSCR save area – floating-point status and control register bits modified
Local variable space – local variables of function not mapped to registers
Parameter list area – allocated by the caller of function; must be large enough to contain the arguments that the caller stores in it
LR save word – contents of the link register as they were at the time of entry to a function
Back-chain word – pointer to the previous stack frame’s back-chain word
The parameter list area is not preserved across function calls and it must follow the stack frame header
immediately.
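The 16-byte alignment rule and the minimum frame size can be checked with a little arithmetic. The sketch below is an illustration under stated assumptions: it treats the stack-frame header as two 4-byte words (the back-chain word and the link register save word) on a 32-bit implementation, and frame_size() is a hypothetical helper, not part of any Metaware tool.

```c
#include <stdint.h>

#define FRAME_HEADER_BYTES 8u    /* back-chain word + LR save word (assumed 4 bytes each) */
#define FRAME_ALIGN        16u   /* frames are 16-byte aligned */

/* Round a byte count up to the next multiple of 16. */
uint32_t align16(uint32_t bytes)
{
    return (bytes + (FRAME_ALIGN - 1u)) & ~(FRAME_ALIGN - 1u);
}

/* Total frame size for a function that needs the given number of bytes
 * for register save areas, local variables, and outgoing parameters.
 * The difference between the sum and the result is the padding, which
 * belongs in the local variable area. */
uint32_t frame_size(uint32_t save_bytes, uint32_t local_bytes, uint32_t param_bytes)
{
    return align16(FRAME_HEADER_BYTES + save_bytes + local_bytes + param_bytes);
}
```

Under these assumptions the minimum frame, with nothing but the header, is 16 bytes, and a function with 4 bytes of locals and 8 bytes of parameters needs a 32-byte frame.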
Register usage
Table 7.1 contains the usage and status of the registers in the function calling process. Non-volatile
registers “belong” to the calling function. If the called function wants to use them, it must save their
values before using the registers and restore them before returning.
Volatile registers are not preserved across function calls, so you can use them without saving them.
Also you can’t use the dedicated and reserved registers. You can corrupt the system if you use them.
Table 7.1 PowerPC Register Usage
Register Name   Status         Usage
r0              Volatile       Language-specific purposes
r1              Dedicated      Stack frame pointer, always valid
r2              Dedicated      Reserved for system use
r3-r4           Volatile       Parameter passing and return values
r5-r10          Volatile       Parameter passing
r11-r12         Volatile       Language-specific purposes
r13             Reserved       Small data area pointer
r14-r30         Non-Volatile   Local variables
r31             Non-Volatile   Local variables or “environment pointer”
f0              Volatile       Language-specific purposes
f1              Volatile       Parameter passing and return values
f2-f8           Volatile       Parameter passing
f9-f13          Volatile       Scratch
f14-f31         Non-Volatile   Local variables
CR0             Volatile       Condition Register fields, each four bits wide
CR1             Volatile       (Bit 6: Floating-point invalid operation exception)
CR2             Non-Volatile
CR3             Non-Volatile
CR4             Non-Volatile
CR5             Volatile
CR6             Volatile
CR7             Volatile
LR              Volatile       Link Register
CTR             Volatile       Count Register
XER             Volatile       Fixed-Point Exception Register
FPSCR0-23       Volatile       Floating-Point Status and Control Register
FPSCR24-31      Modifiable     (Exception-enable and rounding-control bits)
Parameter passing
A maximum of eight integer arguments can be passed in general purpose registers r3 through r10 and a
maximum of eight floating-point arguments can be passed in f1 through f8. If the number of parameters
is less than the maximum, the unneeded registers contain undefined values. If the parameters passed do
not fit in those registers, the function must allocate a stack frame. It should allocate the minimum space
needed for the parameters that do not fit in the registers.
If the function returns a value, Table 7.2 shows the register in which it is returned, according to its
type.
Table 7.2 PowerPC Function Return Values
Function Return Type            Return in Register   Comment
float, double                   f1
int, long, enum, short, char,   r3                   Returned as unsigned or signed integer (as
pointer to any type                                  appropriate), zero- or sign-extended to 32 bits
                                                     if necessary
long long, unsigned long long   r3 and r4            Returned with the lower-addressed word in r3 and
                                                     the higher-addressed word in r4
struct (less than or equal      r3 and r4            It is returned as if the following steps had
to 8 bytes),                                         occurred:
union (less than or equal                            1- The struct or union was first stored in an
to 8 bytes)                                             8-byte aligned memory area.
                                                     2- The low-addressed word was loaded into r3
                                                     3- The high-addressed word was loaded into r4
long double,                    Storage Buffer       The address of this buffer is passed as a hidden
struct (greater than 8 bytes)                        argument in r3
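The r3 and r4 row for 64-bit integers can be illustrated in C. The sketch assumes the processor runs in big-endian mode, where the lower-addressed word of a 64-bit value is its most significant half; split_long_long() and join_long_long() are hypothetical helper names, and the r3 and r4 parameters are ordinary variables standing in for the registers.

```c
#include <stdint.h>

/* Split a 64-bit value the way it would be returned on a big-endian
 * 32-bit PowerPC: the lower-addressed (most significant) word in r3,
 * the higher-addressed (least significant) word in r4. */
void split_long_long(uint64_t value, uint32_t *r3, uint32_t *r4)
{
    *r3 = (uint32_t) (value >> 32);
    *r4 = (uint32_t) (value & 0xFFFFFFFFu);
}

/* Rebuild the 64-bit value from the two register images. */
uint64_t join_long_long(uint32_t r3, uint32_t r4)
{
    return ((uint64_t) r3 << 32) | r4;
}
```

An assembly routine that returns a 64-bit count to C would therefore leave the upper half in r3 and the lower half in r4 before executing blr.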
• Metaware Compiling
When you are combining assembly and C, you can’t compile like you did in the last laboratory. This is
an example of compiling C and assembly using hcppc, the Metaware C/C++ compiler.
hcppc -Hppc603 -Hldopt=-e,main -Hldopt=-B,start_addr=70000
-Hldopt=-x -Hldopt=-m c_code.c assembly_code.s
The option -Hppc603 tells the compiler to generate PowerPC 603 code. The -Hldopt options are
passed to the linker. The value after the equal sign is the option and the value after the comma is the
value of the option. For example, -e tells the linker what function is the starting point in the code. In
this example, the starting point is the main function. The -B option has a lot of values. One of the most useful is start_addr. It tells where the code starts in memory. In this example, the code starts at 0x70000.
The -m option generates a map list file of the code. The standard output is the screen. You can use redirection to send the output to a file. Finally, the -x option tells the linker to generate Motorola S3 records,
ready to be downloaded to the Excimer Board. The last parameters are the filenames of the C and assembly code. The output code will be named “a.hex”.
Procedure:
1. Write an assembly language routine that multiplies two N x N matrices and a C language program
that asks the user for the size of the matrices, their initial values and shows the resulting matrix.
The C program should call the assembly routine.
Hint: You can use the assembly routine you made in the last laboratory. If you followed the hint in
that laboratory, you shouldn’t need to make too many changes.
References:
[1] Motorola, PowerPC Microprocessor Family: The Programming Environments for 32-Bit
Microprocessors, MPCFPE32B/AD, Rev 1, 1/97.
Suggested Code:
/*
file MatrixMult.c
C program that calls an assembly routine. It asks the user for
the size and initial values for the matrices and then shows the results
of their multiplication.
EMR 990407
*/
#define scanf my_scanf /* Useful Functions */
#define getchar dink_get_char
#define putchar dink_write_char
#define printf dink_printf
void my_scanf(char *, int *);
/* Pointers to functions in Dink Memory */
unsigned long (*dink_get_char)() = (unsigned long (*)()) 0x1e4c4;
unsigned long (*dink_write_char)(char) = (unsigned long (*)(char)) 0x5eb4;
unsigned long (*dink_printf)() = (unsigned long (*)()) 0x6270;
/* Assembly Routine */
extern matrixmult(int size, int *result, int *mata, int *matb);
/* Alias(internal name, external name) */
#pragma Alias (matrixmult,"matmult")
int *mata; /* First Matrix */
int *matb; /* Second Matrix */
int *matc; /* Resultant Matrix */
int size; /* Matrices Size */
int *malloc(unsigned int); /* Memory Allocation function proto*/
void main()
{
int i, l;
int temp;
/* Ask the user for the size */
printf("Enter size of matrices > ");
scanf("%d",&size);
printf("\n");
/* Separate memory for the matrices */
mata = malloc(size*size);
matb = malloc(size*size);
matc = malloc(size*size);
/* Ask user for initial values */
for(int j=0; j<size; j++)
{
for(int m=0; m<size; m++)
{
printf("A%d",j+1);
printf("%d = ",m+1);
scanf("%d", &temp);
printf("\n");
mata[j*size+m]=temp;
printf("B%d",j+1);
printf("%d = ",m+1);
scanf("%d", &temp);
printf("\n");
matb[j*size+m]=temp;
/* Clear resultant matrix memory */
matc[j*size+m]=0;
}
}
/* Calling assembly routine */
matrixmult(size, matc, mata, matb);
/* Display results */
for(i=0; i<size; i++)
{
printf("| ");
for(l=0; l<size; l++)
printf("%d ",mata[i*size+l]);
printf("| ");
if((i+1)==(size/2))
{
printf("*");
}
else
{
printf(" ");
}
printf(" | ");
for(l=0; l<size; l++)
printf("%d ",matb[i*size+l]);
printf("| ");
if((i+1)==(size/2))
{
printf("=");
}
else
{
printf(" ");
}
printf(" | ");
for(l=0; l<size; l++)
printf("%d ",matc[i*size+l]);
printf("|");
printf("\n");
}
}
/* User Input Function */
/* By Chuck Corley */
void my_scanf(char *fmt, int *v)
{
    char ch;
    int no_runs = 0;
    while ((ch = getchar()) != 0xd)                /* Carriage return? */
    {
        if ( (ch == 0x7f) || (ch == 0x8))          /* Delete or backspace? */
        {
            putchar(0x8);                          /* Backspace */
            putchar(0x20);                         /* Overprint a space. */
            putchar(0x8);                          /* Backspace */
            /* Integer division by 10 discards the last digit added. */
            no_runs = no_runs / 10;
        } else
        if ( (ch >= '0') && (ch <= '9'))           /* A digit? */
        {
            putchar(ch);                           /* Echo it and */
            /* Accumulate the value. */
            no_runs = (no_runs * 10) + (ch - 48);
            /* ASCII character - 48 equals the digit. */
        }
    }
    *v = no_runs;                                  /* Assign second Arg the value. */
}
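/* Memory Allocation Function */
/* By Chuck Corley */
/* size is a count of ints, rounded up to a multiple of 8 ints. */
int *malloc(unsigned int size)
{
    static int buffer[2048];
    static int *next = buffer;
    int *p = next;
    next += ((size + 7) & ~7);
    /* Compare against the element count; sizeof(buffer) alone is a byte count. */
    if (next >= buffer + sizeof(buffer) / sizeof(buffer[0]))
        /* Terminate by executing a zero. */
        asm(".long 0");
    return p;
}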
!file "matmult.s"
! Assembly Language program to multiply two N x N matrices
! EMR 990407
!
! Parameters:
!   r3 = size
!   r4 = pointer to matrix c
!   r5 = pointer to matrix a
!   r6 = pointer to matrix b
!
! Register usage:
!   r14 = i
!   r15 = j
!   r16 = k
!   r17 = temp (accumulates Cij)
!   r18 = offset to current value of cell in matrix a, then Aik
!   r19 = offset to current value of cell in matrix b, then Bkj
!   r20 = offset to current value of cell in matrix c
!
! NOTE: r14-r20 are non-volatile (see Table 7.1). A complete routine
! must save them before use and restore them before returning.
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
! CODE Section
!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
        .org 0x60000
        .text
        .align 2
        .global matmult
matmult:
        xor     r14, r14, r14   !Clear i
        xor     r17, r17, r17   !Clear temp
        cmpw    r14, r3         !If i>=size then ...
        bge     exit            !goto exit, else ...
another_i:
        xor     r15, r15, r15   !Clear j
another_j:
        xor     r16, r16, r16   !Clear k
another_k:
        mullw   r18, r14, r3    !Offset to row of A using i
        add     r18, r18, r16   !Offset to col of A using k
        slwi    r18, r18, 2     !Multiply by 4, we're loading words, not bytes
        lwzx    r18, r5, r18    !Load Aik to R18
        mullw   r19, r16, r3    !Offset to row of B using k
        add     r19, r19, r15   !Offset to col of B using j
        slwi    r19, r19, 2     !Multiply by 4, we're loading words, not bytes
        lwzx    r19, r6, r19    !Load Bkj to R19
        mullw   r18, r18, r19   !Aik*Bkj
        add     r17, r17, r18   !Cij+=Aik*Bkj
incr_k:
        addi    r16, r16, 1     !Increment k
        cmpw    r16, r3         !If k<size then ...
        blt     another_k       !goto another_k, else ...
save_val:
        mullw   r20, r14, r3    !Offset to row of C using i
        add     r20, r20, r15   !Offset to col of C using j
        slwi    r20, r20, 2     !Multiply by 4, we're using words, not bytes
        stwx    r17, r4, r20    !Store Cij
        xor     r17, r17, r17   !Clear temp for another cell
incr_j:
        addi    r15, r15, 1     !Increment j
        cmpw    r15, r3         !If j<size then ...
        blt     another_j       !goto another_j, else ...
incr_i:
        addi    r14, r14, 1     !Increment i
        cmpw    r14, r3         !If i<size then ...
        blt     another_i       !goto another_i, else ...
exit:
        blr                     !exit
Conclusions:
Students should be able to note that:
• Linking assembly and C routines is very important in those cases where the complexity of the
problem at hand requires a high level language but where the performance of certain routines within
the problem is crucial.
Troubleshooting:
If the student is not able to:
• Get the parameters in the assembly function: use the list file to know where the assembly function
and the parameters are in memory. Then look in the assembly code of the C program for the call to
the assembly function. Before the call you will see where the code is putting the parameters in the
registers for the assembly function. This could help you understand the function-calling sequence.
Experiment 9
Converting Integers to Floating Point
Problem Statement:
• This experiment requires the development of an assembly language subroutine to convert the 64-bit
integer value read from the PowerPC time base facility to a 64-bit (double) floating point number
representing seconds. (Contributed by Chuck Corley, Motorola)
Objectives:
Upon completion of this laboratory experience, students will be able to:
• write and assemble an assembly language subroutine
• call the assembly language subroutine from a C program and use the values returned
• convert integer numbers to PowerPC floating point representation
• convert time base count values to seconds of wall clock time
Background Information:
The PowerPC architecture requires each microprocessor implementation to provide a time base facility
(TB), a 64-bit structure that consists of two 32-bit registers – time base upper (TBU) and time base
lower (TBL). User level applications are permitted read-only access to the TB which is useful for
timing program execution or providing a time reference. The update frequency of the time base is system-dependent so the algorithm for converting the current value in the time base to time of day is also
system-dependent. The MPC603e microprocessor used on the Excimer board increments the TB at
one-fourth the SYSCLK (bus) frequency.
Excimer does not have a real time clock chip as would be found on most computers. TBU and TBL are
cleared at each power-up (or they can be set to an initial value in supervisor mode). The TB facility
then counts up at one-fourth of SYSCLK frequency from this initial value. Excimer cannot relate a TB
value to real time without user assistance – like setting a watch.
SYSCLK is crystal controlled to 66.6666MHz (see the oscillator on the board at U15), therefore TBL
increments 16,666,667 times per second. When the TBL count reaches 2^32, a carry-out bit increments TBU.
Thus, TBU will increment every 257.7 seconds and the total range of the TB is 1.1 x 10^12 seconds or
approximately 35,000 years. This number is better represented in application programs as a floating
point value.
The PowerPC architecture represents double precision floating point values in the 64-bit format shown
in Figure 1.
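  0   1        11  12                                     63
+---+------------+------------------------------------------+
| S |    EXP     |                 FRACTION                 |
+---+------------+------------------------------------------+
Figure 1. Floating-Point Double-Precision Format
Where
• S (sign bit)
• EXP (exponent + bias)
• FRACTION (fraction)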
For numeric values, the significand consists of a leading implied bit concatenated on the right with the
FRACTION. For normalized numbers (it is unnecessary to deal with denormalized floating point
numbers in this exercise) the implied bit is a one and is the first bit to the left of the binary point.
Normalized numbers are interpreted as follows:
NORM = (-1)^S x 2^(EXP - 1023) x (1.FRACTION)
The range covered by the magnitude (M) of a normalized double-precision floating-point number is approximately:
2.2 x 10^-308 ≤ M ≤ 1.8 x 10^308
The double precision exponent is biased by adding 1023 so that positive and negative exponents can be
represented without a sign bit for the exponent. Example exponents are shown in Table 1.
Biased Exponent     Unbiased Exponent
(Binary)            (Double-Precision)
111_1111_1111       Reserved for infinities and NaNs
111_1111_1110       +1023
111_1111_1101       +1022
.                   .
100_0000_0000       +1
011_1111_1111       0
011_1111_1110       -1
.                   .
000_0000_0001       -1022
000_0000_0000       Reserved for zeros and denormalized numbers
Table 1. Biased Exponent Format
Examples of TB integer values converted to double precision floating point representation are shown in
Table 2. Since it would take 35,000 years to test a conversion program for the upper limits of the TB,
this experiment should include a test program that supplies these example values to an
integer-to-floating-point assembly-language conversion routine and verifies that the correct floating
point value is returned. The last column of Table 2 shows the floating point value of the TB converted
to seconds given Excimer’s 66MHz bus clock.
TB count    TBU        TBL        S  EXP biased  FRACTION          DP Floating Pt Value  Seconds
(decimal)   (hex)      (hex)         (dec)       (hex)             (hex)                 (dec)
0           0000_0000  0000_0000  0  0           0_0000_0000_0000  0000_0000_0000_0000   0.00
1           0000_0000  0000_0001  0  +1023       0_0000_0000_0000  3FF0_0000_0000_0000   6.00e-8
2           0000_0000  0000_0002  0  +1024       0_0000_0000_0000  4000_0000_0000_0000   1.20e-7
524,288     0000_0000  0008_0000  0  +1042       0_0000_0000_0000  4120_0000_0000_0000   3.15e-2
1.57e6      0000_0000  0018_0000  0  +1043       8_0000_0000_0000  4138_0000_0000_0000   9.44e-2
3.67e6      0000_0000  0038_0001  0  +1044       C_0000_8000_0000  414C_0000_8000_0000   2.20e-1
1.67e7      0000_0000  00FE_502B  0  +1046       F_CA05_6000_0000  416F_CA05_6000_0000   1.00
3.22e9      0000_0000  C000_0401  0  +1054       8_0000_8020_0000  41E8_0000_8020_0000   1.93e2
1.40e10     0000_0003  4000_5001  0  +1056       A_0002_8008_0000  420A_0002_8008_0000   8.38e2
1.58e16     0038_0001  4001_0005  0  +1076       C_0000_A000_8002  434C_0000_A000_8002   9.46e8
1.84e19     FEDC_BA98  7654_3210  0  +1086       F_DB97_530E_CA86  43EF_DB97_530E_CA86   1.10e12
Table 2. Example TB to Floating Point Conversions
Procedure:
1. Write an assembly language routine which accepts two unsigned integer arguments TBU and TBL
and returns a double float value.
Suggestion: Assembly language routines are used primarily for speed (or access to hardware resources
that are otherwise not available). To make this routine faster, try using static branch prediction.
For example, a TB value of zero has to be tested as a special case to form EXP but is unlikely.
Likewise, values over 2^52 are unlikely (why would there be a conditional branch for this value?)
Hint: You may find the assembly language instructions cntlzw and rlwnm very useful.
2. Write a C program which calls the assembly language routine with the example values of Table 2
and checks that it returns the correct floating point value.
Reminder: DINK Version 10.5 provides a dink_printf routine that may be used to print results to
the terminal. However, it will not format floating point numbers; results will have to be displayed
as two unsigned long int values. Is there a C construct which will permit viewing two 32-bit
memory locations as both unsigned long int and double?
3. Write an assembly language routine that reads Excimer’s TB facility and, using Excimer’s bus clock
speed of 66.6666MHz, returns seconds as a double-precision floating point number.
Caution: TB must be read in two separate instructions. It is unlikely, but possible, that TBU could
increment between reading these two registers. Consider the sequence TBU = 0x0000_0000, TBL
= 0xFFFF_FFFF; TBU = 0x0000_0001, TBL = 0x0000_0000. What would be the error if your
assembly language routine got the first value of TBU and the second value of TBL? Would reading
the registers in reverse order avoid this problem?
Suggestions: This assembly language routine may be useful in other programs. Saving it in a
standalone file “timer.s” and then linking it with this or other C programs will make it more useful. A
header file, e.g. “Excimer.h,” might be a convenient place to define constants like
EXCIMER_BUS_SPEED that could change on other PowerPC systems.
4. Write a C program which outputs a zero to twenty second count to the terminal emulator and time
it with a stopwatch. (Using dink_printf to display seconds as integer decimal numbers is acceptable).
References:
[1] Motorola, PowerPC Microprocessor Family: The Programming Environments for 32-Bit Microprocessors, MPCFPE32B/AD, Rev 1, 1/97.
Suggested Code:
/* file "Excimer.h" */
/* Header file for Excimer-unique constants */
/* Excimer oscillator (U15 on PWB) runs at 66.6666MHz */
#define BUS_FREQUENCY 66666667
/* TB ticks/sec at Excimer bus clock. */
double TICS_PER_SEC = BUS_FREQUENCY/4;
/* Bus frequency as an integer in MHz. */
int IBUS_MHz = BUS_FREQUENCY/1000000;
/*
 * Excimer does not support <stdio.h>. DINK has its own print routine
 * called dink_print which supports a limited number of format types (decimal-%d,
 * hex-%x, string-%s, etc). Redefining printf to point to the dink_print
 * function enables standard C programs downloaded to Excimer to print to the
 * terminal. The address of dink_print is available from the symbol table command
 * (symtab) in DINK. If the version of DINK on Excimer is updated from the Motorola
 * website at http://www.mot.com/PowerPC/teksupport the address of dink_print must
 * be updated here.
 */
#define printf dink_print
unsigned long (*dink_print)() = (unsigned long (*) ()) 0x6368;

/* file "Exercise.h"
 * Header file for common typedefs for Exercise?          Chuck Corley 981218
 */
struct TB_View {
    unsigned long TBU_View;
    unsigned long TBL_View;
};
union DPFP_View {
    struct TB_View TB_FPasGPR_View;
    double TB_FP_View;
};
struct Test_struct
{
    struct TB_View TB_GPR_View;
    union DPFP_View TB_FP_test;
};
!file "dtime.s" (For MetaWare High C/C++ Compiler/Assembler)
! Assembly language routine to convert 64-bit PowerPC TB facility to
! Double-precision, floating-point number. (Plus additional routines for
! testing.)                                                 CJC 981216
! Register usage:
!       r3 = FPU (upper 32 bits of floating point value)
!       r4 = FPL (lower 32 bits of floating point value)
!       r5 = TBU (time base upper - read from spr or loaded for test)
!       r6 = TBL (time base lower - read from spr or loaded for test)
!       r7 = leading zeroes in a register or shift count of +/-(zeroes - 11)
!       r8 = accumulator for final EXPonent value of DPFP number
!       r9 = shift count of 32 - n where n = +/-(zeroes -11)
!       r10 = constant register of 11
!       r11 = link register storage
!Special purpose register numbers for TB
#define TBU 269;
#define TBL 268;
        .data
Local_storage:
        .double 0
        .text
        .global dtime
        .global get_HID1
        .global conversion_test
!For CodeWarrior:
!asm double conversion(double TICS)
conversion:
        cntlzw  r7,r5           !Find leading zeroes in TBU. Preserve in r7.
        addi    r9,r0,32        !Will need a 32 in several places. Create one in r9.
        addi    r10,r0,11       !Create a constant in r10 = 11.
        subf.   r8,r7,r9        !r8 will hold EXP. Currently (32 - leading zeroes)
        beq+    tbu_is_zero     !TBU never got incremented? (Zeroes=32?) (Most likely)
        subf.   r7,r10,r7       !No. Is TB more than 2^^52? (Zeroes<11?) r7 = (Z-11)
        add     r8,r8,r9        !Final exponent will be (64 - 1 - leading zeroes).
        bge+    tbu_lt_8yrs     !If TB>2^^52, shift TBU bits right. (Not likely)
tbu_gt_8yrs:                    !for Z<11: fpu = tbu>>n=(11-Z);
                                !fpl = tbu<<n=(32-(11- Z))|tbl>>n=(11-Z);
        neg     r7,r7           !rlwnm shift count of (11-Z) = -(Z-11) = n = r7.
        subf    r9,r7,r9        !rlwnm shift count of 32-n = 32 - (11-Z) = r9.
        rlwnm   r3,r5,r9,12,31  !Shift TBU right n = (11 - Z). Mask off [0:11].
        rlwnm   r4,r5,r9,0,10   !Shift remaining TBU bits left n = 32-(11-Z)
        rlwnm   r6,r6,r9,0,31   !Shift TBL right n = (11 - Z)
        or      r4,r6,r4        !Or rest of TBU shifted left with TBL shifted right.
        b       form_exponent   !Go bias the exponent and or into FPU.
tbu_lt_8yrs:                    !for Z>=11: fpu=tbu<<n=(Z-11)|tbl>>n=(32-(Z-11));
                                !fpl=tbl<<n=(Z- 11);
        subf    r9,r7,r9        !Form a shift count of 32 - (11-Z) = r9.
        rlwnm   r3,r5,r7,12,31  !Shift TBU left n = (Z-11). Mask off [0:11].
        srw     r5,r6,r9        !Shift TBL bits right n = 32-(Z-11).
        or      r3,r3,r5        !Or TBU shifted left with TBL shifted right.
        rlwnm   r6,r6,r7,0,31   !Shift remainder of TBL left n = (Z-11).
        xor     r4,r6,r5        !XORing with the same value shifted right is like ANDing
        b       form_exponent   !fpl with a mask of all zeroes in bits [32-(Z-11):31].
tbu_is_zero:                    !Z= 32
        cntlzw  r7,r6           !Find leading zeroes in TBL.
        subf.   r8,r7,r9        !EXP = (32 - leading zeroes).
        beq-    tbl_is_zero     !Entire TBL count exactly zero? (Not likely)
        subf.   r7,r10,r7       !No. Is TB less than 2^^20? (zeroes < 11?)
        bge-    tbl_lt_63ms     !If not, will have to shift bits right. (Most likely)
tbl_gt_63ms:                    !for z<11: fpu = tbl>>n=(11-z); fpl = tbl<<n=(32-(11-z));
        neg     r7,r7           !rlwnm shift count of (11-Z) = -(Z-11) = n = r7.
        subf    r9,r7,r9        !rlwnm shift count of 32-n = 32 - (11-Z) = r9.
        rlwnm   r3,r6,r9,12,31  !Shift TBL right n = (11 - z). Mask off [0:11].
        rlwnm   r4,r6,r9,0,10   !Shift remaining TBL bits left n = 32 - (11 - Z).
        b       form_exponent
tbl_lt_63ms:                    !for z>=11: fpu = tbl<<(z-11); fpl = 0;
        rlwnm   r3,r6,r7,12,31  !Shift TBL left n = (Z-11). Mask off bits 0-11.
        xor     r4,r4,r4        !fpl = 0.
        b       form_exponent
tbl_is_zero:                    !for Z=32 && z=32: fpu = fpl = 0;
        xor     r3,r3,r3        !Unlikely result that TB was zero. Prepare to
        xor     r4,r4,r4        !return all zeroes for the floating point value.
        b       compute_seconds
form_exponent:
        addi    r8,r8,1022      !Add DP bias (1023) -1 to the exponent
        rlwinm  r8,r8,20,1,12   !Biased DP EXP will be (63-(leading zeroes in TB)+1023).
        or      r3,r3,r8
compute_seconds:
        lis     r5, Local_storage@h
        ori     r5, r5, Local_storage@l
        stw     r3, 0(r5)
        stw     r4, 4(r5)
        lfd     f2, 0(r5)       !Load back in as 64bit float
        fdiv    f1,f2,f1        !Divide by bus clock ticks per second
        blr                     !Return time in seconds as double in fp1

! Routine passed sample values of TBU and TBL. Returns FPU and FPL as
! unsigned long.
!For CodeWarrior:
!asm struct TB_View * conversion_test(unsigned long Upper,
!                                     unsigned long Lower, double TICS)
conversion_test:
        or      r5,r3,r3        !Use test values of TBU and TBL passed in r3 and r4
        or      r6,r4,r4        !as substitutes for values read from TB.
        mflr    r11             !Save the return address.
        bl      conversion      !Convert TBU and TBL into FPU and FPL
        mtlr    r11             !Return in r3 and r4
!For CodeWarrior:
!       la      r3,Local_pointer(SP)    !Return a pointer to the FPU storage location.
        blr

! Routine passed sample values of TBU and TBL. Returns seconds as double.
!For CodeWarrior:
!asm double float_test(unsigned long Upper, unsigned long Lower, double TICS)
float_test:
        or      r5,r3,r3        !Use test values of TBU and TBL passed in r3 and r4
        or      r6,r4,r4        !as substitutes for values read from TB.
        mflr    r11             !Save the return address.
        bl      conversion      !Convert TBU and TBL into FPU and FPL
        mtlr    r11
        blr                     !Return as double in fpr1

! Routine reads the TBU and TBL. Returns seconds as double.
!For CodeWarrior:
!asm double dtime(double TICS)
dtime:
read_TB:
        mfspr   r5,TBU          !Get TBU.
        mfspr   r6,TBL          !Get TBL.
        mfspr   r7,TBU          !Get TBU again.
        subf.   r7,r5,r7        !Did it increment between reading TBU and TBL?
        bgt-    read_TB         !If so, read them again. (Not likely)
        mflr    r11             !Save the return address.
        bl      conversion      !Convert TBU and TBL into FPU and FPL
        mtlr    r11
        blr

! Routine reads the HID1 (PLL_CFG) register. Returns in r3.
get_HID1:
        mfspr   r3,1009         !Get HID1 register.
        blr
/* file "test_program.c"
* Tests the operation of assy language routine to convert PowerPC TimeBase
* values from integer to DP floating point values.
Chuck Corley 981214
*/
#include <stdlib.h>
#include "Excimer.h" /* File of Excimer board-specific constants */
#include "Exercise.h" /* File of common typedefs for this exercise */
struct TB_View conversion_test(int, int, double); /* Given bus freq, returns time in seconds. */
void main(void)
{
int i, MAX_EXAMPLES;
struct Test_struct Example[] =
{
/* Consider - Case1: Z<11; Case2: Z>=11; Case3: z<11; Case4: z>=11; Case5: Z=z=32; */
/* Case5: All leading zeroes. */
    { 0x00000000, 0x00000000, 0x00000000, 0x00000000},
/* Case4: Single one to treat as implied bit. Move from TB[32+31] to DPFP[11]. */
    { 0x00000000, 0x00000001, 0x3FF00000, 0x00000000},
/* Case4: Single one to treat as implied bit. Move from TB[32+30] to DPFP[11]. */
    { 0x00000000, 0x00000002, 0x40000000, 0x00000000},
/* Case4: Single one to treat as implied bit. Move from TB[32+12] to DPFP[11]. */
    { 0x00000000, 0x00080000, 0x41200000, 0x00000000},
/* Case4: FRACTION starting in TB[32+12]. Move to DPFP[12:31]. */
    { 0x00000000, 0x00180000, 0x41380000, 0x00000000},
/* Case3: FRACTION starting in TB[32+11]. Move to DPFP[12:32]. Check DPFP[32]=1. */
    { 0x00000000, 0x00380001, 0x414C0000, 0x80000000},
/* Case3: FRACTION (One sec) starts TB[32+9]. Move to DPFP[12:34]. DPFP[32:34]=6? */
    { 0x00000000, 0x00FE502B, 0x416FCA05, 0x60000000},
/* Case3: FRACTION in TB[33:63]. FPU[12:31]=TBL[1:20]. FPL[0:10]=TBL[21:31]. */
    { 0x00000000, 0xC0000401, 0x41E80000, 0x80200000},
/* Case2: FRACTION-TB[31:63]. FPU[12]=TBU[31]. FPU[13:31]=TBL[0:18]. FPL[0:12]=TBL[19:31].*/
    { 0x00000003, 0x40005001, 0x420A0002, 0x80080000},
/* Case1: FRACTION-TB[11:63]. FPU[12:31]=TBU[11:30]. FPU[0]=TBU[31]. FPL[1:31]=TBL[0:30].*/
    { 0x00380001, 0x40010005, 0x434C0000, 0xA0008002},
/* Case1: TB[1:63]. FPU[12:31]=TBU[1:20]. FPL[0:10]=TBU[21:31]. FPL[11:31]=TBL[0:20].*/
    { 0xFEDCBA98, 0x76543210, 0x43EFDB97, 0x530ECA86},
};
struct Test_struct Result;
MAX_EXAMPLES = sizeof(Example) / sizeof(Example[0]);
for (i=0; i< MAX_EXAMPLES; i++)
{
/* These printf formats are for the restrictive dink_print routine. */
printf("TBU= %x ", Example[i].TB_GPR_View.TBU_View);
printf("TBL= %x ", Example[i].TB_GPR_View.TBL_View);
Result.TB_FP_test.TB_FPasGPR_View = conversion_test (Example[i].TB_GPR_View.TBU_View, \
Example[i].TB_GPR_View.TBL_View, TICS_PER_SEC);
if ((Result.TB_FP_test.TB_FPasGPR_View.TBU_View != \
Example[i].TB_FP_test.TB_FPasGPR_View.TBU_View) || \
(Result.TB_FP_test.TB_FPasGPR_View.TBL_View != \
Example[i].TB_FP_test.TB_FPasGPR_View.TBL_View)) \
printf(" ERROR!\n");
printf("FPU= %x ", Result.TB_FP_test.TB_FPasGPR_View.TBU_View);
printf("FPL= %x\n", Result.TB_FP_test.TB_FPasGPR_View.TBL_View);
/* This is not useful on Excimer because we can't print the floating point result.
   It is a useful check in CodeWarrior on the Mac. CJC*/
/*
Result.TB_FP_test.TB_FP_View = float_test(Example[i].TB_GPR_View.TBU_View,
Example[i].TB_GPR_View.TBL_View, TICS_PER_SEC);
printf("FPR = %4.2e \n", Result.TB_FP_test.TB_FP_View);
*/
};
return;
}
/* file "watch.c"
* Reads the PowerPC Time Base Facility on Excimer and prints out a twenty
* second count to the terminal emulator.
Chuck Corley 981214
*/
#include <stdlib.h>
#include "Excimer.h" /* File of Excimer board-specific constants */
#include "Exercise.h" /* File of common typedefs */
double dtime(double); /* Given bus freq, returns time in seconds. */
unsigned long get_HID1(); /* Returns HID1 register. */
void main(void)
{
double begin_time, current_time, delta_time = 0.0, seconds = 0.0;
int int_seconds;
unsigned long HID1_Reg;
printf("PowerPC Timer Test.\n");
printf("Beginning a twenty second count assuming bus speed of 66.67MHz.\n");
printf("Please time me.\n");
printf("If your stopwatch time differs significantly from 20 seconds, \n");
printf("we can compute the actual bus speed.\n");
begin_time = dtime(TICS_PER_SEC);
for (int_seconds = -1; int_seconds <= 20; int_seconds++) /*Countup to start.*/
{
while (delta_time < 1.0)
{
current_time = dtime(TICS_PER_SEC);
delta_time = current_time - (int_seconds) - begin_time;
}
delta_time = 0.0;
switch (int_seconds)
{
case -1 :
break;
/* Delay to get stopwatch ready. */
case 0 :
printf("Start now!\n"); /* Begin timing at zero seconds. */
break;
case 1:
printf("%d second\n", int_seconds);
break;
default:
printf("%d seconds\n", int_seconds);
/* End of int_seconds switch */
}
};
printf("If your time was not 20 seconds,\n");
printf("bus speed is (20 / your_time) * 66.67MHz.\n");
/* Bonus Exercise. Given the bus speed, calculate the processor (core) speed.*/
HID1_Reg = get_HID1() >> 28;
/* Move HID1[0:3] to [28:31] */
printf("HID1 indicates PLL_CFG=%x.\n", HID1_Reg );
printf("If bus=%dMHz, ", IBUS_MHz);
switch (HID1_Reg)
{
case 0x4: /* PLL_CFG = 0b0100 */
printf("Core Freq(2x)=%dMHz & ", 2*IBUS_MHz);
printf("VCO Freq(2x)=%dMHz\n", 2*IBUS_MHz);
break;
case 0x5: /* PLL_CFG = 0b0101 */
printf("Core Freq(2x)=%dMHz & ", 2*IBUS_MHz);
printf("VCO Freq(4x)=%dMHz\n", 4*IBUS_MHz);
break;
case 0x6: /* PLL_CFG = 0b0110 */
printf("Core Freq(2.5x)=%dMHz & ", (int)(2.5*(float)IBUS_MHz));
printf("VCO Freq(2x)=%dMHz\n", 5*IBUS_MHz);
break;
case 0x8: /* PLL_CFG = 0b1000 */
printf("Core Freq(3x)=%dMHz & ", 3*IBUS_MHz);
printf("VCO Freq(2x)=%dMHz\n", 6*IBUS_MHz);
break;
case 0xe: /* PLL_CFG = 0b1110 */
printf("Core Freq(3.5x)=%dMHz & ",(int)(3.5*(float)IBUS_MHz));
printf("VCO Freq(2x)=%dMHz\n", 7*IBUS_MHz);
break;
case 0xa: /* PLL_CFG = 0b1010 */
printf("Core Freq(4x)=%dMHz & ", 4*IBUS_MHz);
printf("VCO Freq(2x)=%dMHz\n", 8*IBUS_MHz);
break;
case 0x7: /* PLL_CFG = 0b0111 */
printf("Core Freq(4.5x)=%dMHz & ",(int)(4.5*(float)IBUS_MHz));
printf("VCO Freq(2x)=%dMHz\n", 9*IBUS_MHz);
break;
case 0xb: /* PLL_CFG = 0b1011 */
printf("Core Freq(5x)=%dMHz & ", 5*IBUS_MHz);
printf("VCO Freq(2x)=%dMHz\n", 10*IBUS_MHz);
break;
case 0x9: /* PLL_CFG = 0b1001 */
printf("Core Freq(5.5x)=%dMHz & ",(int)(5.5*(float)IBUS_MHz));
printf("VCO Freq(2x)=%dMHz\n", 11*IBUS_MHz);
break;
case 0xd: /* PLL_CFG = 0b1101 */
printf("Core Freq(6x)=%dMHz & ", 6*IBUS_MHz);
printf("VCO Freq(2x)=%dMHz\n", 12*IBUS_MHz);
break;
case 0x3: /* PLL_CFG = 0b0011 */
printf("PLL in bypass!\n");
break;
case 0xf: /* PLL_CFG = 0b1111 */
printf("CLOCK OFF! How can this be???\n");
break;
default:
printf("ERROR - INVALID PLL_CFG!");
} /* End of HID1 switch */
return;
}
Conclusions:
Students should be able to note that:
• A 64-bit rotate instruction would be very useful but has to be synthesized in the 32-bit PowerPC
architecture.
• The PowerPC Embedded Application Binary Interface (EABI) specifies how arguments are passed
to an assembly language routine and values are returned.
• The syntax for assembly language programs varies widely among compiler/assembler vendors.
• The only way to pass data between the integer registers (GPRs) and the floating point registers
(FPRs) on PowerPC is by writing and reading to memory.
• With wise use of the register set, the memory access to pass information from the GPRs to the
FPRs is the only memory access the assembly language routine needs - thus improving performance.
Troubleshooting:
If the student is not able to:
• Get started. Suggest that the student code the conversion in a C program until they have proven
their algorithm. If they are still having difficulty, the disassembly of the C program could provide
insight.
• Get the desired returned values from function calls. This is a good opportunity to use breakpoints
and examine the registers to determine how the expected value is being returned.
Experiment 10
Dhrystone Benchmarking
Problem Statement:
• In this experiment the student will adapt the popular Dhrystone benchmark to execute on the Excimer board.
Objectives:
Upon completion of this laboratory experience, students will be able to:
• verify a popular industry metric of processor performance in embedded applications, Dhrystone
Version 2.1 Vax MIPS, for the PowerPC 603e microprocessor on the Excimer board.
• compare processor performance, as measured by Dhrystone, to published values for other processors.
• compare code generation and instruction scheduling, and resulting performance, for several competing
compilers on the Dhrystone benchmark.
• substitute more highly optimized routines for the built-in or library functions provided by compiler
vendors to improve performance.
• utilize the dtime function of Experiment 9 to measure elapsed time for a benchmark’s execution.
Background Information:
Comparative performance of computers is a popular topic for computer scientists, computer architects,
and computer salesmen. Many performance measurements, or benchmarks, have been used over
the last several decades to compare various aspects of computer performance. Some benchmarks
involve running real applications, e.g. compiling the compiler or calculating a spreadsheet, which are
heavily dependent on the resources of a particular operating system. Others are small synthetic
benchmarks designed to be representative of the workload of a class of larger applications but which do
no meaningful work and are easier to run across various operating systems and architectures.
The Dhrystone benchmark is a synthetic benchmark developed by Reinhold P. Weicker of Siemens-Nixdorf
in the early eighties. It was first published in "Communications of the ACM" vol. 27., no. 10
(Oct. 1984), pp. 1013 - 1030. It is easily ported to many different operating environments and results
for many computers are widely published. For embedded processors, where operating system and
system resources may be limited, it has been the most often quoted performance measure. It is popular
because it provides one number – Vax MIPS – that can be compared quickly with other computers.
(Vax MIPS are calculated by dividing the number of times that the Dhrystone benchmark completes in
a second by the number of Dhrystones per second performed by the now-ancient Vax 11/780 from
Digital Equipment Corporation.) On the other hand, it is widely disparaged because it is so small that
it fits entirely within the first level cache of most modern microprocessors and compiler vendors soon
made a game out of optimizing it to get ever-higher Vax MIPS numbers.
Motorola publishes Dhrystone 2.1 Vax MIPS numbers for the PowerPC 603e processor on the Excimer
board because the number is often requested. After the first loop through the benchmark, it resides
entirely in the L1 cache of any PowerPC microprocessor. At that point the performance varies linearly
with frequency and the results reflect the efficiency of the micro-architecture and the effectiveness of
the compiler in generating code to capitalize on it. Motorola’s published numbers are 1.41 Vax MIPS
per MHz. For an Excimer board running at 133MHz (to keep it comfortably cool in a still air
environment), that equates to 188 Vax MIPS.
The Dhrystone benchmark (and numerous others) is available in its official source code via anonymous
ftp to 'ftp.nosc.mil' in directory 'pub/aburto'. The IP address is: 128.49.192.51. Instructions for
executing the benchmark and “rules” for execution are available there as well. Comparative results for
many computers are available from the same site or from various news groups including
'comp.benchmarks'.
Procedure:
1. Download the Dhrystone Version 2.1 benchmark from the ftp site. Read the associated instructions
for compilation and execution. You will find that the benchmark calls the C library functions
strcpy and strcmp inside the measurement loop and printf and scanf outside the measurement
loop. You will also find that the benchmark calls a timer function dtime() that returns a count
of seconds as a double floating point number. You will need to assemble and link the assembly
language code from Experiment 9 which reads the PowerPC time base facility and converts the 64-bit
integer value to time in seconds based on Excimer’s 66MHz bus speed.
Reminder: Function calls such as printf will have to be equated to the dink_printf routine to print
results to the terminal. Dhrystone also queries the user via the scanf function for the number of
times to run the benchmark. A scanf function using DINK’s getchar and writechar will have
to be written and substituted or the number of times through the benchmark hard coded. If
hard-coding the number of runs, be certain to use a variable instead of a constant, as a constant would
change the benchmark inside the measurement loop. Motorola makes no changes to the benchmark
which would unfairly change the result when compared to other results.
2. Compile the Dhrystone source code files, link in the dtime() function, and execute the benchmark
on the Excimer board. Compare your results to Motorola’s published numbers.
Hint: Maximum performance will be obtained only when running entirely out of cache. If the SRAM
access LED on Excimer is not out during execution then the program is not running entirely from
the internal cache. DINK’s regmod command may be needed to enable the instruction and data
cache (regmod HID0 to new value of 8000c000). A bug in early versions of DINK may also require
modifying the data memory mapping unit (DMMU) to make the data accesses cacheable (regmod
rbat1l to new value of 00000012).
3. Examine the disassembled code. The strcmp library function offers one opportunity for performance
enhancement. Many C libraries compare strings one byte at a time. Motorola provides a library
of highly optimized functions including strcmp on their website at
http://www.mot.com/PowerPC/teksupport. The assembly language for strcmp from this library is
shown below. Notice that when possible this function compares strings four bytes – a word – at a
time, thus reducing memory (or in this case, cache) accesses by 75%. Assemble and link this strcmp
function in place of the C library function. Did performance improve?
#-----------------------------------------------------------------# Copyright, Motorola, Inc. All Rights Reserved. This
# software contains proprietary and confidential information of
# Motorola, Inc. Use, disclosure or reproduction is prohibited
# without the prior express written consent of Motorola, Inc.
#-----------------------------------------------------------------#-----------------------------------------------------------------# int strcmp(const unsigned char* source1,
#
const unsigned char* source2);
# Returns:
# value < 0 if source1 < source2
# value = 0 if source1 = source2
# value > 0 if source1 > source2
#-----------------------------------------------------------------.set
.set
.set
_eq,2
_cr0,0
_cr1,1
#aix# .toc
#aix#T..strcmp:
#aix# .tc
..strcmp[tc], strcmp[ds]
#aix# .align 2
#aix# .globl strcmp[ds]
#aix# .csect strcmp[ds]
#aix# .long .strcmp[pr],TOC[tc0],0
#aix# .globl .strcmp[pr]
#aix# .csect .strcmp[pr]
#aix#.strcmp:
.sect .text
.align 2
.extern strcmp
strcmp:
#nt#
.reldata
#nt#
.globl strcmp
#nt#strcmp:
#nt#
.long ..strcmp,.toc
#nt#
.text
#nt#
.globl ..strcmp
#nt#..strcmp:
#
#
#
#
#
#
#
#
#
#
#
r0 = temporary
r3 = source1 pointer, result, mask for first words
r4 = source2 pointer
r5 = 0x80808080
r6 = 0x01010101
r7 = source2 word
r8 = source1 word
r9 = temporary
r10 = source1 pointer
r11 = temporary
r12 = index
67
# See if the two pointers are both word aligned.
xor
r0,r3,r4
rlwinm. r0,r0,0,30,31
addis r6,r0,0x0101
mr
r10,r3
bne
Byte_By_Byte
# Generate an initial index so the word containing the first byte
# will be loaded. Compute a mask to set all bits in the bytes
# prior to the first in the words that are loaded.
rlwinm r11,r3,3,27,28
li
r3,-1
rlwinm r12,r10,0,30,31
subfic r11,r11,32
neg
r12,r12
slw
r3,r3,r11
#le#
#le#
# Complete the setup for the word aligned loop.
ori
r6,r6,0x0101
lwzx
r8,r12,r10
lwbrx r8,r12,r10
or
r8,r8,r3
# Mask off unused bytes.
slwi
r5,r6,7
subfc r0,r6,r8
andc
r9,r5,r8
lwzx
r7,r12,r4
lwbrx r7,r12,r4
and.
r11,r0,r9
addi
r4,r4,-4
addi
r12,r12,4
or
r7,r7,r3
# Mask off unused bytes.
bne
Source1_Has_Null
Word_Loop:
subfc.
bne
lwzx
#le#
lwbrx
subfc
andc
and.
addi
lwzx
#le#
lwbrx
beq
r3,r8,r7
Words_Differ
r8,r12,r10
r8,r12,r10
r0,r6,r8
r9,r5,r8
r11,r0,r9
r12,r12,4
r7,r12,r4
r7,r12,r4
Word_Loop
Source1_Has_Null:
# We terminated the loop because r8 has a null byte.
# Shift both words right so the null byte is the LSB.
# Can't do this with cntlzw because of a borrow if the byte
# preceding the null has the value one.
        rlwinm. r10,r8,0,0,7
        li      r9,24
        beq     shift
        rlwinm. r10,r8,0,8,15
        li      r9,16
        beq     shift
        rlwinm. r10,r8,0,16,23
        li      r9,8
        beq     shift
        li      r9,0
shift:
        srw     r7,r7,r9
        srw     r8,r8,r9
        subfc   r3,r7,r8
        blr
Words_Differ:
# We terminated the loop because the words differ but
# r8 does not have a null byte. Return 1 or -1 based
# on the unsigned comparison.
subfe r3,r3,r3
nand r3,r3,r3
ori r3,r3,1
blr
Byte_By_Byte:
# Do strcmp a byte at a time.
        lbz     r9,0(r3)
        lbz     r0,0(r4)
        subfc.  r3,r0,r9
        bnelr
Byte_Loop:
        cmpi    _cr1,0,r9,0
        beq     _cr1,Null_Byte
        lbzu    r9,1(r10)
        lbzu    r0,1(r4)
        subfc.  r3,r0,r9
        beq     Byte_Loop
        blr
Null_Byte:
        mr      r3,r0
        blr
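The null-byte test in the word loop (subfc, andc, and.) can be tried on a host machine in C. The following is a minimal sketch, not part of the Motorola listing; the function name zero_byte_mask is hypothetical. It also shows why the listing cannot locate the null with cntlzw: a byte of value one just before the null is flagged too, because of the borrow.

```c
#include <stdint.h>

/* Hypothetical helper: returns nonzero if any byte of w is 0x00.
 * Mirrors the listing: r0 = w - 0x01010101, r9 = 0x80808080 & ~w,
 * r11 = r0 & r9. */
uint32_t zero_byte_mask(uint32_t w)
{
    const uint32_t ones  = 0x01010101u;  /* r6 in the listing */
    const uint32_t highs = 0x80808080u;  /* r5 in the listing (r6 << 7) */

    return (w - ones) & ~w & highs;
}
```

For the word 0x41420100 the mask flags both of the last two bytes: the null, and the byte 0x01 that precedes it. This is the false positive the comment in Source1_Has_Null warns about, so the mask reliably reports that a null is present but not where it is.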
References:
[1] "Communications of the ACM" vol. 27., no. 10 (Oct. 1984), pp. 1013 - 1030.
Suggested Code:
/*
 * File - "dry1.h"
 * Declares the functions defined in support.c and used by dhry21a.c
 */
#define printf my_printf
#define fprintf my_fprintf
#define fopen my_fopen
#define fclose my_fclose
#define exit my_exit
extern int my_printf(const char *, ...);
extern int my_fprintf();        /* dummy; accepts and ignores any arguments */
extern FILE * my_fopen();
extern int my_fclose();
extern void my_exit();

/* file "support.c"
 * This file provides substitutes for some of the library function calls
 * used in Dhrystone which DINK doesn't support.
 */

/* Set the number of dhrystone loops here. */
#define NUMBER_OF_RUNS 10000000

#include <stdarg.h>

/* The hand-coded assembly strcmp listed above. */
extern int strcmp();

/* These are the magic addresses for the DINK functions. */
unsigned long (*dink_printf)() = (unsigned long (*)()) 0x6638;

/* A version of malloc that will only supply up to 2048 bytes total. */
char *malloc(unsigned int size)
{
    static char buffer[2048];
    static char *next = buffer;
    char *p = next;

    next += ((size + 7) & ~7);
    if (next >= buffer + sizeof(buffer))
        /* Terminate by executing a zero. */
        asm(".long 0");
    return p;
}

/* Scanf is used only to read the number of times through the loop. */
/*ARGSUSED*/
void scanf(char *fmt, int *v)
{
    *v = NUMBER_OF_RUNS;
}

/* This only will handle the printf calls in dhrystone. The DINK printf
 * doesn't work for floating point, so convert the value here to integer.
 */
int my_printf(const char *fmt, ...)
{
    int a1, a2, a3, sign;
    char *neg_zero_fmt;
    double round, fraction, val;
    va_list ap;

    va_start (ap, fmt);
    if (strcmp(fmt, "%7.1lf \n") == 0) {
        fmt = "%6d.%1d \n";
        /* Leading blanks pad "-0" to the width of %6d. */
        neg_zero_fmt = "    -%1d.%1d \n";
        round = 0.05;
        fraction = 10.0;
        goto fake_float;
    } else if (strcmp(fmt, "%10.1lf \n") == 0) {
        fmt = "%9d.%1d \n";
        /* Leading blanks pad "-0" to the width of %9d. */
        neg_zero_fmt = "       -%1d.%1d \n";
        round = 0.05;
        fraction = 10.0;
        goto fake_float;
    } else if (strcmp(fmt, "VAX MIPS rating = %10.3lf \n") == 0) {
        fmt = "VAX MIPS rating = %9d.%03d \n";
        neg_zero_fmt = "VAX MIPS rating =        -%1d.%03d \n";
        round = 0.0005;
        fraction = 1000.0;
fake_float:
        val = va_arg(ap, double);
        if (val < 0) {
            sign = -1;
            val = -val;
        } else {
            sign = 1;
        }
        /* Round the value. */
        val += round;
        a1 = val;                               /* integer part */
        a2 = val * fraction - a1 * fraction;    /* fractional digits */
        if (a1 == 0 && sign == -1)
            fmt = neg_zero_fmt;
        a1 *= sign;
    } else {
        a1 = va_arg(ap, int);
        a2 = va_arg(ap, int);
        a3 = va_arg(ap, int);
    }
    dink_printf(fmt, a1, a2, a3);
    va_end (ap);
    return 0;
}

/* Dummy out the calls to exit, fopen, fprintf, and fclose. */
void my_exit() {}
int my_fopen() { return 1; }
int my_fprintf() { return 0; }
int my_fclose() { return 0; }
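The integer/fraction split that my_printf performs can be checked on a host machine before it is run under DINK. The sketch below isolates that arithmetic in a hypothetical helper, split_for_print (the name and signature are assumptions, not part of support.c); it handles the non-negative case only, since my_printf strips the sign first.

```c
/* Hypothetical helper: splits a non-negative value into the two integer
 * fields that my_printf passes to dink_printf.
 *   round    - half of the last printed digit (0.05 for one decimal)
 *   fraction - 10^digits (10.0 for one decimal, 1000.0 for three)   */
void split_for_print(double val, double round, double fraction,
                     int *whole, int *frac)
{
    val += round;                                   /* round to nearest */
    *whole = (int)val;                              /* integer part */
    *frac  = (int)(val * fraction - *whole * fraction);
}
```

With round = 0.05 and fraction = 10.0, the value 12.34 splits into 12 and 3, which the "%6d.%1d" format prints as 12.3.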
Conclusions:
Students should be able to note that:
• “There are lies, damn lies, and benchmarks, in that order”.
• The Dhrystone benchmark is small enough to understand, see opportunities for optimization, and
port easily to various computer environments.
• The Dhrystone benchmark is string intensive and the resulting performance metric may be meaningless in applications involving other workloads, e.g. extensive mathematical calculations or bit
manipulation.
• Not all compilers are created equal. The sample compilers shipped in the Excimer kit may generate
vastly different code, instruction scheduling, and results on this benchmark. However, the biggest
impact probably comes from the Motorola hand-coded strcmp function. Like most vendors, Motorola strives to provide the best benchmark results possible for marketing reasons.
Troubleshooting:
If the student is not able to:
• Get a time function. The suggested code for Experiment 4 provides a double dtime(double
TICS_PER_SECOND) function which can be easily modified to provide timing information for this
benchmark.
• Link with the DINK supplied printf or other functions. Check the addresses for the respective
functions in DINK using the symtab command.
• Get the results to print from the dhry21a.c program. dink_printf will not accept floating point
formats. The results will have to be typecast as unsigned long or int to print. The loss of accuracy
is insignificant.
Experiment
11
Running the Linpack Benchmark (Debugging Stage)
Problem Statement:
• In this experiment the student will adapt the popular Linpack benchmark to execute on the Excimer
board. (Contributed by Walter Guiot and Luis Narváez).
Objectives:
Upon completion of this laboratory experience, students will be able to:
• verify a popular industry metric of processor performance in embedded applications, Linpack, for
the PowerPC 603e microprocessor on the Excimer board.
• compare processor performance, as measured by Linpack, to published values for other processors.
• compare code generation and instruction scheduling, and resulting performance, for several competing compilers on the Linpack benchmark.
• compare processor performance, as measured by Linpack, with performance given by the manufacturer.
Background Information:
LINPACK is a collection of subroutines for analyzing and solving linear equations and linear
least-squares problems; it is widely used to benchmark the performance of computers. LINPACK solves linear systems whose matrices are general, banded, symmetric indefinite, symmetric positive definite, triangular,
and tridiagonal square. The LINPACK routines are constructed such that locality of reference is maximized.
A C language version can be obtained from http://www.netlib.org/benchmark/.
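To compare results with published figures, the measured time has to be turned into a floating-point rate. The sketch below uses the operation count customarily credited to the benchmark for an n-by-n system, 2n^3/3 + 2n^2; the function names are hypothetical, and the count should be checked against the source file actually downloaded.

```c
/* Hypothetical helpers: nominal floating-point operation count for
 * solving one n-by-n system, and the resulting rate in MFLOPS. */
double linpack_ops(int n)
{
    double dn = (double)n;

    return 2.0 * dn * dn * dn / 3.0 + 2.0 * dn * dn;
}

double linpack_mflops(int n, double seconds)
{
    return linpack_ops(n) / (seconds * 1.0e6);
}
```

For n = 100 the count is about 686,667 operations, so a run that takes one second corresponds to roughly 0.69 MFLOPS.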
References:
[1] David A. Patterson, John L. Hennessy, Computer Organization & Design: The Hardware/Software Interface, Morgan Kaufmann Publishers, Inc., San Francisco, California, 1994.
[2] http://www.netlib.org/linpack/
[3] The Linpack Benchmark: http://www.netlib.org/benchmark/top500/reports/report93/section2_16_2.html
Procedure:
1. Download the Linpack benchmark from sites such as http://www.netlib.org/linpack or
ftp://ftp.nosc.mil/pub/aburto.
2. Read the included documentation regarding compiling instructions.
3. Modify the source code to make it compatible with DINK functions such as dink_printf. Remember that dink_printf does not support floating point; a function will have to be developed to handle
floating point. Refer to experiment #5 for the printf function for the Dhrystone benchmark.
4. Compile the source code and link it with the dtime() function developed in experiment #?. The file
dtime.s developed in that experiment may be used.
5. Get the PowerPC 603e performance information from the manufacturer and compare it with the
results obtained with Linpack.
Suggested Code:
***This experiment is still in the debugging phase as it is necessary to send floating point numbers to the
screen.***
!file "dtime.s" (For MetaWare High C/C++ Compiler/Assembler)
! Assembly language routine to convert 64-bit PowerPC TB facility to
! Double-precision, floating-point number. (Plus additional routines for
! testing.)
! CJC 981216 (Contributed by Chuck Corley, Motorola)
! Register usage:
!   r3  = FPU (upper 32 bits of floating point value)
!   r4  = FPL (lower 32 bits of floating point value)
!   r5  = TBU (time base upper - read from spr or loaded for test)
!   r6  = TBL (time base lower - read from spr or loaded for test)
!   r7  = leading zeroes in a register or shift count of +/-(zeroes - 11)
!   r8  = accumulator for final EXPonent value of DPFP number
!   r9  = shift count of 32 - n where n = +/-(zeroes -11)
!   r10 = constant register of 11
!   r11 = link register storage
#define TBU 269;                !Special purpose register numbers for TB
#define TBL 268;
        .data
Local_storage:
        .double 0
        .text
        .global dtime
        .global get_HID1
        .global conversion_test
!For CodeWarrior:
!asm double conversion(double TICS)
conversion:
        cntlzw  r7,r5           !Find leading zeroes in TBU. Preserve in r7.
        addi    r9,r0,32        !Will need a 32 in several places. Create one in r9.
        addi    r10,r0,11       !Create a constant in r10 = 11.
        subf.   r8,r7,r9        !r8 will hold EXP. Currently (32 - leading zeroes)
        beq+    tbu_is_zero     !TBU never got incremented? (Zeroes=32?) (Most likely)
        subf.   r7,r10,r7       !No. Is TB more than 2^^52? (Zeroes<11?) r7 = (Z-11)
        add     r8,r8,r9        !Final exponent will be (64 - 1 - leading zeroes).
        bge+    tbu_lt_8yrs     !If TB>2^^52, shift TBU bits right. (Not likely)
tbu_gt_8yrs:
        !for Z<11: fpu = tbu>>n=(11-Z);
        !fpl = tbu<<n=(32-(11-Z))|tbl>>n=(11-Z);
        neg     r7,r7           !rlwnm shift count of (11-Z) = -(Z-11) = n = r7.
        subf    r9,r7,r9        !rlwnm shift count of 32-n = 32 - (11-Z) = r9.
        rlwnm   r3,r5,r9,12,31  !Shift TBU right n = (11 - Z). Mask off [0:11].
        rlwnm   r4,r5,r9,0,10   !Shift remaining TBU bits left n = 32-(11-Z)
        rlwnm   r6,r6,r9,0,31   !Shift TBL right n = (11 - Z)
        or      r4,r6,r4        !Or rest of TBU shifted left with TBL shifted right.
        b       form_exponent   !Go bias the exponent and or into FPU.
tbu_lt_8yrs:
        !for Z>=11: fpu=tbu<<n=(Z-11)|tbl>>n=(32-(Z-11));
        !fpl=tbl<<n=(Z-11);
        subf    r9,r7,r9        !Form a shift count of 32 - (11-Z) = r9.
        rlwnm   r3,r5,r7,12,31  !Shift TBU left n = (Z-11). Mask off [0:11].
        srw     r5,r6,r9        !Shift TBL bits right n = 32-(Z-11).
        or      r3,r3,r5        !Or TBU shifted left with TBL shifted right.
        rlwnm   r6,r6,r7,0,31   !Shift remainder of TBL left n = (Z-11).
        xor     r4,r6,r5        !XORing with the same value shifted right is like ANDing
        b       form_exponent   !fpl with a mask of all zeroes in bits [32-(Z-11):31].
tbu_is_zero:                    !Z= 32
        cntlzw  r7,r6           !Find leading zeroes in TBL.
        subf.   r8,r7,r9        !EXP = (32 - leading zeroes).
        beq     tbl_is_zero     !Entire TBL count exactly zero? (Not likely)
        subf.   r7,r10,r7       !No. Is TB less than 2^^20? (zeroes < 11?)
        bge     tbl_lt_63ms     !If not, will have to shift bits right. (Most likely)
tbl_gt_63ms:
        !for z<11: fpu = tbl>>n=(11-z); fpl = tbl<<n=(32-(11-z));
        neg     r7,r7           !rlwnm shift count of (11-Z) = -(Z-11) = n = r7.
        subf    r9,r7,r9        !rlwnm shift count of 32-n = 32 - (11-Z) = r9.
        rlwnm   r3,r6,r9,12,31  !Shift TBL right n = (11 - z). Mask off [0:11].
        rlwnm   r4,r6,r9,0,10   !Shift remaining TBL bits left n = 32 - (11 - Z).
        b       form_exponent
tbl_lt_63ms:
        !for z>=11: fpu = tbl<<(z-11); fpl = 0;
        rlwnm   r3,r6,r7,12,31  !Shift TBL left n = (Z-11). Mask off bits 0-11.
        xor     r4,r4,r4        !fpl = 0.
        b       form_exponent
tbl_is_zero:
        !for Z=32 && z=32: fpu = fpl = 0;
        xor     r3,r3,r3        !Unlikely result that TB was zero. Prepare to
        xor     r4,r4,r4        !return all zeroes for the floating point value.
        b       compute_seconds
form_exponent:
        addi    r8,r8,1022      !Add DP bias (1023) -1 to the exponent
        rlwinm  r8,r8,20,1,12   !Biased DP EXP will be (63-(leading zeroes in TB)+1023).
        or      r3,r3,r8
compute_seconds:
        lis     r5, Local_storage@h
        ori     r5, r5, Local_storage@l
        stw     r3, 0(r5)
        stw     r4, 4(r5)
        lfd     f2, 0(r5)       !Load back in as 64bit float
        fdiv    f1,f2,f1        !Divide by bus clock ticks per second
        blr                     !Return time in seconds as double in fp1
! Routine passed sample values of TBU and TBL. Returns FPU and FPL as
! unsigned long.
!For CodeWarrior:
!asm struct TB_View * conversion_test(unsigned long Upper,
!                                     unsigned long Lower, double TICS)
conversion_test:
        or      r5,r3,r3        !Use test values of TBU and TBL passed in r3 and r4
        or      r6,r4,r4        !as substitutes for values read from TB.
        mflr    r11             !Save the return address.
        bl      conversion      !Convert TBU and TBL into FPU and FPL
        mtlr    r11             !Return in r3 and r4
!For CodeWarrior:
!       la      r3,Local_pointer(SP) !Return a pointer to the FPU storage location.
        blr
! Routine passed sample values of TBU and TBL. Returns seconds as double.
!For CodeWarrior:
!asm double float_test(unsigned long Upper, unsigned long Lower, double TICS)
float_test:
        or      r5,r3,r3        !Use test values of TBU and TBL passed in r3 and r4
        or      r6,r4,r4        !as substitutes for values read from TB.
        mflr    r11             !Save the return address.
        bl      conversion      !Convert TBU and TBL into FPU and FPL
        mtlr    r11
        blr                     !Return as double in fpr1
! Routine reads the TBU and TBL. Returns seconds as double.
!For CodeWarrior:
!asm double dtime(double TICS)
dtime:
read_TB:
        mfspr   r5,TBU          !Get TBU.
        mfspr   r6,TBL          !Get TBL.
        mfspr   r7,TBU          !Get TBU again.
        subf.   r7,r5,r7        !Did it increment between reading TBU and TBL?
        bgt     read_TB         !If so, read them again. (Not likely)
        mflr    r11             !Save the return address.
        bl      conversion      !Convert TBU and TBL into FPU and FPL
        mtlr    r11
        blr
! Routine reads the HID1 (PLL_CFG) register. Returns in r3.
get_HID1:
        mfspr   r3,1009         !Get HID1 register.
        blr
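The conversion routine builds the bits of an IEEE double directly from the 64-bit time base. A host-side C model makes the expected results easy to check against conversion_test and float_test. The sketch below is an approximation of the assembly, not a translation: the function names are hypothetical, it assumes the host uses IEEE-754 doubles, and it truncates rather than rounds when the count exceeds 53 significant bits.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model of "conversion": 64-bit tick count to double. */
double tb_to_double(uint64_t tb)
{
    uint64_t exponent, fraction, bits;
    int zeros = 0;
    double result;

    if (tb == 0)
        return 0.0;                       /* the tbl_is_zero case */
    while ((tb & 0x8000000000000000ULL) == 0) {
        tb <<= 1;                         /* normalize: leading 1 to bit 63 */
        zeros++;
    }
    exponent = (uint64_t)(63 - zeros + 1023);  /* biased exponent */
    fraction = (tb << 1) >> 12;           /* drop implicit 1, keep 52 bits */
    bits = (exponent << 52) | fraction;
    memcpy(&result, &bits, sizeof result);
    return result;
}

/* Hypothetical model of dtime/float_test: ticks to seconds. */
double tb_to_seconds(uint32_t tbu, uint32_t tbl, double tics_per_second)
{
    uint64_t tb = ((uint64_t)tbu << 32) | tbl;

    return tb_to_double(tb) / tics_per_second;
}
```

Feeding the same Upper and Lower values to float_test on the board and to tb_to_seconds on the host should give matching results for any count below 2^53.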
Conclusions:
Troubleshooting:
1. Make sure you are using the correct address for the dink_printf function.
2. Make the necessary changes to handle floating point.
3. Use the timer function developed in experiment #?; do not use any of the ones provided with the
benchmark.
Experiment
12
Cache Impact on Benchmark Metrics (Debugging Stage)
Problem Statement:
• In this experiment the student will compare the Linpack benchmark results with cache memory disabled to the results with cache memory enabled. (Contributed by Walter Guiot and Luis Narváez).
Objectives:
Upon completion of this laboratory experience, students will be able to:
• compare processor performance, as measured by Linpack, with and without cache enabled.
• understand the advantages of cache memory in a computer system.
Background Information:
Cache memory is a special type of random access memory (RAM) that stores the most recently used
instructions and/or data from a larger main memory system. Cache memory can be accessed faster than
regular RAM.
Cache memory is categorized in levels. Level 1 (L1) cache memory is on the same chip as the microprocessor. Level 2 (L2) and later levels are usually separate memory chips. The microprocessor first
looks for an instruction or datum in the L1 cache; if it is not there (a miss), it goes to the next level, and
continues from level to level until reaching main memory or, in the worst case, a mass storage device such
as a hard disk drive. Cache memories are typically static RAM (SRAM) modules that do
not need to be periodically refreshed as DRAM does. These characteristics make cache memory
faster and more expensive than regular RAM.
There is a slight catch with cache memory: on a cache miss, it takes additional clock cycles to access the data from DRAM or ROM. For this reason an L2 cache that is too small could theoretically decrease performance.
The 603e provides independent 16-Kbyte, four-way set-associative instruction and data caches. The
cache line is 32 bytes in length. The caches use a least recently used (LRU) replacement policy.
The caches provide a 64-bit interface to the instruction fetch unit and load/store unit. The surrounding
logic selects, organizes, and forwards the requested information to the requesting unit. Write operations
to the cache can be performed on a byte basis, and a complete read-modify-write operation to the cache
can occur in each cycle. The load/store and instruction fetch units provide the caches with the address
of the data or instruction to be fetched.
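The geometry given above (16 Kbytes, four ways, 32-byte lines) fixes the number of sets at 16384 / (4 x 32) = 128. The sketch below shows how an address would split into line offset, set index and tag under the usual assumption that the set index is taken from the address bits directly above the line offset; the names are hypothetical, and the 603e User's Manual should be consulted for the exact address bits used.

```c
#include <stdint.h>

enum {
    CACHE_BYTES = 16 * 1024,   /* per cache, from the text */
    CACHE_WAYS  = 4,
    LINE_BYTES  = 32,
    CACHE_SETS  = CACHE_BYTES / (CACHE_WAYS * LINE_BYTES)
};

/* Byte position inside the 32-byte line. */
uint32_t cache_line_offset(uint32_t addr) { return addr % LINE_BYTES; }

/* Which of the sets the line maps to (assumed indexing). */
uint32_t cache_set_index(uint32_t addr) { return (addr / LINE_BYTES) % CACHE_SETS; }

/* Remaining upper address bits, compared against the four ways. */
uint32_t cache_tag(uint32_t addr) { return addr / (LINE_BYTES * CACHE_SETS); }
```

Two addresses that are a multiple of 4 Kbytes apart (32 bytes x 128 sets) fall into the same set, so a loop touching more than four such addresses cannot keep them all in the cache at once.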
Procedure:
1. Develop an assembly program to enable and disable cache memory. Refer to the PowerPC 603e
manual for the registers involved in enabling and disabling the cache.
2. Follow the procedure of experiment #? “Running the Linpack Benchmark”.
3. Link the assembly code developed here to the code of experiment #?.
4. Run the benchmark with the cache enabled and note the results.
5. Disable the cache and run the benchmark again. Compare both results and state your conclusions.
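As a starting point for step 1, the sketch below computes the register value only. It assumes the cache enables are the ICE and DCE bits of HID0 (SPR 1008, next to the HID1 register read by get_HID1) with the masks shown; these values must be verified against the MPC603e User's Manual. The actual mfspr/mtspr access, the cache invalidation and the synchronization instructions the manual requires are not shown. Function and macro names are hypothetical.

```c
#include <stdint.h>

/* Assumed HID0 bit masks; verify against the User's Manual. */
#define HID0_ICE 0x00008000u   /* instruction cache enable (assumed) */
#define HID0_DCE 0x00004000u   /* data cache enable (assumed) */

/* Value to write back to HID0 to turn both caches on. */
uint32_t hid0_caches_on(uint32_t hid0)
{
    return hid0 | HID0_ICE | HID0_DCE;
}

/* Value to write back to HID0 to turn both caches off. */
uint32_t hid0_caches_off(uint32_t hid0)
{
    return hid0 & ~(HID0_ICE | HID0_DCE);
}
```

Keeping the read-modify-write arithmetic separate from the register access makes it easy to check on a host before the assembly routine is written.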
Questions:
References:
[1] The L2 Company: What is Cache? http://www.mindspring.com/~l2co/WhatIsCa.html
[2] What is…cache memory? http://www.whatis.com/cachemem.htm
[3] MPC603e & EC603e RISC Microprocessors User’s Manual
Suggested Code:
Troubleshooting:
1. Make sure you are using the correct registers for enabling and disabling the cache.
2. Refer to the troubleshooting section of experiment #? “Running the Linpack benchmark”.
Experiment
13
Flash ROM (Debugging Stage)
Problem Statement:
• This experiment requires the development of an assembly language program, starting at Programmer
Space RAM location $70000, that will copy a program residing at RAM location $71000 to a
free Flash ROM location. The program copied into Flash ROM will then execute automatically from its
location in Flash ROM. (Contributed by José I. Quiñones and Eisen Montalvo-Ruiz).
Objectives:
Upon completion of this laboratory experience, students will be able to:
• Write and assemble an assembly language subroutine.
• Execute a piece of code that will in turn copy another piece of code to Flash ROM and execute it.
• Write code directly into Flash ROM space by means of assembly code.
Background Information:
Any microprocessor-based system needs memory devices to hold data and program instructions.
Memories can be classified as volatile and non-volatile.
The basic difference between the two classes is that a volatile memory loses its data contents
when power is removed from the semiconductor chip, while a non-volatile memory holds its data contents even when power is removed. This has important implications that must remain
clear to designers of microprocessor-based systems, since both types of memory have their advantages and
disadvantages as well as their typical uses.
Non-volatile memory is used when a system is to execute a dedicated application. Take for example
your computer. When you turn it on, it executes a self-initialization procedure we call “booting”. How
does the CPU know what to do? The BIOS (Basic Input/Output System) is the dedicated application
for initialization and is stored in a non-volatile type of memory. If this program is erased by any
means, the computer will not be able to restart!
Once the computer starts, the application to be executed can be anything we decide. It would be quite
expensive, and space prohibitive, to store every application we would like to have on a computer
in non-volatile silicon memory chips. Instead, we have found it quite useful to store our applications on magnetic or other types of media and then copy them into a bank of volatile silicon memory, which is fast and can keep up with the microprocessor's need for instructions to execute.
This volatile memory can be written over and over. When power is no longer applied to
the memory array, all information is lost. The typical use of volatile memory (such as RAM) is to
load the operating system (OS) and any application you may want to execute on the computer. Only the
necessary instructions reside in RAM. The OS is responsible for loading the RAM with the necessary
instructions as they are needed.
A disadvantage of non-volatile memory cells is that they have a limited life and usually cannot be rewritten more than a specified number of times. They are also significantly slower than volatile memories.
We have come to accept that there is a need and a use for both types of memory. This is why our Excimer board is equipped with 1 MB of RAM, which is our volatile memory, and 4 MB of Flash
ROM, which is our non-volatile memory.
By now you should be familiar with the fact that the Excimer board has an “operating system” you
can use, called DINK32. This monitor program is stored in the Flash ROM and is responsible for
initializing all peripheral activity on the PowerPC 603e evaluation board. When you press reset,
this dedicated application executes and takes over the board. The monitor also enables the user to see
and use all registers and memory space with commands such as Memory Display (MD), Register Display (RD) and so on. The monitor even has an online assembler and disassembler that enables the user
to see code and to enter code manually.
The Excimer also needs RAM to operate. That is, any variable and/or data must be written to RAM,
since its value may change continuously. So the 1 MB of RAM provided is needed by the Excimer to
operate. Fortunately for developers, this RAM can also be loaded with applications that exploit
the power of the 603e CPU. For testing and evaluation purposes, the RAM will hold instructions (which must be downloaded periodically with the help of a PC) that can be traced and
watched using the DINK32 tools.
The ultimate goal of any engineer using an evaluation board such as the Excimer is to create a
free-running, self-supporting application. In other words, a specific application such
as a control or embedded system should run without the need for a PC. This implies that the engineer's code will always be present on the Excimer board even though it was downloaded only once.
You are currently downloading the code every time you need to execute it. Eventually your code
will be fully debugged. It would then be appropriate to make the evaluation
board a stand-alone unit with your code as the dedicated application.
The goal of this laboratory experiment is to show how to write to the Flash ROM area so that a dedicated application can be stored in this non-volatile memory. The suggested approach is to write a simple program that writes another piece of code into the Flash ROM. As a more advanced
option, the “writer program” could set the RESET vector to the address where it will begin writing the
“written code”.
As a safety feature, so that DINK32 is still available after the Flash ROM is updated, the “written
code” should ask the user whether the application to be executed is DINK32 or itself. The true DINK32 pointer
can be saved from the previous vector table. The program will then be able to jump either to
the short code (which can be the LED blinker code) or to DINK32.
How to program the Flash ROM:
The Excimer board has 4 Mbytes of Flash ROM where the DINK32 monitor resides. Nevertheless,
user-defined applications can be stored in this memory space as long as some precautions are taken.
Recall that if the monitor is no longer working, subsequent Flash programming might not be as easy (or
even possible with the existing hardware).
The Flash ROM chips interfaced to the PowerPC 603e on our Excimer board are AMD
AM29LV800B 8-Mbit memory devices. Detailed information on how to erase and program the Flash
cells can be found in the AM29LV800B Data Sheet. Some introductory information follows, but students are advised to read the Data Sheet itself.
Each AMD Flash ROM chip contains its own control unit. All that is
needed to erase, read or program a byte, a sector or the entire chip is a set of commands that puts
the chip into a predefined state. Once the corresponding commands are sent to the
chip via the data bus, the internal control logic does the rest. The states that can be entered are: Sector
Protect, Sector Unprotect, Autoselect, Erase Sector, Erase Chip, Erase Suspend, Erase Resume, Program, Reset and Unlock Bypass.
Device programming occurs by executing the Program Command sequence. This initiates the Embedded Program algorithm, an internal algorithm that automatically times the program pulse widths and
verifies proper cell margin. The Unlock Bypass mode facilitates faster programming times by requiring
only two write cycles to program data instead of four. Instead of using the common Program Command
(a four-cycle command), you can then use the Unlock Bypass Command Sequence (refer to Table 5 of the
AM29LV800B Data Sheet).
Device erasure occurs by executing the Erase Command sequence. This initiates the Embedded Erase
algorithm—an internal algorithm that automatically preprograms the array (if it is not already programmed) before executing the erase operation. During erase, the device automatically times the erase
pulse widths and verifies proper cell margin.
The host system can detect whether a program or erase operation is complete by observing the
RY/BY# pin, or by reading the DQ7 (Data# Polling) and DQ6 (toggle) status bits. After a program or
erase cycle has been completed, the device is ready to read array data or accept another command. The
sector erase architecture allows memory sectors to be erased and reprogrammed without affecting
the data contents of other sectors. The device is fully erased when shipped from the factory.
The AM29LV800B Data Sheet explains all the other states mentioned. How to enter and exit these modes of operation is shown in Tables 4 and 5 of that document. But be careful: although AMD is very specific about the addresses where specific data has to be written, the implementation of the Excimer board changes these parameters.
The way the four Flash ROM chips were assembled on the Excimer board (configured as two memory banks of double 16-bit words) redefines all addresses and expected data to and from the memory
chips. First, the PowerPC 603e data bus is big-endian while
the Flash ROM data bus is little-endian. This means that what is the MSB to the microprocessor is actually the LSB to the Flash ROM! Data has to be bit reversed (which is as simple as mirroring the 16-bit words) so that the Flash ROM understands it. In other words, whatever data the
AM29LV800B Data Sheet tells you to send to the Flash has to be bit reversed before it is actually
sent.
NOTE: This is the programmer's job. The programmer could develop a subroutine to mirror the
word, or could do it by hand before writing the assembly or C code.
The other very important fact to keep in mind is that the memory mapping implies a 3-bit left shift of the address.
That is (as can be seen on the Excimer implementation schematic), the three least significant address bits are not used to
select a location inside the memory device itself. Recall that the PowerPC address bus is big-endian
while AMD flash devices are little-endian; that is why address lines A31, A30 and A29, which are the
least significant address lines of the PowerPC 603e, are the ones not connected to the memory device.
These three address lines are in fact used by the Excimer memory control FPGA to select
one of the four memory chips. That is why on the Excimer implementation schematic there is a separate line for each of the WE* (Write Enable) control signals, while CS* (Chip Select) and OE* (Output Enable) are shared.
NOTE: Since the three least significant address lines are used only for chip selection while programming, every address given in the AM29LV800B Data Sheet (in command sequences as well as
in sector addresses) has to be shifted left by 3.
The following rules are required for the Excimer V1 and V2 boards, since the address lines are shifted
by three and the data lines are bit reversed.
• Rule for converting from the expected address (found in the AM29LV800B Data Sheet) to the shifted address
(Excimer memory mapped): shift the address left by 3. Example:
address 0x555 << 3 = 0x2AA8 (0b0101 0101 0101 << 3 = 0b0010 1010 1010 1000)
• Rule for converting from data that is not bit reversed (little-endian to little-endian) to bit-reversed data (little-endian to big-endian): bit reverse the data. Example:
data 0xAA bit reversed = 0x55 (0b1010 1010 bit reversed = 0b0101 0101)
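The two rules can be checked with a few lines of C before any command is sent to the board. This is a minimal sketch with hypothetical function names; replicating the reversed command byte into all four bytes of the 32-bit word follows the form of the command sequences listed in this section (for example 0x55555555) and is otherwise an assumption.

```c
#include <stdint.h>

/* Data sheet address -> Excimer offset: shift left by 3. */
uint32_t excimer_flash_addr(uint32_t datasheet_addr)
{
    return datasheet_addr << 3;
}

/* Mirror the eight bits of a byte (bit 0 <-> bit 7, and so on). */
uint8_t bit_reverse8(uint8_t b)
{
    uint8_t r = 0;
    int i;

    for (i = 0; i < 8; i++) {
        r = (uint8_t)((r << 1) | (b & 1u));
        b >>= 1;
    }
    return r;
}

/* Data sheet command byte -> 32-bit word as written in the sequences. */
uint32_t excimer_flash_cmd(uint8_t datasheet_byte)
{
    return (uint32_t)bit_reverse8(datasheet_byte) * 0x01010101u;
}
```

For example, the data sheet address 0x2AA becomes 0x1550, and the data sheet byte 0x90 becomes the word 0x09090909.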
In the following sequences, addresses and data have already been shifted and bit reversed. You
should not have trouble when using these examples. Recall that you will need to refer to the
AM29LV800B Data Sheet when looking up sectors and specific addresses.
Entering Autoselect mode
Write address: 0x2aa8 with data: 0x55555555
Write address: 0x1550 with data: 0xaaaaaaaa
Write address: 0x2aa8 with data: 0x09090909
Get Manufacturer ID:
Read address: 0x0000 get data: 0x80008000
Get Device ID:
Read address: 0x0008 get data: 0x5B445B44
Get Sector Protect status for each sector:
Read address: 0x[SA] get data: 0x00000000 for non protected
Read address: 0x[SA] get data: 0x80008000 for protected
Reset sequence exit autoselect mode:
Write address: 0x0000 with data: 0x0f0f0f0f
Erasing a flash sector sequence:
Write address: 0x2AA8 with data: 0x55555555
Write address: 0x1550 with data: 0xAAAAAAAA
Write address: 0x2AA8 with data: 0x01010101
Write address: 0x2AA8 with data: 0x55555555
Write address: 0x1550 with data: 0xAAAAAAAA
Write address: sector address with data: 0x0C0C0C0C
Programming flash memory:
Write address: 0x2AA8 with data: 0x55555555
Write address: 0x1550 with data: 0xAAAAAAAA
Write address: 0x2AA8 with data: 0x05050505
Write address: Word address with data to be programmed
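Before writing the assembly routine, it can help to lay the programming sequence out as a table of address/data pairs and inspect it on a host. The sketch below only builds that table and touches no hardware. The type and function names are hypothetical, the base address 0xFFC00000 used in the example is the one that appears in the comments of the Autoselect listing, and the value to be programmed is passed through unchanged. Repeating each write for the second memory bank, as the Autoselect listing does, is not handled here.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t addr;   /* absolute address to write */
    uint32_t data;   /* 32-bit word to write there */
} flash_write;

/* Fills out[] with the four writes of the programming sequence listed
 * above and returns the number of steps. */
size_t flash_program_steps(uint32_t base, uint32_t word_offset,
                           uint32_t value, flash_write out[4])
{
    out[0].addr = base + 0x2AA8u;      out[0].data = 0x55555555u;
    out[1].addr = base + 0x1550u;      out[1].data = 0xAAAAAAAAu;
    out[2].addr = base + 0x2AA8u;      out[2].data = 0x05050505u;
    out[3].addr = base + word_offset;  out[3].data = value;
    return 4;
}
```

The same table-driven layout works for the erase and Autoselect sequences; only the constants change.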
Some notes: the Flash ROM device can change a one to a zero in any cell bit, but not vice versa.
Thus, it is good practice to erase the memory before writing to it. Otherwise the memory contents
may be corrupted after writing, since a zero cannot be changed back to a one. In some cases you can take
advantage of this behavior. If, for example, a cell holds the byte 0x55 and you want to
write 0x11, it can be done without erasing the cell, because only ones are being changed to zeros. This is a good idea when there are only a few bytes
to be programmed, but it would not be wise when programming large amounts of data.
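Whether a given value can be written over existing contents without an erase follows directly from the one-to-zero rule. A minimal sketch, with a hypothetical function name:

```c
#include <stdint.h>

/* Returns 1 if new_value can be programmed over old_value without an
 * erase, i.e. no bit would have to change from 0 back to 1. */
int can_program_without_erase(uint32_t old_value, uint32_t new_value)
{
    return (new_value & ~old_value) == 0;
}
```

Writing 0x11 over 0x55 passes the check, while writing 0x55 over 0x11 does not; an erased cell (all ones) accepts any value.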
Erasing can only be done one sector at a time or for the entire chip, so one cannot erase only a portion of a
sector. The smallest amount of flash that can be erased is one complete sector. (Refer to Table 3 of the
AM29LV800B Data Sheet for sector addresses, and remember to shift every address left by 3.) Be careful
when erasing a sector: there might be important code (such as OS code) in the sector.
References:
• AM29LV800B Data Sheet (www.amd.com)
• FL.C Source Code
Suggested Code:
Two sets of code are available in this section. The first thing any student should try is to send
commands to the Flash ROM chips correctly. Since writing or erasing may be hazardous to the Excimer's
health, it is recommended that a few experiments be performed before any erasing or programming is
attempted. This will allow the students to practice assembly language programming to interface the
chips, not with the data supplied by the AM29LV800B Data Sheet but with the already shifted
addresses and bit-reversed data.
The first code snippet shows how to enter Autoselect mode. Students will be able to check memory
locations for the specified data. If a memory display of the byte information shows the specified
data (the bit reversal will be noticeable), the command sequence has been successful. Otherwise, one or more
steps might not be correct (check the Troubleshooting section for more information on possible errors).
.text
.global readflash
readflash:
xor r12, r12, r12
xor r13, r13, r13
!Clearing Registers 12 and 13
88
lis r12, 65472
addi r12, r12, 8
!R12 contains $FFC00000
!Register12 contains address $FFC00008
!This address is used to request the Autoselect mode data as
!specified by the Flash ROM data sheet + a shift by 3.
xor r14, r14, r14
lis r14, 65472
addi r14, r14, 10920
!Clearing Register 14
!Register 14 contains $FFC00000
!Register 14 contains $FFC02AA8
!This address is used to send the first step into the
!Autoselect sequence as specified by the Flash ROM data
!sheet + a shift by 3.
xor r15, r15, r15
lis r15, 65472
addi r15, r15, 5456
!Clearing Register 15
!Register 15 contains $FFC00000
!Register 15 contains address $FFC01550
!This address is used to send the second step into the
!Autoselect sequence as specified by the Flash ROM data !sheet
+ a shift by 3.
xor r16, r16, r16
addi r16, r16, 170
slwi r16, r16, 8
addi r16, r16, 170
slwi r16, r16, 8
addi r16, r16, 170
slwi r16, r16, 8
addi r16, r16, 170
!Clearing Register 16
!Through this steps, the data $AAAAAAAA which will be
!sent to the Flash on the first step of the Autoselect
!sequence, is assembled. Note that bit reversal has been
!accounted for.
xor r17, r17, r17
lis r17, 21845
addi r17, r17, 21845
!Clearing Register 17
!Data $55555555 is assembled through these steps. This is
!the data that will be sent on the first step of the
!Autoselect sequence. Bit reversal has been taken care of.
xor r18, r18, r18
addi r18, r18, 9
slwi r18, r18, 8
addi r18, r18, 9
slwi r18, r18, 8
addi r18, r18, 9
slwi r18, r18, 8
addi r18, r18, 9
!Clearing Register 18
!Data $09090909 is assembled through these steps. This
!word is sent to the Flash to differentiate the command
!sequence from all others. This data is specific to the
!Autoselect mode.
stwx r17, r14, r13
addi r14, r14, 4
stwx r17, r14, r13
!Store $55555555 data in $FFC02AA8
!First sequence step taken care of.
!It has to be written to both memory banks.
stwx r16, r15, r13
addi r15, r15, 4
stwx r16, r15, r13
!Store $AAAAAAAA data in $FFC01550
!Second sequence step taken care of.
!It has to be written to both memory banks.
subi r14, r14, 4
!Subtract 4 from R14 so that the address FFC02AA8 is
!once again available.
!Store $09090909 data in $FFC02AA8
!Third step into the Autoselect sequence taken care of.
!It has to be written to both memory banks.
stwx r18, r14, r13
addi r14, r14, 4
stwx r18, r14, r13
!Register 16 contains data $AAAAAAAA
!Register 18 contains data $09090909
!At this moment, the Flash ROM is in Autoselect Mode. All memory reads will return
!Autoselect Mode Information such as Manufacturer ID, Device ID and Sector Protect
!Verification status. To exit Autoselect Mode, the Autoselect Reset sequence or a Hardware
!reset must be performed.
lwzx r19, r12, r13
!Loading into Register 19 the Manufacturer ID for the first
!memory bank.
addi r12, r12, 4
lwzx r20, r12, r13
!Loading into Register 20 the Manufacturer ID for the second
!memory bank.
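The register set-up in the listing above can be checked on a host machine before touching the board. The following is a minimal C sketch, not part of the original lab code, that models the three instructions used to build constants. The function names ppc_lis, ppc_addi, ppc_slwi and replicate_byte are hypothetical helpers introduced here for illustration. It also suggests why $AAAAAAAA is presumably built with a shift chain: addi sign-extends its 16-bit immediate, so a lis/addi pair with $AAAA would not produce the intended value.

```c
#include <stdint.h>

/* lis rD, imm: place the 16-bit immediate in the upper halfword,
   leaving the lower halfword zero. */
uint32_t ppc_lis(uint16_t imm)
{
    return (uint32_t)imm << 16;
}

/* addi rD, rA, imm: add the sign-extended 16-bit immediate to rA. */
uint32_t ppc_addi(uint32_t ra, int16_t imm)
{
    return ra + (uint32_t)(int32_t)imm;
}

/* slwi rD, rA, n: shift left by n bits (n in 0..31). */
uint32_t ppc_slwi(uint32_t ra, unsigned n)
{
    return ra << n;
}

/* Repeat one byte into all four byte lanes with the same
   addi/slwi chain the listing uses for r16 and r18. */
uint32_t replicate_byte(uint8_t b)
{
    uint32_t r = 0;                 /* xor rD, rD, rD */
    r = ppc_addi(r, b);
    for (int i = 0; i < 3; i++) {
        r = ppc_slwi(r, 8);
        r = ppc_addi(r, b);
    }
    return r;
}
```

For example, ppc_addi(ppc_lis(65472), 10920) models the r14 set-up and yields $FFC02AA8, while replicate_byte(170) yields $AAAAAAAA.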
The second snippet of code is the subroutine needed to write a byte. Extreme care must be taken
when writing data to the Flash, as Excimer can easily become corrupted and inoperable. Students
are encouraged to practice on a free sector (preferably sector 16), one that will be erased anyway
or that can be erased without fear of damaging the OS. Check the AM29LV800B Data Sheet for further
information on sector division. Refer to the Troubleshooting section if you have difficulty
writing data to the Flash.
.text
.global writeflash
writeflash:
xor r13, r13, r13
!Clearing Register 13
xor r12, r12, r12
lis r12, 65472
addi r12, r12, 10920
!Clearing Register 12
!Register 12 contains Address $FFC00000
!Register 12 contains Address $FFC02AA8
!xor r14, r14, r14
!lis r14, 65472
!addi r14, r14, 10924
!Clearing Register 14
!Register 14 contains Address $FFC00000
!Register 14 contains Address $FFC02AAC
xor r15, r15, r15
lis r15, 65472
addi r15, r15, 5456
!Clearing Register 15
!Register 15 contains Address $FFC00000
!Register 15 contains Address $FFC01550
xor r16, r16, r16
addi r16, r16, 170
slwi r16, r16, 8
addi r16, r16, 170
slwi r16, r16, 8
addi r16, r16, 170
slwi r16, r16, 8
addi r16, r16, 170
!Clearing Register 16
!Through these steps, the data $AAAAAAAA, which will be
!sent to the Flash on the second step of the program
!command sequence, is assembled. Note that bit reversal has
!been accounted for.
xor r17, r17, r17
lis r17, 21845
addi r17, r17, 21845
!Clearing Register 17
!Data $55555555 is assembled through these steps. This is
!the data that will be sent on the first step of the
!program command sequence. Bit reversal has been taken care of.
xor r18, r18, r18
addi r18, r18, 5
slwi r18, r18, 8
addi r18, r18, 5
slwi r18, r18, 8
addi r18, r18, 5
slwi r18, r18, 8
addi r18, r18, 5
!Clearing Register 18
!Data $05050505 is assembled through these steps. This
!word is sent to the Flash to differentiate the command
!sequence from all others. This data is specific to the
!Program command.
stwx r17, r12, r13
addi r12, r12, 4
stwx r17, r12, r13
!Store $55555555 data in Address $FFC02AA8
!First sequence step taken care of.
!It has to be written to both memory banks.
!Register 16 contains data $AAAAAAAA
!Register 18 contains data $05050505
stwx r16, r15, r13
addi r15, r15, 4
stwx r16, r15, r13
!Store $AAAAAAAA data in $FFC01550
!Second sequence step taken care of.
!It has to be written to both memory banks.
subi r12, r12, 4
stwx r18, r12, r13
addi r12, r12, 4
stwx r18, r12, r13
!Subtract 4 from R12 so that the address FFC02AA8 is
!once again available.
!Store $05050505 data in Address $FFC02AA8
!Third sequence step taken care of.
! It has to be written to both memory banks.
xor r19, r19, r19
lis r19, 65484
!Clearing Register 19
!Register 19 contains Address $FFCC0000
xor r20, r20, r20
addi r20, r20, 85
!Clearing Register 20
!Register 20 contains data $00000055
stwx r20, r19, r13
!Program $00000055 data in FLASH Address $FFCC0000
!At this moment, data will have been programmed into $FFCC0000
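The order of the stores issued by the write listing can be rehearsed on a host machine without any risk to the Flash. The sketch below is an illustration only: instead of writing to hardware, a hypothetical store_word function records each store in a log. The names bus_write, write_log, command_both_banks and program_word_sequence are assumptions made for this example; the addresses and data words are the ones used in the listing.

```c
#include <stdint.h>
#include <stddef.h>

#define MAX_WRITES 16

struct bus_write { uint32_t addr; uint32_t data; };

/* Hypothetical stand-in for the hardware: every store is logged. */
struct bus_write write_log[MAX_WRITES];
size_t write_count;

void store_word(uint32_t addr, uint32_t data)
{
    if (write_count < MAX_WRITES) {
        write_log[write_count].addr = addr;
        write_log[write_count].data = data;
        write_count++;
    }
}

/* Each command word goes to both memory banks: addr and addr + 4. */
void command_both_banks(uint32_t addr, uint32_t data)
{
    store_word(addr, data);
    store_word(addr + 4, data);
}

/* Issue the stores in the same order as the writeflash listing
   and return how many stores were logged. */
size_t program_word_sequence(uint32_t target, uint32_t value)
{
    write_count = 0;
    command_both_banks(0xFFC02AA8u, 0x55555555u); /* first step   */
    command_both_banks(0xFFC01550u, 0xAAAAAAAAu); /* second step  */
    command_both_banks(0xFFC02AA8u, 0x05050505u); /* program word */
    store_word(target, value);                    /* data to program */
    return write_count;
}
```

Calling program_word_sequence(0xFFCC0000, 0x00000055) logs seven stores: three command words, each written twice, followed by the data word itself.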
/*---------------------------------------------------------------------------------------*
main.c
EMR 29/4/99
This code writes a program to the FlashROM and sets the boot vector to the program written.
There is some problem with the FlashROM. Right now the program fails to write correctly to the
FlashROM. I have tried the same program on two different Excimer boards and the results differ,
although the second time it wrote some information correctly. I think the FlashROM is damaged.
----------------------------------------------------------------------------------------*/
#include <stdio.h>
#define getchar dink_get_char
#define putchar dink_write_char
#define printf dink_printf
void blink_leds(int addr, int i);
unsigned long (*dink_get_char)() = (unsigned long (*)()) 0x1e4c4;
unsigned long (*dink_write_char)(char) = (unsigned long (*)(char)) 0x5eb4;
unsigned long (*dink_printf)() = (unsigned long (*)()) 0x6270;
#define flash 0xffc00000
#define addr1 0x2aa8
#define addr2 0x1550
#define zerosones 0x01010101
#define allas 0xaaaaaaaa
#define allfives 0x55555555
#define zeroscs 0x0c0c0c0c
#define zerosfives 0x05050505
#define zerosfours 0x04040404
#define zerosnines 0x09090909
#define allzeros 0x00000000
void erase_sector(int);
void program(unsigned int *, unsigned int);
void unlock_bypass();
void write_word(unsigned int *, unsigned int);
void unlock_bypass_reset();
void leds_main();
void blink_leds(int addr, int i);
void blank();
void main()
{
unsigned int *addr = (unsigned int *)0xffd00000;
unsigned int *prog = (unsigned int *)0x0;
unsigned int size = 0;
//Erase the sector where the code will be stored
erase_sector(7);
//Write our code to the FlashROM
/*unlock_bypass();
size = (unsigned int)blank - (unsigned int)leds_main;
for(unsigned int i=0; i<size/4; i++)
{
prog = (unsigned int *)((unsigned int)leds_main + i*4);
printf("%x ", (unsigned int)prog);
//program(addr+i, *prog);
write_word(addr+i, *prog);
}
unlock_bypass_reset();*/
prog = (unsigned int *)((unsigned int)leds_main);
write_word(addr, *prog);
//Copy the first MDink sector into RAM
//Change the boot pointer to the vector table
//Erase the first MDink sector
//Rewrite the sector to the FlashROM
//Reboot
}
void erase_sector(int sector)
{
unsigned int *sect, *command;
sect=0;
if(sector<4)
{
//Sector 3 or less
switch(sector)
{
case 0:
sect = (unsigned int *)(0x0 + flash);
break;
case 1:
sect = (unsigned int *)((0x2000<<3) + flash);
break;
case 2:
sect = (unsigned int *)((0x3000<<3) + flash);
break;
case 3:
sect = (unsigned int *)((0x4000<<3) + flash);
}
}
else if(sector>18)
{
return;
}
else
{
sect = (unsigned int *)((0x8000<<3)*(sector-3)+flash);
}
command = (unsigned int *)(flash + addr1);
*command = allfives;
command +=1;
*command = allfives;
command = (unsigned int *)(flash + addr2);
*command = allas;
command +=1;
*command = allas;
command = (unsigned int *)(flash + addr1);
*command = zerosones;
command +=1;
*command = zerosones;
command = (unsigned int *)(flash + addr1);
*command = allfives;
command +=1;
*command = allfives;
command = (unsigned int *)(flash + addr2);
*command = allas;
command +=1;
*command = allas;
printf("%x ", (unsigned int)sect);
*sect = zeroscs;
}
void program(unsigned int *addr, unsigned int word)
{
unsigned int *command;
command = (unsigned int *)(flash + addr1);
*command = allfives;
command +=1;
*command = allfives;
command = (unsigned int *)(flash + addr2);
*command = allas;
command +=1;
*command = allas;
command = (unsigned int *)(flash + addr1);
*command = zerosfives;
command +=1;
*command = zerosfives;
printf("%x ", (unsigned int)addr);
printf("%x\n", word);
*addr = word;
}
void unlock_bypass()
{
unsigned int *command;
command = (unsigned int *)(flash + addr1);
*command = allfives;
command +=1;
*command = allfives;
command = (unsigned int *)(flash + addr2);
*command = allas;
command +=1;
*command = allas;
command = (unsigned int *)(flash + addr1);
*command = zerosfours;
command +=1;
*command = zerosfours;
}
void write_word(unsigned int *addr, unsigned int word)
{
unsigned int *command;
command = (unsigned int *)(flash);
*command = zerosfives;
command +=1;
*command = zerosfives;
printf("%x ", (unsigned int)addr);
printf("%x\n", word);
*addr = word;
}
void unlock_bypass_reset()
{
unsigned int *command;
command = (unsigned int *)(flash);
*command = zerosnines;
command +=1;
*command = zerosnines;
command = (unsigned int *)(flash);
*command = allzeros;
command +=1;
*command = allzeros;
}
void leds_main()
{
int decimal_no;
char LED;
char number;
do
{
printf ("\nSelect the LED you want to blink:\n");
printf ("\tS - Press S for the Status LED\n");
printf ("\tE - Press E for the Error LED\n");
printf ("\tQ - Press Q to Quit\n");
LED = getchar();
if (LED == 'E' || LED == 'e')
{
printf ("\nEnter the number of times (1-9) to blink the Error LED: ");
do{
number = getchar();
}while ( !((number >= '0') && (number <= '9')) );
putchar(number);
decimal_no = number - 48;
blink_leds(0x40600000, decimal_no);
}
else if (LED == 'S' || LED == 's')
{
printf ("\nEnter the number of times (1-9) to blink the Status LED: ");
do{
number = getchar();
}while ( !((number >= '0') && (number <= '9')) );
putchar(number);
decimal_no = number -48;
blink_leds(0x40200000, decimal_no);
}
} while ( LED != 'Q' && LED != 'q' ); /* Q or q */
return;
}
void blink_leds(int addr, int i)
{
unsigned long count;
int loop;
for (loop = 0 ; loop < i; loop++)
{
*(char *) (addr) = 0x00; //turn on LED
for(count = 0; count <= 0xfff00; count ++);
*(char *) (addr) = 0x08; //turn off LED
for(count = 0; count <= 0xfff00; count ++);
}
*(char *) (0x40600000) = 0x08;
}
void blank(){}
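The sector address arithmetic inside erase_sector() can be verified separately on a host machine. The following is a small sketch under the same layout the listing assumes (four small sectors followed by uniform sectors, each data sheet offset shifted left by three). The function name sector_base and the constant FLASH_BASE are hypothetical names chosen for this example. Unlike the listing, it also rejects negative sector numbers.

```c
#include <stdint.h>

/* Base address of the Flash ROM on the Excimer board. */
#define FLASH_BASE 0xFFC00000u

/* Return the Excimer address where a sector starts, following the
   arithmetic in erase_sector(). Returns 0 for sector numbers outside
   0..18 (the listing only rejects numbers above 18). */
uint32_t sector_base(int sector)
{
    /* Data sheet offsets of the four small sectors, before shifting. */
    static const uint32_t small_offsets[4] = {
        0x0000u, 0x2000u, 0x3000u, 0x4000u
    };

    if (sector < 0 || sector > 18)
        return 0;
    if (sector < 4)
        return FLASH_BASE + (small_offsets[sector] << 3);
    /* Remaining sectors are spaced (0x8000 << 3) bytes apart. */
    return FLASH_BASE + (0x8000u << 3) * (uint32_t)(sector - 3);
}
```

With this arithmetic, sector 7 starts at 0xFFD00000, which matches the address that main() passes to write_word() after calling erase_sector(7).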
Troubleshooting:
Trouble entering a mode using the specified word sequence tends to occur either because the data was
not properly reversed or because the address was not properly shifted. Recall that the memory mapping
of the Excimer does not have to match that of the AMD memory devices as specified in the AM29LV800B
Data Sheet. In fact, no memory device has to be memory mapped as specified by its data sheet. The
addresses provided by the manufacturer are offsets, and the final addresses depend on where the
device was placed in the memory map.
On the Excimer board, the Flash ROM was placed at address $FFC00000. Every offset has to be added to
that base address. But this is not all. Since the PowerPC data bus is 64 bits wide and the chips are
only 16 bits wide, they must be cascaded to fill the bus. If you are using the AM29LV800B Data Sheet
as a reference, keep in mind that every address presented has to be shifted left by three bits. If
you are using the notes presented in this lab section, the addresses are already shifted.
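As a quick check of the address rule, the conversion can be written as a one-line function. This is a minimal sketch for use on a host machine; excimer_flash_addr and FLASH_BASE are hypothetical names introduced for this example, and the data sheet offsets $555 and $2AA are inferred from the shifted addresses $2AA8 and $1550 used in the listings.

```c
#include <stdint.h>

/* Base address of the Flash ROM on the Excimer board. */
#define FLASH_BASE 0xFFC00000u

/* Convert a data sheet address (an offset into the device) into the
   address the PowerPC must use: shift left by three, then add the base. */
uint32_t excimer_flash_addr(uint32_t datasheet_addr)
{
    return FLASH_BASE + (datasheet_addr << 3);
}
```

For example, the data sheet offset $555 becomes $FFC02AA8 and $2AA becomes $FFC01550, the two command addresses used throughout this lab section.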
As explained earlier, the PowerPC 603e is a big-endian device while the AMD Flash chips are
little-endian. The address bus was already fixed so that the address bits are reversed by hardware.
The data bus does not have to be reversed in hardware, since it is irrelevant if a cell is programmed
backwards: when it is read, it is reversed again and thus rectified. This leaves programmers with the
problem that whatever the Flash chip expects as a command has to be reversed by hand. Bit reversing
is sometimes confused with taking a “1” and changing it to a “0” and vice versa. That will not work
when writing commands to the Flash device.
What we mean by bit reversing is a mirror image of the word: what was the MSB becomes the LSB and
vice versa. If you are having trouble entering a specific mode, check that you have done this
right. Another common mistake is to reverse the nibbles in the bytes or the bytes in the word. You
actually have to reverse the bits of the 16-bit word. A “0A” reversed is not “A0” but “50”
(00001010 mirrored is not 10100000 but 01010000).
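The mirroring rule is easy to get wrong by hand, so it can help to compute it. The sketch below mirrors a single byte, following the 8-bit example in the text; mirror8 is a hypothetical helper name, and treating each command as one byte is an assumption based on the byte values that appear in the listings of this section.

```c
#include <stdint.h>

/* Mirror the eight bits of a byte: bit 7 swaps with bit 0,
   bit 6 with bit 1, and so on. This is not a bitwise NOT. */
uint8_t mirror8(uint8_t b)
{
    uint8_t out = 0;
    for (int i = 0; i < 8; i++) {
        out = (uint8_t)((out << 1) | (b & 1u)); /* append the lowest bit */
        b >>= 1;                                /* move to the next bit */
    }
    return out;
}
```

With this function, mirror8(0x0A) gives 0x50, and the data sheet command bytes 0xAA, 0x55, 0x90 and 0xA0 become 0x55, 0xAA, 0x09 and 0x05, the byte values used in the listings.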