ETC TL32

Real Time System Testing
MIT 16.070 Lecture 32
hperry
5/4/01
Real Time System Testing (32)
• The next three lectures will focus on:
(R 11.3)
– Lecture 30:
• How to minimize failure in real time systems
• Methods used to test real time systems
(R 13)
– Lecture 31:
• What is Software Integration?
• Test Tools
• An example approach for integration and test of the MIT
16.070 final project
(R 11.4)
– Lecture 32:
• Fault Tolerance
• Exception Handling
• Formal Test Documentation
hperry
5/4/01
Fault Tolerance
What does it mean for a system to be fault tolerant?
The system can operate (although performance may be degraded) in
the presence of a software or a hardware failure.
How do you design a fault tolerant system?
• Incorporate exception handling to tolerate missed deadlines or work
around error conditions
• Design fault tolerant or redundant hardware or software into the
system
Ask yourself questions…how can the system fail?
hperry
5/4/01
Some Exception Handling Methods
Q: What if your system randomly misses data from a sensor?
A: Time tag your system data and use it only if the time tag has been
updated since the last time it was used.
Navigation Sensor Tasks
Sensor data
pools
Sensors
GPS
Shared
Memory
Lat, lon, alt,
time tag
Lat, lon, alt,
time tag
GPS data with time tag is written to shared memory
When other tasks come to read that data out of shared memory,
they can discard the data if the new time tag = old time tag.
hperry
5/4/01
Data consuming
tasks
Some Exception Handling Methods
Q: What if your system randomly misses data from a sensor? (continued)
A: If data is too stale and the system cannot function properly without the
new data, switch to a degraded mode of operation
– In a navigation system, this might be a backup navigation mode
that operates on minimal inputs, separate from those that have
failed
– In a space explorer robot system, this might be a zero-velocity state
where the robot waits for communication of a set of new
commands
hperry
5/4/01
Some Exception Handling Methods
Q: What if data goes bad for any number of reasons? What if bad data
results in...
• Divide by zero in the software algorithm
• Data input from sensors out of a specified range (overflow
condition for the algorithm or for the data type)
A: Add conditionals to your software to work around the problem.
hperry
5/4/01
Instead of
result = y/x;
Use
if x !=0 then result = y/x;
data = get_data();
data = get_data();
if data>1000, data = 1000;
if data<0, data = 0;
Some Exception Handling Methods
Q: What if a critical task should hang during execution?
– For example, a task is waiting on data from a sensor, but the sensor
loses its data link to the processor before it can provide the data)?
task
while (1)
loop
hperry
5/4/01
data = get_data();
Retrieves data from sensor
and returns
Some Exception Handling Methods
A: Relieve the task from waiting by…
• Designing functions with return values to indicate good/bad status
• Adding timeouts on retrieving the data to those functions to drive
good/bad status return.
Note:
• This brings up the need for call by reference in C
• This concept can be expanded to operating system calls (posting
mailboxes, releasing semaphores), library function calls, etc. provided
the RTOS supports it.
hperry
5/4/01
Some Exception Handling Methods
The importance of “call by reference” to facilitate error handling
• C functions can return only one value
• If that value = status, how does the calling task (or function) get any
information back from the task? The answer - Call by reference.
hperry
5/4/01
Instead of
int get_data();
Use
int get_data(*int x)
where the int returned is data:
data = get_data();
where int returned is the status and y is data:
status = get_data(&y)
Some Exception Handling Methods
Without exception handling:
int y;
int get_data();
void light_LED();
while (1)
{
y = get_data();
if (y>100)
{
light_LED();
}
}
hperry
5/4/01
/* get data from sensor software*/
/* turn on LEDs */
/* end infinite loop */
Some Exception Handling Methods
With exception handling:
# include <stdio.h>
int y, error;
int get_sensor_data(int* x)
void light_LED();
while (1)
{
error = get_sensor_data(&y);
/* pass the function the address of y, return error */
if (!error)
{
if (y >100) light_LED();
/* turn on LEDs */
}
else printf("Error in data coming from sensor = %d", error);
}
hperry
5/4/01
/* end infinite loop */
Exception Handling Methods - The Tradeoff
• On the one hand, exception handling can guard against problems such
as:
– Erroneous mathematical conditions (divide by zero, overflow)
– Tasks that hang waiting for inputs that will never come (due to
failed hardware, poor communication link, software bug etc.)
– Poor reactions to missed deadlines
• On the other hand, putting in all of this exception handling takes up
resources (CPU time and memory) that must be worth the trade-off
• You must balance the two to achieve a robust software design that
works within the timing and sizing constraints of the system
hperry
5/4/01
Fault Tolerance - Checking Hardware Resources
How can a processor check its own status?
• Built-In-Test (BIT)
– Ongoing diagnostics of the hardware that runs the software
– Interface checks
• CPU testing (done in the background)
• Memory testing
– Checking for memory corruption due to vibration, power surges,
electrostatic discharge, single event upsets, etc.
– Use error detection & recovery schemes (CRC, Hamming Code)
• Watchdog Timers
– Counting registers used to ensure that devices are still on line
– CPU resets the timer at regular intervals. Timer overflow indicates
a problem with the CPU
hperry
5/4/01
Fault Tolerance
Redundant Hardware Solutions - A two processor scheme
Primary
•
•
•
•
•
•
Secondary
Primary sends replica of all its inputs to Secondary
Secondary runs same software as Primary
Secondary checks for “pulse” from Primary to verify its health
If pulse is absent, Secondary takes over the system
Requires redundant communication lines to all system components
Many military aircraft systems are built this way
hperry
5/4/01
Fault Tolerance
Redundant Hardware Solutions - A two processor scheme
Primary
• When might this scheme fail?
–
–
–
hperry
5/4/01
Secondary
Fault Tolerance - Redundant Processors
• Computers can vote on who is worthy of staying in the system
- Check “pulse” to be sure the computers are on-line
- Compare data outputs from computations
How many do you need?
A
B
2?
hperry
5/4/01
A
C
B
3?
Fault Tolerance - Redundant Processors
A
B
2?
A
C
hperry
5/4/01
B
3?
A says B is sick
B says A is sick
Who is right?
Who should take over the system?
A says B is sick
B says A is sick
C says A is okay
C says B is sick
You might deduce that B is sick
But what if you lose one computer?
You must consider the probability of losing a
computer given the catastrophe of being in a 2
computer case.
Fault Tolerance - Redundant Processors
• In some cases 4 computers are necessary, each checking the status of
the other 3.
A
B
C
D
E
hperry
5/4/01
Is there any way that
the 4 computer scheme
can still fail?
Fault Tolerance - Redundant Processors
Who needs 4 computers and a backup? The Space Shuttle
• On ascent and landing/entry the Space Shuttle uses 4 identical
computers and one backup. Why? A 1/2 second glitch in the
guidance, navigation & control software will cause the shuttle to spin
out of control.
• During rendez-vous and docking, 2 computers are used
• On orbit, only one computer is necessary
• In all cases, the backup computer is always available and runs a
different set of software than the other 4, a technique known as Nversion programming.
hperry
5/4/01
N-Version Programming
• Same system requirements for multiple implementations
• The different implementations of code are written by independent
teams or contractors
• Eliminates the common software fault issue
• Often used as a backup system
hperry
5/4/01
TEST DOCUMENTATION
hperry
5/4/01
Real Time System Testing (32)
Test Documentation (Test Plan / Test Report)
• Includes an Executive Summary
• Describes test environment
• Identifies software to be tested
• Identifies tests that will be run on the software
• Includes a requirements traceability matrix
• Describes results of each test
• May have additional information provided as “notes”
hperry
5/4/01
Section 1 - Executive Summary
• System Overview
– Purpose of the system
– General operation of the system
– History of system development
• Document Overview
– Summarize contents of the test plan / report
– Summarize key test runs and success/failure of tests
– Includes an overall assessment of the project software
hperry
5/4/01
Section 2 - Software Test Environment
• Software Items Under Test
– Identify what exactly is being tested (i.e. the workstation and
handyboard software)
• Components in the Software Test Environment
– Identifies operating systems, compilers, communications software,
related application software, etc. used to test the workstation &
handyboard software
– Identify version numbers for the various software test environment
• Hardware and Firmware
– Identifies computer hardware, interfacing equipment, extra
peripherals, etc. used in the testing of the software
– Describe purpose of each item
• Software Test Configuration (Diagram)
hperry
5/4/01
Section 3 - Test Identification
Identify planned tests for the….
•
•
•
•
•
•
Serial I/O
Thrust Controller
Simulator
Fuel Out Indicator
Altitude and Velocity Display
Integrated System
hperry
5/4/01
Section 4 - Requirements Traceability
• Identifies the method used for verification of the requirement:
– Analysis
– Inspection
– Demonstration
– Test
• Maps each software requirement to a test
hperry
5/4/01
MIT 16.070 Requirements Matrix for Final Project
Rqmt #
1
2
3
4
5
6
7
8
9
10
11
hperry
5/4/01
Description
The system shall be composed of a controller (implemented on a handyboard), a
workstation simulation of the Mars Lander, a graphics package (which shall be
supplied to the developer) and a serial interface (cable and software) to connect
the controller to the workstation.
The Mars Lander simulation shall begin to execute on the workstation.
The simulation ends when the Mars Lander softly comes to rest on the surface
of Mars, when it crashes into the planet, or when it rises above its initial
altitude.
Method
Inspection
When the spacecraft is allowed to free fall (no thrust), it shall crash into the
Mars surface in approximately 16 seconds.
With thrust, the user should be able to land the spacecraft softly on Mars.
Test
Demo
Test
Demo
Test
Test
Demo
The controller shall operate as the user’s interface to the Mars Lander Vehicle.
Upon receipt of a data packet (1 byte) from the simulation, the controller shall:
1. Display vehicle altitude and vehicle velocity (both plus & minus) on the
handyboard LCD.
Upon receipt of a data packet (1 byte) from the simulation, the controller shall:
2. Light the LEDs if the “Fuel Out” bit is on.
Upon receipt of a data packet (1 byte) from the simulation, the controller shall:
3. Format a data packet and send it to the simulation.
This data packet shall contain a “thrust on/off” indication. When the start
button is depressed, the controller shall indicate a “thrust on” condition to the
simulation. This condition will remain “on” until the stop button has been
depressed (even through subsequent transmissions of the data packet to the
simulation). When the stop button is depressed, the controller shall indicate a
“thrust off” condition to the simulation. This condition shall remain until the
start button is pressed.
The simulation will track fuel remaining and send a flag to the controller
identifying when the fuel is out. If the vehicle has run out of fuel, the
simulation will not process any more thrust commands from the controller.
Test
Test
Demo
Test
Demo
Test
Test
Test
Test (if applicable)
n/a
Section 4 - Requirements Traceability (continued)
Rqmt #
12
13
14
15
16
17
17
18
Description
The simulator will have three types of output. It will:
1. send information to the controller specifying altitude, velocity, and whether
there is fuel remaining;
The simulator will have three types of output. It will:
2. call a graphics package (provided) that will display a representation of the
lander as it descends. The simulator must determine the condition of the vehicle
using one of four conditions defined in the Graphics Package Interface section
(see "vehicle-condition" variable);
The simulator will have three types of output. It will:
3. write altitude, velocity, thrust and fuel remaining information to a telemetry
file using the same format as the simulation in PS8 (plus a fuel remaining
column), and write an error message to the telemetry file if the graphics package
returns an error condition.
The simulation will end when the vehicle reaches 0 meters of altitude or goes
above its initial altitude.
Simulation code will interface to the provided graphics package
The serial interface should be used to continually send data back and forth
between the simulation and the controller.
Data sent from the simulation to the controller shall correspond to the provided
interface specification
Data sent from the controller to the simulation shall correspond to the provided
interface specification
hperry
5/4/01
Method
Test
Test
Test
Test
(see rqmt 3)
Test
Demo
Test
Test
Test
Test (if applicable)
Section 5 - Test Results
• Includes an overall assessment of the software as demonstrated by the
test results
• Identifies any remaining deficiencies, limitations or constraints that
were detected by the tests
• Includes recommended improvements in the design, operation or
testing of the software
• Includes a writeup of each test’s results
hperry
5/4/01
Section 6 - Notes
• Any general information that aids in the understanding of the test
report
• May include
– a list of abbreviations or acronyms
– definitions
– background information
– test rationale
– etc.
hperry
5/4/01
Real Time System Testing - Summary
• Over the last 3 lectures, you have learned that test and integration is an
important part of developing a real time system.
Why?
A Real Time System must continue running in the presence of failure
Therefore,
We must design in fault tolerance
We must find as many errors as possible before the system goes into use
We must be methodical about how we test and document what we learn
hperry
5/4/01