Presentation

Directions in Formal
Verification of Software
Ishai Rabinovitz
Verification Technologies
IBM Haifa Labs
Outline
!
!
!
!
What are formal verification and
model checking?
Why is formal verification for
software so hard?
Some basic techniques for
software model checking
The work here at IBM
Verification
!
Two main approaches to
automatically find bugs in
software and hardware
Testing (simulation)
" Formal verification
"
Testing
!
!
!
Run on some inputs and examine the
results
Can measure some kind of coverage
Advantages:
"
"
!
Relatively easy
Checks many aspects of the tested run
(control as well as data)
Disadvantages
"
Cannot prove correctness (falsification only)
Formal verification
!
!
Checks all possible runs
Advantages:
"
!
Verification of the specification is
possible (not only falsification)
Disadvantages
"
"
"
Hard
Not always feasible
Good for control checking (not
data)
Formal verification techniques
!
!
Theorem proving
Model Checking
Explicit model checking
" Symbolic model checking
"
Model checking
!
!
Build a model
A model can be represented as a graph
"
"
"
Each vertex is a state of the system – value to
all the variables (registers)
Each edge is a valid transition from state to
another state
Has initial states
Model checking (2)
!
We can check specifications
like:
"
Always i<=j
"
The program will always end
"
After a REQ there will be an ACK
Formal verification of Hardware
!
!
!
!
Success story
Widely used in the industry
Highly qualified users are
needed
Several successful techniques:
manual: divide and conquer,
restrictions
" automatic: abstraction refinement,
and more
"
Verifying software is
harder !
Why?
Theoretical problems
!
!
!
Software is undecidable. (Does
the program end?)
Software is unbounded (stack,
dynamically allocated memory)
Even if we restrict ourselves to
finite implementations (the
computer’s memory is bounded)
it is hard
Even finite software is hard
!
Programming languages have complicated
semantics (hard to model):
"
"
"
!
Functions
Recursion
Pointers
Hardware techniques do not transfer:
"
Data manipulation (vs. control)
!
!
"
The control path is integrated with the data path
It is hard to express data relationship in symbolic
model checker (huge BDDs)
Less modularity
!
It is difficult to use divide and conquer techniques
Harder type of parallelism
While hardware designs use
massive parallelism, a common
clock is usually implemented
(synchronous systems)
! In software no common clock is
available (asynchronous systems)
! Asynchronous systems present more
behaviors. – generates much bigger
models
(an example will be given later)
!
Economic problems
!
Bugs in HW are:
"
"
!
Bugs in SW are:
"
"
!
!
Not acceptable by the users
Very expensive to repair
Tolerated by the user
Relatively cheap to fix
HW vendors are willing to invest
large resources (time, money and
expert personnel) in verifying HW
SW companies are not
So why use formal verification on
software?
!
Parallel programs
"
"
"
"
!
Micro-code, smart-cards etc
"
"
!
Hard to test
Poor coverage
Programmers have less intuition
SW companies are willing to invest in skilled
personnel
Closer to hardware (in size and features)
Bugs are expensive to fix
Critical software (intensive care systems,
finance, security, anti-missiles systems
etc.)
Simpler user interface is
needed
!
!
Write a specification in a simple
way. (not all programmer
familiar with temporal logic)
Presenting the bug (counter
example) in the program terms
Techniques
Techniques
!
!
!
!
!
Modeling a program
Boolean programs (Microsoft’s
SLAM)
Abstraction refinement
Parallel oriented model checker.
(Lucent's VeriSoft, Bell Labs'
SPIN)
Framework (Kansas University’s
Bandera)
Modeling a Program
Modeling a program
!
!
!
Model (like hardware) is
synchronous – all variables
change at once
Software is sequential – one
change at a time
How can we translate a program
to a model?
Example
i, j
bar () {
int i=0, j=0;
while (i<2) {
i++;
j=i%2;
}
i=1;
}
0,0
1,0
2,0
1,1
2,1
0,1
Example (Using pc)
pc, i, j
bar () {
int i=0, j=0;
1 while (i<2) {
i++;
2
j=i%2;
3
}
4 i=1;
1,0,0
4,2,0
2,0,0
5,2,1
1,2,0
3,1,0
3,2,1
}5
1,1,1
2,1,1
Example (Using pc)
pc, i, j
next (i) =case
pc=2 : i+1
pc=4 : 1
else : i
next (j) =case
pc=3 : i%2
else j
next (pc) = case
pc=1 : if i<2 then 2
else 4
pc=2 : 3
pc=3 : 1
pc=4 : 5
bar () {
int i=0, j=0;
1 while (i<2) {
i++;
2
j=i%2;
3
}
4 i=1;
1,0,0
4,2,0
2,0,0
5,2,1
1,2,0
3,1,0
3,2,1
}5
1,1,1
2,1,1
Boolean programs
Microsoft’s SLAM
Boolean programs
!
!
!
If we wanted to manually verify
a program, we wouldn’t try to
explore all of its states or run on
all the inputs
We would set some invariants
and prove that they are kept
throughout the program run
Microsoft’s SLAM tries to do the
same
Example – Lock mechanism
Lock {
if (LOCK==1) error;
else LOCK = 1;
}
Unlock {
if (LOCK==0) error;
else LOCK = 0;
}
Unlock
Lock
Lock==0
Lock==0
Unlock
Lock
Error
Example - Program
Does this code
obey the
locking rule?
do {
Lock();
nPacketsOld = nPackets;
if(request){
request = request->Next;
Unlock();
nPackets++;
}
} while (nPackets != nPacketsOld);
Unlock();
Example – Boolean program
do {
Lock();
U
L
L
if(*){
L
Unlock();
U
}
U } while (*);
L
L
U
U
E
Unlock();
Model checking
boolean program
(bebop)
Example – Reconstruction
do {
Lock();
U
Is error path feasible
in C program?
(newton)
L
nPacketsOld = nPackets;
nPackets = C
nPacketsOld = C
L
L
U
U
U
E
if(request){
request = request->Next;
Unlock();
nPackets++;
nPackets = C+1
}
} while (nPackets != nPacketsOld);
Unlock();
Assume: C+1 == C
Example – Refinement
Add new predicate
b : (nPacketsOld == nPackets) to boolean program
(c2bp)
do {
Lock();
U
L
nPacketsOld = nPackets; b := true;
L
L
U
L
U
L
U
U
E
if(request){
request = request->Next;
Unlock();
nPackets++; b := b? false : *;
}
} while (nPackets != nPacketsOld);// !b
Unlock();
Example
b : (nPacketsOld == nPackets)
do {
Lock();
U
L
b := true;
b L
if(*){
b L
b U
b L
!b U
b L
U
b U
E
Unlock();
b := b? false : *;
}
} while ( !b );
Unlock();
Model checking
refined
boolean program
(bebop)
Example
b : (nPacketsOld == nPackets)
do {
Lock();
U
L
b := true;
b L
if(*){
b L
b U
b L
b L
b U
!b U
Unlock();
b := b? false : *;
}
} while ( !b );
Unlock();
Model checking
refined
boolean program
(bebop)
Abstraction refinement
failed
passed
Model
checking
reconstruct
abstraction
model
abstract
model
refine
Parallel oriented
Bell Labs' SPIN, Lucent's VeriSoft
Parallel oriented (VeriSoft)
!
!
!
Parallel programs force us to
encounter all possible
interleavings – generates large
models
One of the common heuristics to
reduce the model is partial-order
reductions
Mainly useful for explicit model
checking
interleaving
a1,b1
Global vars : G,Z
a2,b1
a1: x=G
a1,b2
a2: x=0
a3,b1
a2,b2
a1,b3
a3: y=1
a4: Z=2
a4,b1
a3,b2
a2,b3
a1,b4
a5:
a5,b1
a4,b2
a3,b3
a2,b4
a1,b5
b1: p=G
b2: p=0
a5,b2
a4,b3
a3,b4
a2,b5
b3: q=1
a5,b3
b4: Z=3
b5:
a4,b4
a5,b4
a5,b5
Z=3
a3,b3
a4,b5
a5,b5
Z=2
Visible
instructions
a1,b1
a2,b1
a1: x=G
a1,b2
a2: x=0
a3,b1
a1,b3
a2,b2
a3: y=1
a4: Z=2
a4,b1
a3,b2
a1,b4
a2,b3
a5:
a5,b1
a4,b2
a2,b4
a3,b3
a1,b5
b1: p=G
b2: p=0
a4,b3
a5,b2
a3,b4
a2,b5
b3: q=1
a4,b4
a5,b3
b4: Z=3
b5:
a5,b4
a5,b5
Z=3
a3,b3
a4,b5
a5,b5
Z=2
Partial Order
Reduction
a1,b1
a1,b2
a2,b1
a1: x=G
a2: x=0
a3,b1
a1,b3
a2,b2
a3: y=1
a4: Z=2
a4,b1
a3,b2
a1,b4
a2,b3
a5:
a5,b1
a4,b2
a2,b4
a3,b3
a1,b5
b1: p=G
b2: p=0
a5,b2
a3,b4
a4,b3
a2,b5
b3: q=1
a4,b4
a5,b3
b4: Z=3
b5:
a5,b4
a5,b5
Z=3
a3,b3
a4,b5
a5,b5
Z=2
Framework
Kansas University Bandera-like
Framework
Program
specification
Front
end
Reductions
IC after
reductions
Abstraction
Abstraction
refinement
Intermediate
Code (IC)
Intermediate
model
debugger
Translator
Model
Model
checker
Here at IBM
!
!
!
Using the power of RuleBase
Translate C to EDL
Support
"
"
!
Function + recursion
Pointers (no pointer arithmetic)
Automatic specifications:
"
"
"
"
"
No infinite loops
No assert violations
No memory leaks
No access to dangling pointers
No out of bound access to arrays