slides

Derivation And Evaluation
of Concurrent Collectors
Martin T. Vechev
University of Cambridge
David F. Bacon, Perry Cheng and Dave Grove
IBM T.J. Watson Research Center
Outline
 Motivation and Benefits
 New Generalizations
 Abstract Algorithms
 Practical Algorithms
 Derivations
 Evaluation
Motivation
Concurrent Collectors:
 Difficult to Construct Correctly
 Initial Errors in Dijkstra and Steele Algorithms
 Difficult to Understand
 Difficult to Implement
 No systematic comparisons, largely folklore
Contributions
 Generalization Of Existing Mechanisms
 Abstract Collectors Based on Generalizations
 Precise, but inefficient
 New Algorithm
 Derived from the power of generalizations
 Experimental Evaluation Of 4 Concurrent GC
Benefits of Generalization
Steele
Completion Time
Dijkstra
Hybrid
Yuasa
Memory Overhead (Floating Garbage)
Assumptions
 Single Collector Thread
 Multiple Mutator Threads
 Atomic Write Barrier
 Non-Moving
Why Is It Hard ?
 Concurrent Interleaving
B
D
C
A
R1
Time
GC marks B
Why Is It Hard ?
B
D
B
C
A
R2
D
C
R1
A
R1
Time
GC marks B
Mutator creates
R2
Why Is It Hard ?
B
D
B
C
A
R2
D
B
C
R1
A
R1
R2
D
C
A
X R1
Time
GC marks B
Mutator creates
R2
Mutator removes
R1
Why Is It Hard ?
B
D
B
C
A
R2
D
B
C
R1
A
R1
R2
D
B
C
A
X R1
R2
D
C
A
Time
GC marks B
Mutator creates
R2
Mutator removes
R1
GC reclaims C & D
live – WRONG!
Wavefront
G
E
B
R1
C
F
A
R2
D
Protection
Installation
Deletion
Remember R2
Remember Crossing Pointer R1
G
E
B
R1
C
F
A
R2
D
G
E
B
R1
C
F
A
X
R2
D
New generalizations
 Precise Wavefront
 Shade
 Precise Counting Of Cross Pointers
 Scanned Reference Count (S-RC)
Collector Progress (Shade)
G
E
B
F
D
C
R2
A
SHADE:0
Collector Progress (Shade)
G
E
B
F
D
C
R2
A
SHADE:1
Collector Progress (Shade)
G
E
B
F
D
C
R2
A
SHADE:2
Collector Progress (Shade)
G
E
B
F
D
C
R2
A
SHADE:3
Shade Observations
 Computed by Collector
 Generalization of the tri-color abstraction
 Different Granularities
 Different Objects
Scanned Reference Count (S-RC)
G
E
B
D
C
F
A
S-RC:0
Scanned Reference Count (S-RC)
G
E
B
D
C
F
A
S-RC:1
Scanned Reference Count (S-RC)
G
E
B
D
C
F
A
S-RC:2
Scanned Reference Count (S-RC)
G
E
B
D
C
FX
A
S-RC:1
Scanned Reference Count (S-RC)
G
E
B
C
F
X
D
A
S-RC:0
Scanned Reference Count (S-RC)
G
E
B
D
C
F
A
S-RC:0
Outline
 Motivation and Benefits
 New Generalizations
 Abstract Algorithms
 Practical Algorithms
 Derivations
 Evaluation
Abstract Algorithms
 Utilize Shade and S-RC
 Installation-Based and Deletion-Based
 Mutator nominates candidates
 Does not mark objects
Concurrent System Structure
COLLECT()
do
mark();
processNominated();
while (!finished);
MUTATE (obj, field, target)
obj.field = target;
nominate(target);
Mutator Nominates (Installation)
G
E
B
D
C
F
A
S-RC:0
NOMINATED OBJECT BUFFER
Mutator Nominates (Installation)
G
E
B
D
C
F
A
S-RC:1
A
NOMINATED OBJECT BUFFER
Mutator Nominates (Installation)
G
E
B
D
C S-RC:1
F
A
S-RC:1
A C
NOMINATED OBJECT BUFFER
Mutator Nominates (Installation)
G
E
B
C S-RC:1
F
X
D
A
S-RC:0
A C
NOMINATED OBJECT BUFFER
After Mark (Installation)
COLLECT()
do
mark();
processNominated();
while (!finished);
G
E
B
D
C S-RC:1
F
A
S-RC:0
A C
NOMINATED OBJECT BUFFER
After Find (Installation)
 Collector Marks Object C
COLLECT()
G
E
B
C S-RC:1
F
do
mark();
processNominated();
while (!finished);
D
A
S-RC:0
A C
NOMINATED OBJECT BUFFER
Allocation
 In Installation-Based Collectors
 No difference
 In Deletion-Based Collectors
 Remembered Upon Allocation
Allocation
G
E
B
D
C
F
A
NOMINATED OBJECT BUFFER
Allocation
E
G
B
D
C
F
A
N
S-RC:1
N
NOMINATED OBJECT BUFFER
Outline
 Motivation and Benefits
 New Generalizations
 Abstract Algorithms
 Practical Algorithms
 Derivations
 Evaluation
Practical Algorithms
 Stacks
 Non-Barriered Region
 Scanned Object : behind wavefront
 S-RC affected
 Stack rescanning
 S-RC and Shade compression (tri-color)
 Reachability Effect
Compressing S-RC (sticky bit)
G
E
B
D
C
F
A
S-RC:0
Compressing S-RC (sticky bit)
G
E
B
D
C
F
A
S-RC:1
Compressing S-RC (sticky bit)
G
E
B
D
C
FX
A
S-RC:1
Compressing S-RC (sticky bit)
G
E
B
D
C
F
A
S-RC:1
Object A – Unreachable but kept Alive
Compressing Shade
G
E
B
D
C
F
A
SHADE:0
Collector Progress (Shade)
G
E
B
D
C
F
A
SHADE:3
Collector Progress (Shade)
G
E
B
D
C
F
S-RC:1
R1
A
SHADE:3
PRECISE : 1
Collector Progress (Shade)
G
E
B
D
C
F
S-RC:1 (NOT decremented)
XR1
A
SHADE:3
PRECISE : 1
Collector Progress (Shade)
G
E
B
D
C
F
A
SHADE:3
PRECISE : 1
Object C – Unreachable but kept Alive
Outline
 Motivation and Benefits
 New Generalizations
 Abstract Algorithms
 Practical Algorithms
 Derivations
 Evaluation
Deriving Dijkstra
INSTALLATION-BASED GC
Stack Regions
RESCANNED STACKS
Compress S-RC to sticky bit for ALL objects
INSTALLATION with 1-bit
Compress Shade to 1-bit for ALL objects
DIJKSTRA
Deriving Yuasa
DELETION-BASED GC
Stack Regions
RESCANNED STACKS
Compress S-RC to 0-bit for ALL objects
DELETION with 0-bits
NO S-RC needed => NO rescanning
Deletion with NO rescanning
Compress Shade to 1-bit for ALL objects
YUASA
Deriving a New Collector (Hybrid)
DELETION-BASED GC
Stack Regions
RESCANNED STACKS
Compress S-RC to sticky bit for Allocated objects
Compress S-RC to 0-bit for Existing objects
MIXED DELETION
Compress Shade to 1-bit for ALL objects
HYBRID
Algorithms
Steele
Completion Time
Dijkstra
Hybrid
Yuasa
Memory Overhead (Floating Garbage)
Evaluation
 First Systematic Comparison Of Concurrent Collectors
 IBM J9 Production Virtual Machine
 J2ME Profile, microJit
 512MB RAM, Pentium 4, 3GHz
 Comparison in terms of Execution time And Space Overhead
 Dijkstra, Steele, Yuasa, Hybrid
 Which benchmarks:
 SpecJVM98 (-s100)
 Work-Based Incremental Scheme
 Collect 9K for every 6K allocated.
Maximum Space Usage
1.6
Maximum Space
1.4
YUASA
DIJKSTRA
STEELE
HYBRID
1.2
1
0.8
0.6
0.4
0.2
0
jess
db
javac
mtrt
Benchmarks
jack
geomean
Execution Time
1.4
End-to-End Time
1.2
YUASA
DIJKSTRA
STEELE
HYBRID
1
0.8
0.6
0.4
0.2
0
jess
db
javac
mtrt
Benchmarks
jack
geomean
Summary
 Generalization Of Existing Mechanisms
 S-RC, Shade
 Abstract Collectors Based on Generalizations
 Precise, but inefficient (S-RC, Shade)
 New Concrete Algorithm
 Combines good properties of Yuasa and Dijkstra
 Suitable for Real-Time Domains
 Experimental Evaluation
On-Going Work
 More Transformations
 Formal Proof Of Correctness
 Transformations
 Unified Abstract Collector
 Formal Relation Between Algorithms
IF TIME PERMITS SLIDES
Abstract Object Layout
Computed By Mutator
Barrier Reference Count
Don’t Sweep
Recorded
Computed By Collector
Marked
Shade
DATA
The Transitive Loss
P2
B
C
A
D
P1
B
C
A
P2
D B
P1
C
P2
D B
A
C
D
A
Time
Collector Working
- Marks B as live
Thread Working Thread Working Collector Working
- Installs P2
- Deletes P1
- C becomes
unreachable
- C was not seen
- Reclaims C is
OK
- Reclaims D: live!
Common Structure Example
WriteBarrier(Obj, field, New)
{
if (Phase == Tracing)
{
Old = Obj[field];
Remember(Old);
}
Obj[field] = New;
}
YUASA
WriteBarrier(Obj, field, New)
{
if (Phase == Tracing)
{
Remember(New);
}
Obj[field] = New;
}
DIJKSTRA