Derivation And Evaluation
of Concurrent Collectors
Martin T. Vechev
University of Cambridge
David F. Bacon, Perry Cheng and Dave Grove
IBM T.J. Watson Research Center
Outline
Motivation and Benefits
New Generalizations
Abstract Algorithms
Practical Algorithms
Derivations
Evaluation
Motivation
Concurrent Collectors:
Difficult to Construct Correctly
Initial Errors in Dijkstra and Steele Algorithms
Difficult to Understand
Difficult to Implement
No systematic comparisons, largely folklore
Contributions
Generalization Of Existing Mechanisms
Abstract Collectors Based on Generalizations
Precise, but inefficient
New Algorithm
Derived from the power of generalizations
Experimental Evaluation Of 4 Concurrent GC
Benefits of Generalization
Steele
Completion Time
Dijkstra
Hybrid
Yuasa
Memory Overhead (Floating Garbage)
Assumptions
Single Collector Thread
Multiple Mutator Threads
Atomic Write Barrier
Non-Moving
Why Is It Hard ?
Concurrent Interleaving
B
D
C
A
R1
Time
GC marks B
Why Is It Hard ?
B
D
B
C
A
R2
D
C
R1
A
R1
Time
GC marks B
Mutator creates
R2
Why Is It Hard ?
B
D
B
C
A
R2
D
B
C
R1
A
R1
R2
D
C
A
X R1
Time
GC marks B
Mutator creates
R2
Mutator removes
R1
Why Is It Hard ?
B
D
B
C
A
R2
D
B
C
R1
A
R1
R2
D
B
C
A
X R1
R2
D
C
A
Time
GC marks B
Mutator creates
R2
Mutator removes
R1
GC reclaims C & D
live – WRONG!
Wavefront
G
E
B
R1
C
F
A
R2
D
Protection
Installation
Deletion
Remember R2
Remember Crossing Pointer R1
G
E
B
R1
C
F
A
R2
D
G
E
B
R1
C
F
A
X
R2
D
New generalizations
Precise Wavefront
Shade
Precise Counting Of Cross Pointers
Scanned Reference Count (S-RC)
Collector Progress (Shade)
G
E
B
F
D
C
R2
A
SHADE:0
Collector Progress (Shade)
G
E
B
F
D
C
R2
A
SHADE:1
Collector Progress (Shade)
G
E
B
F
D
C
R2
A
SHADE:2
Collector Progress (Shade)
G
E
B
F
D
C
R2
A
SHADE:3
Shade Observations
Computed by Collector
Generalization of the tri-color abstraction
Different Granularities
Different Objects
Scanned Reference Count (S-RC)
G
E
B
D
C
F
A
S-RC:0
Scanned Reference Count (S-RC)
G
E
B
D
C
F
A
S-RC:1
Scanned Reference Count (S-RC)
G
E
B
D
C
F
A
S-RC:2
Scanned Reference Count (S-RC)
G
E
B
D
C
FX
A
S-RC:1
Scanned Reference Count (S-RC)
G
E
B
C
F
X
D
A
S-RC:0
Scanned Reference Count (S-RC)
G
E
B
D
C
F
A
S-RC:0
Outline
Motivation and Benefits
New Generalizations
Abstract Algorithms
Practical Algorithms
Derivations
Evaluation
Abstract Algorithms
Utilize Shade and S-RC
Installation-Based and Deletion-Based
Mutator nominates candidates
Does not mark objects
Concurrent System Structure
COLLECT()
do
mark();
processNominated();
while (!finished);
MUTATE (obj, field, target)
obj.field = target;
nominate(target);
Mutator Nominates (Installation)
G
E
B
D
C
F
A
S-RC:0
NOMINATED OBJECT BUFFER
Mutator Nominates (Installation)
G
E
B
D
C
F
A
S-RC:1
A
NOMINATED OBJECT BUFFER
Mutator Nominates (Installation)
G
E
B
D
C S-RC:1
F
A
S-RC:1
A C
NOMINATED OBJECT BUFFER
Mutator Nominates (Installation)
G
E
B
C S-RC:1
F
X
D
A
S-RC:0
A C
NOMINATED OBJECT BUFFER
After Mark (Installation)
COLLECT()
do
mark();
processNominated();
while (!finished);
G
E
B
D
C S-RC:1
F
A
S-RC:0
A C
NOMINATED OBJECT BUFFER
After Find (Installation)
Collector Marks Object C
COLLECT()
G
E
B
C S-RC:1
F
do
mark();
processNominated();
while (!finished);
D
A
S-RC:0
A C
NOMINATED OBJECT BUFFER
Allocation
In Installation-Based Collectors
No difference
In Deletion-Based Collectors
Remembered Upon Allocation
Allocation
G
E
B
D
C
F
A
NOMINATED OBJECT BUFFER
Allocation
E
G
B
D
C
F
A
N
S-RC:1
N
NOMINATED OBJECT BUFFER
Outline
Motivation and Benefits
New Generalizations
Abstract Algorithms
Practical Algorithms
Derivations
Evaluation
Practical Algorithms
Stacks
Non-Barriered Region
Scanned Object : behind wavefront
S-RC affected
Stack rescanning
S-RC and Shade compression (tri-color)
Reachability Effect
Compressing S-RC (sticky bit)
G
E
B
D
C
F
A
S-RC:0
Compressing S-RC (sticky bit)
G
E
B
D
C
F
A
S-RC:1
Compressing S-RC (sticky bit)
G
E
B
D
C
FX
A
S-RC:1
Compressing S-RC (sticky bit)
G
E
B
D
C
F
A
S-RC:1
Object A – Unreachable but kept Alive
Compressing Shade
G
E
B
D
C
F
A
SHADE:0
Collector Progress (Shade)
G
E
B
D
C
F
A
SHADE:3
Collector Progress (Shade)
G
E
B
D
C
F
S-RC:1
R1
A
SHADE:3
PRECISE : 1
Collector Progress (Shade)
G
E
B
D
C
F
S-RC:1 (NOT decremented)
XR1
A
SHADE:3
PRECISE : 1
Collector Progress (Shade)
G
E
B
D
C
F
A
SHADE:3
PRECISE : 1
Object C – Unreachable but kept Alive
Outline
Motivation and Benefits
New Generalizations
Abstract Algorithms
Practical Algorithms
Derivations
Evaluation
Deriving Dijkstra
INSTALLATION-BASED GC
Stack Regions
RESCANNED STACKS
Compress S-RC to sticky bit for ALL objects
INSTALLATION with 1-bit
Compress Shade to 1-bit for ALL objects
DIJKSTRA
Deriving Yuasa
DELETION-BASED GC
Stack Regions
RESCANNED STACKS
Compress S-RC to 0-bit for ALL objects
DELETION with 0-bits
NO S-RC needed => NO rescanning
Deletion with NO rescanning
Compress Shade to 1-bit for ALL objects
YUASA
Deriving a New Collector (Hybrid)
DELETION-BASED GC
Stack Regions
RESCANNED STACKS
Compress S-RC to sticky bit for Allocated objects
Compress S-RC to 0-bit for Existing objects
MIXED DELETION
Compress Shade to 1-bit for ALL objects
HYBRID
Algorithms
Steele
Completion Time
Dijkstra
Hybrid
Yuasa
Memory Overhead (Floating Garbage)
Evaluation
First Systematic Comparison Of Concurrent Collectors
IBM J9 Production Virtual Machine
J2ME Profile, microJit
512MB RAM, Pentium 4, 3GHz
Comparison in terms of Execution time And Space Overhead
Dijkstra, Steele, Yuasa, Hybrid
Which benchmarks:
SpecJVM98 (-s100)
Work-Based Incremental Scheme
Collect 9K for every 6K allocated.
Maximum Space Usage
1.6
Maximum Space
1.4
YUASA
DIJKSTRA
STEELE
HYBRID
1.2
1
0.8
0.6
0.4
0.2
0
jess
db
javac
mtrt
Benchmarks
jack
geomean
Execution Time
1.4
End-to-End Time
1.2
YUASA
DIJKSTRA
STEELE
HYBRID
1
0.8
0.6
0.4
0.2
0
jess
db
javac
mtrt
Benchmarks
jack
geomean
Summary
Generalization Of Existing Mechanisms
S-RC, Shade
Abstract Collectors Based on Generalizations
Precise, but inefficient (S-RC, Shade)
New Concrete Algorithm
Combines good properties of Yuasa and Dijkstra
Suitable for Real-Time Domains
Experimental Evaluation
On-Going Work
More Transformations
Formal Proof Of Correctness
Transformations
Unified Abstract Collector
Formal Relation Between Algorithms
IF TIME PERMITS SLIDES
Abstract Object Layout
Computed By Mutator
Barrier Reference Count
Don’t Sweep
Recorded
Computed By Collector
Marked
Shade
DATA
The Transitive Loss
P2
B
C
A
D
P1
B
C
A
P2
D B
P1
C
P2
D B
A
C
D
A
Time
Collector Working
- Marks B as live
Thread Working Thread Working Collector Working
- Installs P2
- Deletes P1
- C becomes
unreachable
- C was not seen
- Reclaims C is
OK
- Reclaims D: live!
Common Structure Example
WriteBarrier(Obj, field, New)
{
if (Phase == Tracing)
{
Old = Obj[field];
Remember(Old);
}
Obj[field] = New;
}
YUASA
WriteBarrier(Obj, field, New)
{
if (Phase == Tracing)
{
Remember(New);
}
Obj[field] = New;
}
DIJKSTRA