Derivation And Evaluation of Concurrent Collectors Martin T. Vechev University of Cambridge David F. Bacon, Perry Cheng and Dave Grove IBM T.J. Watson Research Center Outline Motivation and Benefits New Generalizations Abstract Algorithms Practical Algorithms Derivations Evaluation Motivation Concurrent Collectors: Difficult to Construct Correctly Initial Errors in Dijkstra and Steele Algorithms Difficult to Understand Difficult to Implement No systematic comparisons, largely folklore Contributions Generalization Of Existing Mechanisms Abstract Collectors Based on Generalizations Precise, but inefficient New Algorithm Derived from the power of generalizations Experimental Evaluation Of 4 Concurrent GC Benefits of Generalization Steele Completion Time Dijkstra Hybrid Yuasa Memory Overhead (Floating Garbage) Assumptions Single Collector Thread Multiple Mutator Threads Atomic Write Barrier Non-Moving Why Is It Hard ? Concurrent Interleaving B D C A R1 Time GC marks B Why Is It Hard ? B D B C A R2 D C R1 A R1 Time GC marks B Mutator creates R2 Why Is It Hard ? B D B C A R2 D B C R1 A R1 R2 D C A X R1 Time GC marks B Mutator creates R2 Mutator removes R1 Why Is It Hard ? B D B C A R2 D B C R1 A R1 R2 D B C A X R1 R2 D C A Time GC marks B Mutator creates R2 Mutator removes R1 GC reclaims C & D live – WRONG! Wavefront G E B R1 C F A R2 D Protection Installation Deletion Remember R2 Remember Crossing Pointer R1 G E B R1 C F A R2 D G E B R1 C F A X R2 D New generalizations Precise Wavefront Shade Precise Counting Of Cross Pointers Scanned Reference Count (S-RC) Collector Progress (Shade) G E B F D C R2 A SHADE:0 Collector Progress (Shade) G E B F D C R2 A SHADE:1 Collector Progress (Shade) G E B F D C R2 A SHADE:2 Collector Progress (Shade) G E B F D C R2 A SHADE:3 Shade Observations Computed by Collector Generalization of the tri-color abstraction Different Granularities Different Objects Scanned Reference Count (S-RC) G E B D C F A S-RC:0 Scanned Reference Count (S-RC) G E B D C F A S-RC:1 Scanned Reference Count (S-RC) G E B D C F A S-RC:2 Scanned Reference Count (S-RC) G E B D C FX A S-RC:1 Scanned Reference Count (S-RC) G E B C F X D A S-RC:0 Scanned Reference Count (S-RC) G E B D C F A S-RC:0 Outline Motivation and Benefits New Generalizations Abstract Algorithms Practical Algorithms Derivations Evaluation Abstract Algorithms Utilize Shade and S-RC Installation-Based and Deletion-Based Mutator nominates candidates Does not mark objects Concurrent System Structure COLLECT() do mark(); processNominated(); while (!finished); MUTATE (obj, field, target) obj.field = target; nominate(target); Mutator Nominates (Installation) G E B D C F A S-RC:0 NOMINATED OBJECT BUFFER Mutator Nominates (Installation) G E B D C F A S-RC:1 A NOMINATED OBJECT BUFFER Mutator Nominates (Installation) G E B D C S-RC:1 F A S-RC:1 A C NOMINATED OBJECT BUFFER Mutator Nominates (Installation) G E B C S-RC:1 F X D A S-RC:0 A C NOMINATED OBJECT BUFFER After Mark (Installation) COLLECT() do mark(); processNominated(); while (!finished); G E B D C S-RC:1 F A S-RC:0 A C NOMINATED OBJECT BUFFER After Find (Installation) Collector Marks Object C COLLECT() G E B C S-RC:1 F do mark(); processNominated(); while (!finished); D A S-RC:0 A C NOMINATED OBJECT BUFFER Allocation In Installation-Based Collectors No difference In Deletion-Based Collectors Remembered Upon Allocation Allocation G E B D C F A NOMINATED OBJECT BUFFER Allocation E G B D C F A N S-RC:1 N NOMINATED OBJECT BUFFER Outline Motivation and Benefits New Generalizations Abstract Algorithms Practical Algorithms Derivations Evaluation Practical Algorithms Stacks Non-Barriered Region Scanned Object : behind wavefront S-RC affected Stack rescanning S-RC and Shade compression (tri-color) Reachability Effect Compressing S-RC (sticky bit) G E B D C F A S-RC:0 Compressing S-RC (sticky bit) G E B D C F A S-RC:1 Compressing S-RC (sticky bit) G E B D C FX A S-RC:1 Compressing S-RC (sticky bit) G E B D C F A S-RC:1 Object A – Unreachable but kept Alive Compressing Shade G E B D C F A SHADE:0 Collector Progress (Shade) G E B D C F A SHADE:3 Collector Progress (Shade) G E B D C F S-RC:1 R1 A SHADE:3 PRECISE : 1 Collector Progress (Shade) G E B D C F S-RC:1 (NOT decremented) XR1 A SHADE:3 PRECISE : 1 Collector Progress (Shade) G E B D C F A SHADE:3 PRECISE : 1 Object C – Unreachable but kept Alive Outline Motivation and Benefits New Generalizations Abstract Algorithms Practical Algorithms Derivations Evaluation Deriving Dijkstra INSTALLATION-BASED GC Stack Regions RESCANNED STACKS Compress S-RC to sticky bit for ALL objects INSTALLATION with 1-bit Compress Shade to 1-bit for ALL objects DIJKSTRA Deriving Yuasa DELETION-BASED GC Stack Regions RESCANNED STACKS Compress S-RC to 0-bit for ALL objects DELETION with 0-bits NO S-RC needed => NO rescanning Deletion with NO rescanning Compress Shade to 1-bit for ALL objects YUASA Deriving a New Collector (Hybrid) DELETION-BASED GC Stack Regions RESCANNED STACKS Compress S-RC to sticky bit for Allocated objects Compress S-RC to 0-bit for Existing objects MIXED DELETION Compress Shade to 1-bit for ALL objects HYBRID Algorithms Steele Completion Time Dijkstra Hybrid Yuasa Memory Overhead (Floating Garbage) Evaluation First Systematic Comparison Of Concurrent Collectors IBM J9 Production Virtual Machine J2ME Profile, microJit 512MB RAM, Pentium 4, 3GHz Comparison in terms of Execution time And Space Overhead Dijkstra, Steele, Yuasa, Hybrid Which benchmarks: SpecJVM98 (-s100) Work-Based Incremental Scheme Collect 9K for every 6K allocated. Maximum Space Usage 1.6 Maximum Space 1.4 YUASA DIJKSTRA STEELE HYBRID 1.2 1 0.8 0.6 0.4 0.2 0 jess db javac mtrt Benchmarks jack geomean Execution Time 1.4 End-to-End Time 1.2 YUASA DIJKSTRA STEELE HYBRID 1 0.8 0.6 0.4 0.2 0 jess db javac mtrt Benchmarks jack geomean Summary Generalization Of Existing Mechanisms S-RC, Shade Abstract Collectors Based on Generalizations Precise, but inefficient (S-RC, Shade) New Concrete Algorithm Combines good properties of Yuasa and Dijkstra Suitable for Real-Time Domains Experimental Evaluation On-Going Work More Transformations Formal Proof Of Correctness Transformations Unified Abstract Collector Formal Relation Between Algorithms IF TIME PERMITS SLIDES Abstract Object Layout Computed By Mutator Barrier Reference Count Don’t Sweep Recorded Computed By Collector Marked Shade DATA The Transitive Loss P2 B C A D P1 B C A P2 D B P1 C P2 D B A C D A Time Collector Working - Marks B as live Thread Working Thread Working Collector Working - Installs P2 - Deletes P1 - C becomes unreachable - C was not seen - Reclaims C is OK - Reclaims D: live! Common Structure Example WriteBarrier(Obj, field, New) { if (Phase == Tracing) { Old = Obj[field]; Remember(Old); } Obj[field] = New; } YUASA WriteBarrier(Obj, field, New) { if (Phase == Tracing) { Remember(New); } Obj[field] = New; } DIJKSTRA