An Efficient On-the-Fly Cycle Collection Harel Paz, Erez Petrank - Technion, Israel David F. Bacon, V. T. Rajan - IBM T.J. Watson Research Center Elliot K. Kolodner - IBM Haifa Research Lab 1 Garbage Collection Manual de-allocation may cause notorious bugs (memory leaks, dangling pointers). Garbage collection (GC): automatic recycling of dynamically allocated memory. Garbage: objects that are not live, but are not free either. 2 Reference Counting Each object has an rc field. New objects get o.rc:=1. When p that points to o1 is modified to point to o2 we do: o1.rc--, o2.rc++. if o1.rc==0: p o1 o2 Decrement rc for all sons of o1. Recursively delete objects whose rc is decremented to 0. Delete o1. 3 Three Main Drawbacks of RC Drastic improvement by High overhead Levanoni-Petrank 2001 Costly parallelism Inability to reclaim cycles This work 4 Cyclic Structures Reclamation Problem A garbage cycle denotes a strongly connected component in the objects graph which is unreachable from the program roots. 1 1 2 1 a b p 21 c Garbage cycle 5 Collecting Garbage Cycles in Reference Counting Systems Reference counting collectors employ one of 2 avenues to collect garbage cycles: A backup tracing collector. A cycle collector. This work proposes a new concurrent cycle collection. Contributions: More efficient than previous concurrent cycle collector. Solves termination problem. First throughput comparison between cycle collection and a backup tracing collector. 6 Cycle Collection Basic Idea - 1 ⇒ Observation 1: Garbage cycles can only be created when a rc is decremented to a non-zero value. Objects whose rc is decremented to a non-zero value become candidates. 7 Cycle Collection Basic Idea - 2 Terms: ⇒ Sub-graph of O: graph of objects reachable from O. External pointer (to a sub-graph): a pointer from a non sub-graph object to a sub-graph object. Internal pointer (of a sub-graph): a pointer between 2 sub-graph objects. a o o1 o3 o2 o4 o5 Observation 2: In a garbage cycle all the reference counts are due to internal pointer of the cycle. For each candidate’s sub-graph, check if external pointers point to this sub-graph. 8 Goal: Compute Counts for External Pointers Only r r 1 1 2 1 1 a c r 0 a b b b 1 1 0 2 2 d c 2 a garbage cycle a 2 edge r->a deleted d c 0 1 d rc’s when ignoring internal edges Not a garbage cycle 9 Implementing the Cycle Collector Idea ⇒ Object is colored black/ gray/ white. Whenever a rc of an object ‘a’ is decremented to a non-zero value, perform 3 local traversals over the graph of objects of ‘a’. r 1 210 a b Mark: Updates rc’s to reflect only 10 pointers that are external to the graph of ‘a’, marking nodes in gray. c d Scan: Restores rc’s of the externally 21 210 reachable objects, coloring them in black. Rest of the nodes are marked as garbage (white). Collect: collects garbage objects (the white ones). 10 Concurrent Cycle Collection A concurrent collector: a collector that runs concurrently with the program threads. Concurrent program GC Concurrent cycle collection is more complex: objects graph may be modified while the collector scans it. ⇒ Cannot rely on repeated traversals of a graph to read the same set of nodes and edges. Using the algorithm above may produce incorrect results. 11 Safety Problem - Example ⇒ A mutator deletes the edge c->d, between the MarkGray and Scan procedures. The Scan phase incorrectly infers live objects (a & b) to be garbage. 12 c 01 a 21 d 21 e 012 b 12 Confronting Drawbacks in Previous Work Previous concurrent cycle collector by Bacon & Rajan added overhead to achieve safety in light of inconsistent view of the heap. Overhead reduces efficiency. Completeness could not be achieved. Our solution: use a fixed-view of the heap! Multiple heap traces consider the same graph each time. 13 Getting a Snapshot of the Heap A snapshot of the heap could be taken by a concurrent collector. Levanoni-Petrank’s snapshot: Copy-on-first-write mechanism: for each pointer modified for the first time after a collection: Save its snapshot value in a buffer. Mark the pointer “dirty” (no need to be logged again). The cycle collector traverses each object according to the pointers’ values as existed in the snapshot time. Concurrent program GC 14 Cycle Collection on Heap’s Snapshot The standard (non-concurrent) cycle collection correctly identifies garbage cycles on a snapshot. All garbage cycles are collected. It is not disturbed by mutator activity. A garbage cycle created, must exist in next snapshot. Only garbage cycles are collected. A non reachable cycle in the snapshot is indeed a garbage cycle. Concurrent program GC 15 Levanoni-Petrank’s RC Consider a pointer p that takes the following values between GC’s: O0,O1, O2, …, On . p O0 O1 O2 O3 O4 . . . . . On All RC algorithms perform 2n operations: O0.rc--; O1.rc++; O1.rc--; O2.rc++; O2.rc--; … ; On.rc++; But only 2 operations are needed:O0.rc--,On.rc++ 16 Less Cycle Candidates Previous algorithms: object whose rc is decremented to a non-zero value is considered as a candidate. The Levanoni-Petrank’s write-barrier does not p log most of the decrements. Does it “miss” cycles? O0 O1 O2 O3 O4 . . . . . On The new cycle collection algorithm collects all cycles, although performing less work. 17 More in the Paper Reducing pauses further by stopping each thread separately (instead of all together). Care with new races… Concurrent program GC On-the-Fly More techniques to reduce the number of traced objects. 18 Measurements Implemented in Jikes with two collectors: The sliding-views reference-counting collector. The age-oriented collector: Uses mark and sweep for the young generation and reference counting for the old generation. Measurements: Throughput comparison between cycle collection and a backup tracing collector. Characteristic comparison to the previous on-thefly cycle collector. 19 Work Reduction Work ratio compared to Bacon & Rajan candidates handled objects traced 1.4 1 0.8 0.6 0.4 0.2 jb b ck ja trt m ja va c db ss je m pr es s 0 co Work ratio 1.2 20 Work Reduction with the Age-Oriented Collector Work ratio between RC and age oriented candidates handled objects traced 1.4 1 0.8 0.6 0.4 0.2 jb b ck ja trt m ja va c db ss je m pr es s 0 co Work ratio 1.2 21 Throughput Comparison of CycleCollection with Backup Tracing Throughput ratio: cycle collection/backup tracing SPECjbb2000 with 4-8 warehouses - Reference Counting 1.15 1.1 4 warehouses 1.05 5 warehouses 1 6 warehouses 0.95 7 warehouses 8 warehouses 0.9 0.85 256 320 384 448 512 576 640 704 Heap size 22 Throughput Comparison of CycleCollection with Backup Tracing SPECjbb2000 with 4-8 warehouses - Age Oriented Throughput ratio: cycle collection/backup tracing 1.15 1.1 4 warehouses 1.05 5 warehouses 1 6 warehouses 7 warehouses 0.95 8 warehouses 0.9 0.85 256 320 384 448 512 576 640 704 Heap size 23 Related Work Cycle collection: Cyclic reference counting with local mark-scan. Martinez, Wachenchauzer and Lins [1990]. Cyclic reference counting with lazy mark-scan. Lins [1992]. Concurrent cycle collection in reference counted systems. Bacon and Rajan [2001]. Other: An on-the-fly reference counting garbage collector for Java. Levanoni and Petrank [2001]. Age-Oriented Concurrent Garbage Collection. Paz, Petrank, and Blackburn [2005]. 24 Conclusions Cycle collection may be efficiently executed on-the-fly by using Levanoni-Petrank’s RC with the efficient standard cycle collector. Today’s benchmarks: Reference counting for full heap: prefer a backup tracing. Reference counting for old generation: slight preference to cycle collection. Eyes for the future: with large heaps cycle collection may outperform backup tracing. 25