slides

An Efficient On-the-Fly
Cycle Collection
Harel Paz, Erez Petrank - Technion, Israel
David F. Bacon, V. T. Rajan - IBM T.J.
Watson Research Center
Elliot K. Kolodner - IBM Haifa Research Lab
1
Garbage Collection


Manual de-allocation may cause notorious
bugs (memory leaks, dangling pointers).
Garbage collection (GC): automatic recycling
of dynamically allocated memory.

Garbage: objects that are not live, but are not free
either.
2
Reference Counting

Each object has an rc field.



New objects get o.rc:=1.
When p that points to o1
is modified to point to o2
we do: o1.rc--, o2.rc++.
if o1.rc==0:



p
o1
o2
Decrement rc for all sons of o1.
Recursively delete objects whose rc is
decremented to 0.
Delete o1.
3
Three Main Drawbacks of RC



Drastic improvement by
High overhead
Levanoni-Petrank 2001
Costly parallelism
Inability to reclaim cycles
This work
4
Cyclic Structures Reclamation
Problem

A garbage cycle denotes a strongly connected
component in the objects graph which is
unreachable from the program roots.
1
1
2
1
a
b
p
21
c
Garbage
cycle
5
Collecting Garbage Cycles in
Reference Counting Systems

Reference counting collectors employ one of 2 avenues
to collect garbage cycles:




A backup tracing collector.
A cycle collector.
This work proposes a new concurrent cycle collection.
Contributions:



More efficient than previous concurrent cycle collector.
Solves termination problem.
First throughput comparison between cycle collection and a
backup tracing collector.
6
Cycle Collection Basic Idea - 1

⇒
Observation 1: Garbage cycles can only be
created when a rc is decremented to a non-zero
value.
Objects whose rc is decremented to a non-zero
value become candidates.
7
Cycle Collection Basic Idea - 2

Terms:




⇒
Sub-graph of O: graph of objects
reachable from O.
External pointer (to a sub-graph): a
pointer from a non sub-graph object
to a sub-graph object.
Internal pointer (of a sub-graph): a
pointer between 2 sub-graph objects.
a
o
o1
o3
o2
o4
o5
Observation 2: In a garbage cycle all the reference
counts are due to internal pointer of the cycle.
For each candidate’s sub-graph, check if external
pointers point to this sub-graph.
8
Goal: Compute Counts for
External Pointers Only
r
r
1
1
2
1
1
a
c
r
0
a
b
b
b
1
1
0
2
2
d
c
2
a garbage
cycle
a
2
edge r->a
deleted
d
c
0
1
d
rc’s when
ignoring internal
edges
Not a
garbage
cycle
9
Implementing the Cycle Collector
Idea

⇒
Object is colored black/ gray/ white.
Whenever a rc of an object ‘a’ is
decremented to a non-zero value,
perform 3 local traversals over the
graph of objects of ‘a’.



r
1
210
a
b
Mark: Updates rc’s to reflect only
10
pointers that are external to the graph
of ‘a’, marking nodes in gray.
c
d
Scan: Restores rc’s of the externally
21
210
reachable objects, coloring them in
black. Rest of the nodes are marked as
garbage (white).
Collect: collects garbage objects (the white ones).
10
Concurrent Cycle Collection

A concurrent collector: a collector that runs
concurrently with the program threads.
Concurrent

program
GC
Concurrent cycle collection is more complex: objects
graph may be modified while the collector scans it.

⇒
Cannot rely on repeated traversals of a graph to read
the same set of nodes and edges.
Using the algorithm above may produce incorrect results.
11
Safety Problem - Example

⇒
A mutator deletes the edge
c->d, between the MarkGray
and Scan procedures.
The Scan phase incorrectly
infers live objects (a & b) to
be garbage.
12 c
01 a
21 d
21 e
012 b
12
Confronting Drawbacks in
Previous Work

Previous concurrent cycle collector by Bacon &
Rajan added overhead to achieve safety in light
of inconsistent view of the heap.



Overhead reduces efficiency.
Completeness could not be achieved.
Our solution: use a fixed-view of the heap!

Multiple heap traces consider the same graph each
time.
13
Getting a Snapshot of the Heap


A snapshot of the heap could be taken by a
concurrent collector.
Levanoni-Petrank’s snapshot:

Copy-on-first-write mechanism: for each pointer
modified for the first time after a collection:



Save its snapshot value in a buffer.
Mark the pointer “dirty” (no need to be logged again).
The cycle collector traverses each object
according to the pointers’ values as existed in
the snapshot time.
Concurrent
program
GC
14
Cycle Collection on Heap’s
Snapshot

The standard (non-concurrent) cycle collection
correctly identifies garbage cycles on a snapshot.


All garbage cycles are collected.


It is not disturbed by mutator activity.
A garbage cycle created, must exist in next snapshot.
Only garbage cycles are collected.

A non reachable cycle in the snapshot is indeed a
garbage cycle.
Concurrent
program
GC
15
Levanoni-Petrank’s RC

Consider a pointer p that takes the following values
between GC’s: O0,O1, O2, …, On .
p
O0 O1 O2 O3 O4 . . . . . On


All RC algorithms perform 2n operations:
O0.rc--; O1.rc++; O1.rc--; O2.rc++; O2.rc--; … ;
On.rc++;
But only 2 operations are needed:O0.rc--,On.rc++
16
Less Cycle Candidates


Previous algorithms: object whose rc is
decremented to a non-zero value is considered
as a candidate.
The Levanoni-Petrank’s write-barrier does not
p
log most of the decrements.


Does it “miss” cycles?
O0 O1 O2 O3 O4 . . . . . On
The new cycle collection algorithm collects all
cycles, although performing less work.
17
More in the Paper

Reducing pauses further by stopping each
thread separately (instead of all together).

Care with new races…
Concurrent
program
GC
On-the-Fly

More techniques to reduce the number of
traced objects.
18
Measurements

Implemented in Jikes with two collectors:


The sliding-views reference-counting collector.
The age-oriented collector:


Uses mark and sweep for the young generation and
reference counting for the old generation.
Measurements:


Throughput comparison between cycle collection
and a backup tracing collector.
Characteristic comparison to the previous on-thefly cycle collector.
19
Work Reduction
Work ratio compared to Bacon & Rajan
candidates handled
objects traced
1.4
1
0.8
0.6
0.4
0.2
jb
b
ck
ja
trt
m
ja
va
c
db
ss
je
m
pr
es
s
0
co
Work ratio
1.2
20
Work Reduction with the
Age-Oriented Collector
Work ratio between RC and age oriented
candidates handled
objects traced
1.4
1
0.8
0.6
0.4
0.2
jb
b
ck
ja
trt
m
ja
va
c
db
ss
je
m
pr
es
s
0
co
Work ratio
1.2
21
Throughput Comparison of CycleCollection with Backup Tracing
Throughput ratio: cycle
collection/backup tracing
SPECjbb2000 with 4-8 warehouses - Reference
Counting
1.15
1.1
4 warehouses
1.05
5 warehouses
1
6 warehouses
0.95
7 warehouses
8 warehouses
0.9
0.85
256 320 384 448 512 576 640 704
Heap size
22
Throughput Comparison of CycleCollection with Backup Tracing
SPECjbb2000 with 4-8 warehouses - Age Oriented
Throughput ratio: cycle
collection/backup tracing
1.15
1.1
4 warehouses
1.05
5 warehouses
1
6 warehouses
7 warehouses
0.95
8 warehouses
0.9
0.85
256 320 384 448 512 576 640 704
Heap size
23
Related Work

Cycle collection:




Cyclic reference counting with local mark-scan. Martinez,
Wachenchauzer and Lins [1990].
Cyclic reference counting with lazy mark-scan. Lins [1992].
Concurrent cycle collection in reference counted systems.
Bacon and Rajan [2001].
Other:


An on-the-fly reference counting garbage collector for Java.
Levanoni and Petrank [2001].
Age-Oriented Concurrent Garbage Collection. Paz, Petrank,
and Blackburn [2005].
24
Conclusions


Cycle collection may be efficiently executed
on-the-fly by using Levanoni-Petrank’s RC
with the efficient standard cycle collector.
Today’s benchmarks:



Reference counting for full heap: prefer a backup
tracing.
Reference counting for old generation: slight
preference to cycle collection.
Eyes for the future: with large heaps cycle
collection may outperform backup tracing.
25