VEE2010_AllocationSite_Slides.pdf

Efficient Runtime Tracking of
Allocation Sites in Java
Rei Odaira, Kazunori Ogata,
Kiyokuni Kawachiya, Tamiya Onodera,
Toshio Nakatani
IBM Research - Tokyo
VEE 2010 | Mar 19, 2010
© 2010 IBM Corporation
IBM Research - Tokyo
Why Do You Need Allocation Site Information?
How to fix a memory leak after two weeks of execution?
> java com.example.Server
started
......running for one week
......running for two weeks
server crashed due to java.lang.OutOfMemoryError!
java.lang.OutOfMemoryError!
>
2
VEE 2010 | Mar 19, 2010
© 2010 IBM Corporation
IBM Research - Tokyo
Allocation Site Tracker Helps!
Tracker tells you where each object was allocated.
Bytes
Class
900,278,800
98,148,020
20,352,384
……
String
LinkedList
String
……
Allocation site
(Method name and bytecode index)
com.example.Property.putProperty()#191
com.example.DataTable.putInteger()#187
com.example.Property.prepare()#35
……
void putProperty(Element elem) {
……
String attribute = new String(elem.getString());
……
}
Allocation site is a good starting point for fixing the leak.
Also, optimizations in JVM can benefit from the tracker.
3
VEE 2010 | Mar 19, 2010
© 2010 IBM Corporation
IBM Research - Tokyo
Tracker Should Be Always Enabled.
For fixing memory leaks …
> java com.example.Server
started
......running for one week
......running for two weeks
server crashed due to java.lang.OutOfMemoryError!
> java –enableTracker com.example.Server
started
......running for one week
......running for two weeks
should have always enabled the tracker.
And also for JVM optimizations ….
4
VEE 2010 | Mar 19, 2010
© 2010 IBM Corporation
IBM Research - Tokyo
Java object layout
Challenge: Performance Overhead
Header
Adding allocation site information to each object
hits performance [Hauswirth et al., 2004].
– Reducing effective CPU cache size,
increasing GC frequency and overhead, etc.
Instance fields
Allocation site info
n
.m
ea
Ge
o
pm
d
jbb
20
05
rc
h
x
ea
nd
e
b
lui
ql d
hs
lus
SP
EC
mp
il
co
mp
il
co
5
fo
p
Not good for production environments
om
pil
er
.su er
nf
low
co
mp
re
ss
de
rb
mp
y
eg
au
dio
se
ria
l
su
n
xm
f lo
l .tr
w
an
sfo
xm
rm
l .v
ali
da
t io
n
an
tlr
ch
ar
t
ec
lip
se
16
14
12
10
8
6
4
2
0
-2
er
.c
Performance overhead (%)
IBM Java 6 SR3 64-bit compressed pointer / Linux 2.6.18 / 8x x86-64 Intel Xeon 1.86GHz
VEE 2010 | Mar 19, 2010
© 2010 IBM Corporation
IBM Research - Tokyo
Minimal-Overhead Allocation Site Trackers
Never increase per-object space.
Allocation-Site-as-a-Hash-code (ASH)
Tracker
– Performance overhead:
~0% on average, 1.4% at maximum.
– Some JVMs do not always
have a hash code field.
Allocation-Site-via-a-Class-pointer (ASC)
Tracker
– Performance overhead:
1% on average, 2% at maximum.
Java object layout
Class pointer
Hash code Flags
Instance fields
– Almost all JVMs have a class pointer field.
6
VEE 2010 | Mar 19, 2010
© 2010 IBM Corporation
IBM Research - Tokyo
Outline
Introduction
ASH Tracker
ASC Tracker
Experiments
Applications of ASH/ASC Trackers
Conclusions
7
VEE 2010 | Mar 19, 2010
© 2010 IBM Corporation
IBM Research - Tokyo
Allocation-Site-as-a-Hash-code (ASH) Tracker
Embed an allocation-site ID into a hash code field.
– Embed at allocation time.
ID is a unique integer of the site.
– Assigned by an interpreter or a JIT compiler.
Original layout
w/ ASH Tracker
Class pointer
Class pointer
Hash code Flags
Instance fields
Object-specific random value,
used as a hash table index, for example.
8
VEE 2010 | Mar 19, 2010
Flags
Instance fields
Allocation site ID
© 2010 IBM Corporation
IBM Research - Tokyo
How to Deal with Hash Code Collisions?
Hash code values should be as distinct as possible from all others
[Java API Spec].
– Collisions slow down hash-table access, for example.
How about appending a site ID field when hash code is first
referred to?
Some programs often refer to hash code.
Allocation time
Site ID
First time hash code
is referred to.
Hash code
Site ID
9
VEE 2010 | Mar 19, 2010
© 2010 IBM Corporation
IBM Research - Tokyo
Collision Avoidance without Increasing Space
Our solution:
– Embed a shorter dynamic ID and a random value.
Hash code value = dynamic ID + random value
– Random value helps avoid collisions.
First time hash code
is referred to.
Allocation time
Other objects are
not affected.
Object X
Object X
Object Y
1111010111
000110 0101
1111010111
Static ID
Dynamic ID
Object-specific random value
Hash code value
10
VEE 2010 | Mar 19, 2010
© 2010 IBM Corporation
IBM Research - Tokyo
How to Deal with Hash Code Collisions? (2nd Round)
All objects allocated at the same site have the same
high-order bits in their hash code values.
Need a long random-value field to avoid collisions.
Hash code field: Dynamic ID
Random
Need a long ID field to track allocation sites accurately.
11
VEE 2010 | Mar 19, 2010
© 2010 IBM Corporation
IBM Research - Tokyo
Variable-Length Dynamic ID
Shorter IDs for hot allocation sites.
– Allocate many objects.
Need a long random value.
– Not so many hot allocation sites in
a program.
Short site IDs suffice.
Longer IDs for cold allocation sites.
– Allocate few objects.
Short random values suffice.
– Many cold allocations sites in a
program.
Need long site IDs.
for (i = 0;
i < BIG_NUMBER;
i++) {
obj = new Object();
use(obj.hashCode());
}
Dyn. ID
Random
if (error1) {
obj = new Object();
use(obj.hashCode());
}
...
if (error2) {
obj = new Object();
use(obj.hashCode());
}
Dynamic ID
12
VEE 2010 | Mar 19, 2010
Random
© 2010 IBM Corporation
IBM Research - Tokyo
Dynamic Shrinking of ID Field
It is not known in advance …
– How many objects will be allocated at each site; or
– How many of their hash code values will be referred to.
Our solution: Make the IDs of a site shorter and shorter …
– As more and more hash code values are referred to.
Hash code of objects allocated
at putProperty()#191
Object 1 0 0 0 1 1 0 0 1 0 1
Object 2 0 0 0 1 1 0 1 0 0 1
2) Occasionally, a random
value becomes all zero.
Object 3 0 0 0 1 1 0 1 0 1 1
Object 4 0 0 0 1 1 0 0 0 0 0
3) Assign a new shorter ID.
Maintain a mapping table.
Object 5 0 0 1 0 1 0 1 1 1 0
Object 6 0 0 1 0 1 0 1 0 0 0
13
1) Assign a long dynamic ID at first.
VEE 2010 | Mar 19, 2010
Dynamic ID
Allocation site
000110 putProperty()#191
00101 putProperty()#191
……
© 2010 IBM Corporation
IBM Research - Tokyo
Outline
Introduction
ASH Tracker
ASC Tracker
Experiments
Applications of ASH/ASC Trackers
Conclusions
14
VEE 2010 | Mar 19, 2010
© 2010 IBM Corporation
IBM Research - Tokyo
What If There Is No Hash Code Field?
Some JVMs do not always have a hash code field.
Almost all JVMs have a class pointer field.
– To access class meta data.
Class pointer
Flags
Instance fields
Objects allocated at
putProperty()#191
Class pointer
15
Class structure
Virtual
function table
Instance size
Other class
meta data
Class pointer
Flags
Instance fields
Objects allocated at
prepare()#35
Class pointer
Flags
Flags
Instance fields
Instance fields
VEE 2010 | Mar 19, 2010
© 2010 IBM Corporation
IBM Research - Tokyo
Allocation-Site-via-a-Class-pointer (ASC) Tracker
Replace the class pointer with a pointer to its allocation
site structure.
– This is possible because each allocation site always
allocates objects of the same class.
Alloc site ptr
Alloc site ptr
Class pointer
Class pointer
putProperty
()#191
prepare()
#35
Objects allocated at
putProperty()#191
Alloc site ptr
Virtual function
table
Instance size
Objects allocated at
prepare()#35
Alloc site ptr
Other class
meta data
16
VEE 2010 | Mar 19, 2010
© 2010 IBM Corporation
IBM Research - Tokyo
Mitigating Indirection Overhead (1)
Duplicate frequently-accessed constant class fields.
Need to choose carefully which fields to duplicate.
– Not to increase cache misses.
– Not to increase space overhead.
Alloc site ptr
Class pointer
Instance size
Class pointer
Instance size
putProperty
()#191
prepare()
#35
Alloc site ptr
17
VEE 2010 | Mar 19, 2010
Virtual
function table
Instance size
Other class
meta data
Alloc site ptr
Alloc site ptr
© 2010 IBM Corporation
IBM Research - Tokyo
Mitigating Indirection Overhead (2)
Profiling-based
devirtualization by
a JIT compiler.
if (object->class_ptr != HOT_CLASS)
goto SlowPath;
/* Inlined method, etc. */
Another load needed
by ASC Tracker.
if (object->alloc_site->class_ptr
!= HOT_CLASS)
alloc_site
goto SlowPath;
/* Inlined method, etc. */
Emit an allocation-site equality
check where possible.
if (object->alloc_site != HOT_ALLOCATION_SITE)
HOT_ALLOCATION_SITE
goto SlowPath;
/* Inlined method, etc. */
18
VEE 2010 | Mar 19, 2010
© 2010 IBM Corporation
IBM Research - Tokyo
Outline
Introduction
ASH Tracker
ASC Tracker
Experiments
Applications of ASH/ASC Trackers
Conclusions
19
VEE 2010 | Mar 19, 2010
© 2010 IBM Corporation
IBM Research - Tokyo
Evaluation
Environment
– 64-bit IBM J9/TR JVM 1.6.0 SR3
• 3-word header (1 word = 32 bits)
• Generational GC (copying young + mark-and-sweep old GC)
– 4x minimum Java heap
– 8-core 1.8GHz Intel Xeon
– Linux 2.6.18
Benchmarks
– SPECjvm2008, DaCapo, and SPECjbb2005
ASH Tracker
– Uses a 15-bit hash code field.
– Shrinks a dynamic ID field from 11 bits to 2 bits.
ASC Tracker
– Duplicates an instance-size field and an array-element-class field.
20
VEE 2010 | Mar 19, 2010
© 2010 IBM Corporation
IBM Research - Tokyo
Performance Overhead
16
14
12
10
8
6
4
2
0
-2
Per-object 4-byte field
ASH Tracker
n
.m
ea
Ge
o
pm
d
jbb
20
05
SP
EC
rc
h
ea
x
nd
e
b
lui
qld
hs
fo
p
e
li p
s
ec
ar
t
ch
tlr
an
ion
da
t
lus
xm
l .v
ali
fo
rm
w
an
s
nf
lo
ria
l
se
ud
io
ga
rb
y
de
su
xm
l .tr
mp
il
co
mp
il
co
21
mp
e
om
pil
er
er
.su
nf
low
co
mp
re
ss
ASC Tracker
er
.c
Performance overhead (%)
ASH Tracker: on average ~0% and at most 1.4% overhead.
– Up to 1.73x hash code collisions compared with the baseline.
ASC Tracker: on average 1.0% and at most 2.0% overhead.
VEE 2010 | Mar 19, 2010
© 2010 IBM Corporation
IBM Research - Tokyo
Performance Overhead
16
14
12
10
8
6
4
2
0
-2
Per-object 4-byte field
ASH Tracker
ASC Tracker
n
.m
ea
Ge
o
pm
d
jbb
20
05
SP
EC
rc
h
ea
x
nd
e
b
lui
qld
hs
fo
p
e
li p
s
ec
ar
t
ch
tlr
an
ion
da
t
lus
xm
l .v
ali
fo
rm
w
an
s
nf
lo
ria
l
se
ud
io
ga
rb
y
de
su
xm
l .tr
mp
il
co
mp
il
co
22
mp
e
om
pil
er
er
.su
nf
low
co
mp
re
ss
Serial and pmd refer to the hash code of:
more than 20% of allocated objects, and
5-11% of average live objects.
er
.c
Performance overhead (%)
ASH Tracker: on average ~0% and at most 1.4% overhead.
– Up to 1.73x hash code collisions compared with the baseline.
ASC Tracker: on average 1.0% and at most 2.0% overhead.
VEE 2010 | Mar 19, 2010
© 2010 IBM Corporation
IBM Research - Tokyo
Outline
Introduction
ASH Tracker
ASC Tracker
Experiments
Applications of ASH/ASC Trackers
Conclusions
23
VEE 2010 | Mar 19, 2010
© 2010 IBM Corporation
IBM Research - Tokyo
Minimal-Overhead Memory Leak Detector
Scan the entire Java heap at each global GC time.
– Count the numbers of live objects for each allocation site.
– Save the numbers in per-allocation-site histories.
(Inform users of possible memory leaks.)
Java heap
obj
obj
obj
obj
obj
obj
obj
Per-allocation-site counters 6
24
#live objects history
6, 5, 4, ……
3, 3, 3, ……
2, 1, 1, ……
……
VEE 2010 | Mar 19, 2010
obj
3
obj
obj
obj
2
Allocation site
Property.putProperty()#191
DataTable.putInteger()#187
Property.prepare()#35
……
© 2010 IBM Corporation
IBM Research - Tokyo
Performance Overhead of the Leak Detector
n
.m
ea
Ge
o
pm
d
jbb
20
05
rc
h
x
ea
nd
e
b
lui
qld
hs
lus
SP
EC
mp
il
co
mp
il
co
25
ASC Tracker
om
pil
er
er
.su
nf
low
co
mp
re
ss
de
rb
mp
y
eg
au
dio
se
ria
l
su
n
xm
f lo
w
l .tr
an
sfo
xm
rm
l .v
ali
da
t io
n
an
tlr
ch
ar
t
ec
li p
se
ASH Tracker
fo
p
3
2.5
2
1.5
1
0.5
0
-0.5
-1
-1.5
-2
er
.c
Performance overhead (%)
ASH Tracker: on average 0.3% and at most 1.7% overhead
ASC Tracker: on average 1.1% and at most 2.4% overhead
VEE 2010 | Mar 19, 2010
© 2010 IBM Corporation
IBM Research - Tokyo
Allocation-Site-Based Object Pretenuring
for Generational GC
Refer to our paper for more details.
4x min Java heap: at maximum 11% speed-up
2x min Java heap: at maximum 15% speed-up
n
.m
ea
Ge
o
pm
d
jbb
20
05
rc
h
ea
x
nd
e
b
lui
ql d
hs
fo
p
e
lip
s
tlr
ar
t
ch
ec
lus
SP
EC
mp
il
4x min Java heap
co
mp
il
co
26
an
om
pil
er
er
.su
nf
low
co
mp
re
ss
de
rb
mp
y
eg
au
dio
se
ria
l
su
n
xm
f lo
w
l.tr
an
sfo
xm
rm
l.v
ali
da
t io
n
2x min Java heap
er
.c
Speed-up (%)
22
20
18
16
14
12
10
8
6
4
2
0
-2
-4
VEE 2010 | Mar 19, 2010
© 2010 IBM Corporation
IBM Research - Tokyo
Related Work
Bit-Encoding Leak Location (Bell)
[Bond, et al., ASPLOS 2006]
– Probabilistic allocation site tracker
– Requires a sufficient number of samples.
– Cannot identify the allocation site of each object.
Not suitable for JVM optimizations.
Techniques for object header compression
[Bacon, et al., ECOOP 2002]
– Remove the hash code field.
Use ASC Tracker.
– Steal several bits of the class pointer.
Steal those bits of the allocation site pointer.
27
VEE 2010 | Mar 19, 2010
© 2010 IBM Corporation
IBM Research - Tokyo
Conclusion
Minimal-overhead allocation site trackers
ASH Tracker
– Embeds an allocation site ID into the hash code field.
– Performance overhead:
~0% on average, 1.4% at maximum.
ASC Tracker
– Makes the class pointer field point to an allocation site structure.
– Performance overhead:
1% on average, 2% at maximum.
Useful for both reliability and optimization.
– Reliability: minimal-overhead memory leak detector.
– Optimization: allocation-site-based object pretenuring.
28
VEE 2010 | Mar 19, 2010
© 2010 IBM Corporation
IBM Research - Tokyo
Thank you!
Questions?
29
VEE 2010 | Mar 19, 2010
© 2010 IBM Corporation
IBM Research - Tokyo
Back-up
30
VEE 2010 | Mar 19, 2010
© 2010 IBM Corporation
IBM Research - Tokyo
Space Overhead Compared with Physical Memory Usage
10
9
8
7
6
5
4
3
2
1
0
Per-object 4-byte field
ASH Tracker
VEE 2010 | Mar 19, 2010
n
.m
ea
Ge
o
pm
d
jbb
Ec
20
lip
05
se
/m
an
ua
l
rc
h
ea
x
nd
e
b
lui
qld
hs
lus
SP
EC
mp
il
co
mp
il
co
31
fo
p
om
pil
er
.su er
nf
low
co
mp
re
ss
de
rb
mp
y
eg
au
dio
se
ria
l
su
n
xm
f lo
l .tr
w
an
sfo
xm
rm
l .v
ali
da
t io
n
an
tlr
ch
ar
t
ec
li p
se
ASC Tracker
er
.c
Space overhead (%)
ASH Tracker: on average 0.4% overhead
ASC Tracker: on average 0.2% overhead
© 2010 IBM Corporation
IBM Research - Tokyo
Using ASH Tracker for Object Pretenuring
Copy likely-to-be-long-lived objects directly to a “tenured” space.
– Not copying such objects multiple times within a young space.
1. Online profiling
Compute the ratio of tenured
objects (#t/#s) for each
allocation site at young GC time.
Young space
Survivor space
Allocation space
Survivor space
Object
#s
Tenured space
32
Enable pretenuring for an
allocation site if its ratio
exceeds a certain threshold.
Young space
Allocation space
Object
2. Pretenuring
#t
VEE 2010 | Mar 19, 2010
Tenured space
© 2010 IBM Corporation