Efficient Runtime Tracking of Allocation Sites in Java Rei Odaira, Kazunori Ogata, Kiyokuni Kawachiya, Tamiya Onodera, Toshio Nakatani IBM Research - Tokyo VEE 2010 | Mar 19, 2010 © 2010 IBM Corporation IBM Research - Tokyo Why Do You Need Allocation Site Information? How to fix a memory leak after two weeks of execution? > java com.example.Server started ......running for one week ......running for two weeks server crashed due to java.lang.OutOfMemoryError! java.lang.OutOfMemoryError! > 2 VEE 2010 | Mar 19, 2010 © 2010 IBM Corporation IBM Research - Tokyo Allocation Site Tracker Helps! Tracker tells you where each object was allocated. Bytes Class 900,278,800 98,148,020 20,352,384 …… String LinkedList String …… Allocation site (Method name and bytecode index) com.example.Property.putProperty()#191 com.example.DataTable.putInteger()#187 com.example.Property.prepare()#35 …… void putProperty(Element elem) { …… String attribute = new String(elem.getString()); …… } Allocation site is a good starting point for fixing the leak. Also, optimizations in JVM can benefit from the tracker. 3 VEE 2010 | Mar 19, 2010 © 2010 IBM Corporation IBM Research - Tokyo Tracker Should Be Always Enabled. For fixing memory leaks … > java com.example.Server started ......running for one week ......running for two weeks server crashed due to java.lang.OutOfMemoryError! > java –enableTracker com.example.Server started ......running for one week ......running for two weeks should have always enabled the tracker. And also for JVM optimizations …. 4 VEE 2010 | Mar 19, 2010 © 2010 IBM Corporation IBM Research - Tokyo Java object layout Challenge: Performance Overhead Header Adding allocation site information to each object hits performance [Hauswirth et al., 2004]. – Reducing effective CPU cache size, increasing GC frequency and overhead, etc. Instance fields Allocation site info n .m ea Ge o pm d jbb 20 05 rc h x ea nd e b lui ql d hs lus SP EC mp il co mp il co 5 fo p Not good for production environments om pil er .su er nf low co mp re ss de rb mp y eg au dio se ria l su n xm f lo l .tr w an sfo xm rm l .v ali da t io n an tlr ch ar t ec lip se 16 14 12 10 8 6 4 2 0 -2 er .c Performance overhead (%) IBM Java 6 SR3 64-bit compressed pointer / Linux 2.6.18 / 8x x86-64 Intel Xeon 1.86GHz VEE 2010 | Mar 19, 2010 © 2010 IBM Corporation IBM Research - Tokyo Minimal-Overhead Allocation Site Trackers Never increase per-object space. Allocation-Site-as-a-Hash-code (ASH) Tracker – Performance overhead: ~0% on average, 1.4% at maximum. – Some JVMs do not always have a hash code field. Allocation-Site-via-a-Class-pointer (ASC) Tracker – Performance overhead: 1% on average, 2% at maximum. Java object layout Class pointer Hash code Flags Instance fields – Almost all JVMs have a class pointer field. 6 VEE 2010 | Mar 19, 2010 © 2010 IBM Corporation IBM Research - Tokyo Outline Introduction ASH Tracker ASC Tracker Experiments Applications of ASH/ASC Trackers Conclusions 7 VEE 2010 | Mar 19, 2010 © 2010 IBM Corporation IBM Research - Tokyo Allocation-Site-as-a-Hash-code (ASH) Tracker Embed an allocation-site ID into a hash code field. – Embed at allocation time. ID is a unique integer of the site. – Assigned by an interpreter or a JIT compiler. Original layout w/ ASH Tracker Class pointer Class pointer Hash code Flags Instance fields Object-specific random value, used as a hash table index, for example. 8 VEE 2010 | Mar 19, 2010 Flags Instance fields Allocation site ID © 2010 IBM Corporation IBM Research - Tokyo How to Deal with Hash Code Collisions? Hash code values should be as distinct as possible from all others [Java API Spec]. – Collisions slow down hash-table access, for example. How about appending a site ID field when hash code is first referred to? Some programs often refer to hash code. Allocation time Site ID First time hash code is referred to. Hash code Site ID 9 VEE 2010 | Mar 19, 2010 © 2010 IBM Corporation IBM Research - Tokyo Collision Avoidance without Increasing Space Our solution: – Embed a shorter dynamic ID and a random value. Hash code value = dynamic ID + random value – Random value helps avoid collisions. First time hash code is referred to. Allocation time Other objects are not affected. Object X Object X Object Y 1111010111 000110 0101 1111010111 Static ID Dynamic ID Object-specific random value Hash code value 10 VEE 2010 | Mar 19, 2010 © 2010 IBM Corporation IBM Research - Tokyo How to Deal with Hash Code Collisions? (2nd Round) All objects allocated at the same site have the same high-order bits in their hash code values. Need a long random-value field to avoid collisions. Hash code field: Dynamic ID Random Need a long ID field to track allocation sites accurately. 11 VEE 2010 | Mar 19, 2010 © 2010 IBM Corporation IBM Research - Tokyo Variable-Length Dynamic ID Shorter IDs for hot allocation sites. – Allocate many objects. Need a long random value. – Not so many hot allocation sites in a program. Short site IDs suffice. Longer IDs for cold allocation sites. – Allocate few objects. Short random values suffice. – Many cold allocations sites in a program. Need long site IDs. for (i = 0; i < BIG_NUMBER; i++) { obj = new Object(); use(obj.hashCode()); } Dyn. ID Random if (error1) { obj = new Object(); use(obj.hashCode()); } ... if (error2) { obj = new Object(); use(obj.hashCode()); } Dynamic ID 12 VEE 2010 | Mar 19, 2010 Random © 2010 IBM Corporation IBM Research - Tokyo Dynamic Shrinking of ID Field It is not known in advance … – How many objects will be allocated at each site; or – How many of their hash code values will be referred to. Our solution: Make the IDs of a site shorter and shorter … – As more and more hash code values are referred to. Hash code of objects allocated at putProperty()#191 Object 1 0 0 0 1 1 0 0 1 0 1 Object 2 0 0 0 1 1 0 1 0 0 1 2) Occasionally, a random value becomes all zero. Object 3 0 0 0 1 1 0 1 0 1 1 Object 4 0 0 0 1 1 0 0 0 0 0 3) Assign a new shorter ID. Maintain a mapping table. Object 5 0 0 1 0 1 0 1 1 1 0 Object 6 0 0 1 0 1 0 1 0 0 0 13 1) Assign a long dynamic ID at first. VEE 2010 | Mar 19, 2010 Dynamic ID Allocation site 000110 putProperty()#191 00101 putProperty()#191 …… © 2010 IBM Corporation IBM Research - Tokyo Outline Introduction ASH Tracker ASC Tracker Experiments Applications of ASH/ASC Trackers Conclusions 14 VEE 2010 | Mar 19, 2010 © 2010 IBM Corporation IBM Research - Tokyo What If There Is No Hash Code Field? Some JVMs do not always have a hash code field. Almost all JVMs have a class pointer field. – To access class meta data. Class pointer Flags Instance fields Objects allocated at putProperty()#191 Class pointer 15 Class structure Virtual function table Instance size Other class meta data Class pointer Flags Instance fields Objects allocated at prepare()#35 Class pointer Flags Flags Instance fields Instance fields VEE 2010 | Mar 19, 2010 © 2010 IBM Corporation IBM Research - Tokyo Allocation-Site-via-a-Class-pointer (ASC) Tracker Replace the class pointer with a pointer to its allocation site structure. – This is possible because each allocation site always allocates objects of the same class. Alloc site ptr Alloc site ptr Class pointer Class pointer putProperty ()#191 prepare() #35 Objects allocated at putProperty()#191 Alloc site ptr Virtual function table Instance size Objects allocated at prepare()#35 Alloc site ptr Other class meta data 16 VEE 2010 | Mar 19, 2010 © 2010 IBM Corporation IBM Research - Tokyo Mitigating Indirection Overhead (1) Duplicate frequently-accessed constant class fields. Need to choose carefully which fields to duplicate. – Not to increase cache misses. – Not to increase space overhead. Alloc site ptr Class pointer Instance size Class pointer Instance size putProperty ()#191 prepare() #35 Alloc site ptr 17 VEE 2010 | Mar 19, 2010 Virtual function table Instance size Other class meta data Alloc site ptr Alloc site ptr © 2010 IBM Corporation IBM Research - Tokyo Mitigating Indirection Overhead (2) Profiling-based devirtualization by a JIT compiler. if (object->class_ptr != HOT_CLASS) goto SlowPath; /* Inlined method, etc. */ Another load needed by ASC Tracker. if (object->alloc_site->class_ptr != HOT_CLASS) alloc_site goto SlowPath; /* Inlined method, etc. */ Emit an allocation-site equality check where possible. if (object->alloc_site != HOT_ALLOCATION_SITE) HOT_ALLOCATION_SITE goto SlowPath; /* Inlined method, etc. */ 18 VEE 2010 | Mar 19, 2010 © 2010 IBM Corporation IBM Research - Tokyo Outline Introduction ASH Tracker ASC Tracker Experiments Applications of ASH/ASC Trackers Conclusions 19 VEE 2010 | Mar 19, 2010 © 2010 IBM Corporation IBM Research - Tokyo Evaluation Environment – 64-bit IBM J9/TR JVM 1.6.0 SR3 • 3-word header (1 word = 32 bits) • Generational GC (copying young + mark-and-sweep old GC) – 4x minimum Java heap – 8-core 1.8GHz Intel Xeon – Linux 2.6.18 Benchmarks – SPECjvm2008, DaCapo, and SPECjbb2005 ASH Tracker – Uses a 15-bit hash code field. – Shrinks a dynamic ID field from 11 bits to 2 bits. ASC Tracker – Duplicates an instance-size field and an array-element-class field. 20 VEE 2010 | Mar 19, 2010 © 2010 IBM Corporation IBM Research - Tokyo Performance Overhead 16 14 12 10 8 6 4 2 0 -2 Per-object 4-byte field ASH Tracker n .m ea Ge o pm d jbb 20 05 SP EC rc h ea x nd e b lui qld hs fo p e li p s ec ar t ch tlr an ion da t lus xm l .v ali fo rm w an s nf lo ria l se ud io ga rb y de su xm l .tr mp il co mp il co 21 mp e om pil er er .su nf low co mp re ss ASC Tracker er .c Performance overhead (%) ASH Tracker: on average ~0% and at most 1.4% overhead. – Up to 1.73x hash code collisions compared with the baseline. ASC Tracker: on average 1.0% and at most 2.0% overhead. VEE 2010 | Mar 19, 2010 © 2010 IBM Corporation IBM Research - Tokyo Performance Overhead 16 14 12 10 8 6 4 2 0 -2 Per-object 4-byte field ASH Tracker ASC Tracker n .m ea Ge o pm d jbb 20 05 SP EC rc h ea x nd e b lui qld hs fo p e li p s ec ar t ch tlr an ion da t lus xm l .v ali fo rm w an s nf lo ria l se ud io ga rb y de su xm l .tr mp il co mp il co 22 mp e om pil er er .su nf low co mp re ss Serial and pmd refer to the hash code of: more than 20% of allocated objects, and 5-11% of average live objects. er .c Performance overhead (%) ASH Tracker: on average ~0% and at most 1.4% overhead. – Up to 1.73x hash code collisions compared with the baseline. ASC Tracker: on average 1.0% and at most 2.0% overhead. VEE 2010 | Mar 19, 2010 © 2010 IBM Corporation IBM Research - Tokyo Outline Introduction ASH Tracker ASC Tracker Experiments Applications of ASH/ASC Trackers Conclusions 23 VEE 2010 | Mar 19, 2010 © 2010 IBM Corporation IBM Research - Tokyo Minimal-Overhead Memory Leak Detector Scan the entire Java heap at each global GC time. – Count the numbers of live objects for each allocation site. – Save the numbers in per-allocation-site histories. (Inform users of possible memory leaks.) Java heap obj obj obj obj obj obj obj Per-allocation-site counters 6 24 #live objects history 6, 5, 4, …… 3, 3, 3, …… 2, 1, 1, …… …… VEE 2010 | Mar 19, 2010 obj 3 obj obj obj 2 Allocation site Property.putProperty()#191 DataTable.putInteger()#187 Property.prepare()#35 …… © 2010 IBM Corporation IBM Research - Tokyo Performance Overhead of the Leak Detector n .m ea Ge o pm d jbb 20 05 rc h x ea nd e b lui qld hs lus SP EC mp il co mp il co 25 ASC Tracker om pil er er .su nf low co mp re ss de rb mp y eg au dio se ria l su n xm f lo w l .tr an sfo xm rm l .v ali da t io n an tlr ch ar t ec li p se ASH Tracker fo p 3 2.5 2 1.5 1 0.5 0 -0.5 -1 -1.5 -2 er .c Performance overhead (%) ASH Tracker: on average 0.3% and at most 1.7% overhead ASC Tracker: on average 1.1% and at most 2.4% overhead VEE 2010 | Mar 19, 2010 © 2010 IBM Corporation IBM Research - Tokyo Allocation-Site-Based Object Pretenuring for Generational GC Refer to our paper for more details. 4x min Java heap: at maximum 11% speed-up 2x min Java heap: at maximum 15% speed-up n .m ea Ge o pm d jbb 20 05 rc h ea x nd e b lui ql d hs fo p e lip s tlr ar t ch ec lus SP EC mp il 4x min Java heap co mp il co 26 an om pil er er .su nf low co mp re ss de rb mp y eg au dio se ria l su n xm f lo w l.tr an sfo xm rm l.v ali da t io n 2x min Java heap er .c Speed-up (%) 22 20 18 16 14 12 10 8 6 4 2 0 -2 -4 VEE 2010 | Mar 19, 2010 © 2010 IBM Corporation IBM Research - Tokyo Related Work Bit-Encoding Leak Location (Bell) [Bond, et al., ASPLOS 2006] – Probabilistic allocation site tracker – Requires a sufficient number of samples. – Cannot identify the allocation site of each object. Not suitable for JVM optimizations. Techniques for object header compression [Bacon, et al., ECOOP 2002] – Remove the hash code field. Use ASC Tracker. – Steal several bits of the class pointer. Steal those bits of the allocation site pointer. 27 VEE 2010 | Mar 19, 2010 © 2010 IBM Corporation IBM Research - Tokyo Conclusion Minimal-overhead allocation site trackers ASH Tracker – Embeds an allocation site ID into the hash code field. – Performance overhead: ~0% on average, 1.4% at maximum. ASC Tracker – Makes the class pointer field point to an allocation site structure. – Performance overhead: 1% on average, 2% at maximum. Useful for both reliability and optimization. – Reliability: minimal-overhead memory leak detector. – Optimization: allocation-site-based object pretenuring. 28 VEE 2010 | Mar 19, 2010 © 2010 IBM Corporation IBM Research - Tokyo Thank you! Questions? 29 VEE 2010 | Mar 19, 2010 © 2010 IBM Corporation IBM Research - Tokyo Back-up 30 VEE 2010 | Mar 19, 2010 © 2010 IBM Corporation IBM Research - Tokyo Space Overhead Compared with Physical Memory Usage 10 9 8 7 6 5 4 3 2 1 0 Per-object 4-byte field ASH Tracker VEE 2010 | Mar 19, 2010 n .m ea Ge o pm d jbb Ec 20 lip 05 se /m an ua l rc h ea x nd e b lui qld hs lus SP EC mp il co mp il co 31 fo p om pil er .su er nf low co mp re ss de rb mp y eg au dio se ria l su n xm f lo l .tr w an sfo xm rm l .v ali da t io n an tlr ch ar t ec li p se ASC Tracker er .c Space overhead (%) ASH Tracker: on average 0.4% overhead ASC Tracker: on average 0.2% overhead © 2010 IBM Corporation IBM Research - Tokyo Using ASH Tracker for Object Pretenuring Copy likely-to-be-long-lived objects directly to a “tenured” space. – Not copying such objects multiple times within a young space. 1. Online profiling Compute the ratio of tenured objects (#t/#s) for each allocation site at young GC time. Young space Survivor space Allocation space Survivor space Object #s Tenured space 32 Enable pretenuring for an allocation site if its ratio exceeds a certain threshold. Young space Allocation space Object 2. Pretenuring #t VEE 2010 | Mar 19, 2010 Tenured space © 2010 IBM Corporation