Thin Locks: Featherweight Synchronization for Java David F. Bacon Ravi Konuru Chet Murthy Mauricio Serrano IBM T.J. Watson Research Center David F. Bacon 10/11/98 IBM TJ Watson Research Center Introduction u It’s the same old sad story: J Java has threads and synchronized methods L But synchronization is dog-slow N So synchronization is optional u Shoot the foot of your choice: 6 Get bad performance, or M Get bug-prone code David F. Bacon 10/11/98 IBM TJ Watson Research Center The Library Dilemma u Libraries must be thread-safe – All non-trivial methods are synchronized – Library call to set a bit in a bit vector: » ~50 instructions to lock and unlock the object » ~10 instructions method call overhead » ~5 instructions to actually set the bit u Locking overhead frequently 25-50% – even in single-threaded applications! David F. Bacon 10/11/98 IBM TJ Watson Research Center Java Synchronization Features u Thread can lock an object repeatedly – lock nesting count must be kept u On exception, thread must release locks – call stack implicitly names locked objects – therefore, list of locked objects not needed – exceptions: » JNI » non-nested monitorenter/monitorexit David F. Bacon 10/11/98 IBM TJ Watson Research Center Locking Scenarios by Frequency Ÿ Object is unlocked. Ÿ We already locked the object a few times Ÿ We already locked the object a lot of times Ÿ Object is locked and we are the first to queue up Ÿ Object is locked and other threads are waiting David F. Bacon 10/11/98 IBM TJ Watson Research Center David F. Bacon Benchmark 10/11/98 wingdis pizza mocha javap javadoc jaNet crema HashJava Espresso NetRexx javacup jax javalex toba jobe jolt jacorb jgl javac trans Lock Operations (%) 100% 90% 80% 70% 60% Fourth 50% Third Second 40% First 30% 20% 10% 0% IBM TJ Watson Research Center Unified Locking Design u Key innovations for most common cases: – 24-bit lock in object – Hash code compressed to 2 bits u “Thin locks” implemented as a veneer – pre-existing heavy-weight locks for contention – simple, portable solution David F. Bacon 10/11/98 IBM TJ Watson Research Center Hash Code Compression Handle-based systems use address of object u IBM’s JVM has no handles; objects move u Apply same idea optimistically: u – use address of object as its hash code – save the address when object is moved by GC – 3 states: unhashed, hashed, hashed & moved David F. Bacon 10/11/98 IBM TJ Watson Research Center Thin Locks 24-bit field in object (using hash code space) u Contains either u – Thread index of owner, and lock count, or – Monitor index of heavy-weight monitor Compare-and-swap used for lock acquisition u No atomic operations required for u – unlocking – nested locking David F. Bacon 10/11/98 IBM TJ Watson Research Center Locking S Thread Index Count atomic_t unlocked = obj->monword & ~LOCK_MASK atomic_t locked = unlocked | thread; if (CMP&SWP(obj->monword, unlocked, locked)) return; unsigned xored = obj->monword ^ thread; if (xored < COUNT_MASK) { obj->monword += COUNT_INCREMENT; return; } David F. Bacon 10/11/98 IBM TJ Watson Research Center 33222222222211111111110000000000 10987654321098765432109876543210 object size (doublewords) JHandle* GP class pointer or array size ThreadThread Pointer Index S Count Monitor Index H Type A DATA hash code P - Pinned A - Array David F. Bacon G - Garbage collection (2 bits) H - Hash code state (2 bits) S - Monitor Shape Bit 10/11/98 IBM TJ Watson Research Center David F. Bacon Benchmark 10/11/98 Threads 80 10^4 Threads 40 10^4 Threads 20 10^4 Threads 10 10^4 NestedCallSync 10^6 CallSync 10^6 Call 10^6 MultiSync 80 10^4 MultiSync 40 10^4 MultiSync 20 10^4 MultiSync 10 10^4 NestedSync 10^6 8000 Sync 10^6 NoSync 10^6 Time (ms) 9000 JDK111 7000 IBM112 ThinLock 6000 5000 4000 3000 2000 1000 0 IBM TJ Watson Research Center David F. Bacon Benchmark 10/11/98 wingdis pizza 1.6 mocha javap javadoc jaNet crema HashJava Espresso NetRexx javacup jax javalex toba jobe jolt jacorb jgl javac trans Speedup 1.8 JDK111 IBM112 ThinLock 1.4 1.2 1 0.8 0.6 0.4 0.2 0 IBM TJ Watson Research Center 500 450 NOP Inline 400 FnCall ThinLock 350 Time (ms) 300 UnlkC&S MP Sync IBM112 250 200 150 100 50 0 Sync 10^5 CallSync 10^5 MixedSync 10^5 Threads 20 5000 Micro-Benchmark David F. Bacon 10/11/98 IBM TJ Watson Research Center Status u Implemented in IBM JDK 1.1.2 – ported forward to 1.1.4 and 1.1.6 u Platforms: – – – – – David F. Bacon AIX POWER, PowerPC, PowerPC MP MVS 370/390 MP Windows 95 WindowsNT x86 IBM Network Computer (NC) 10/11/98 IBM TJ Watson Research Center Conclusions u Java monitors can be efficient – 5-10 instructions to lock/unlock an object – no increase in object size Median speedups of 1.22, maximum 1.7 u Minimal changes to run-time system u – portable, maintainable solution u Scalability over a much wider range – larger “working sets” of locked objects – multiprocessor systems David F. Bacon 10/11/98 IBM TJ Watson Research Center