slides

Thin Locks: Featherweight
Synchronization for Java
David F. Bacon
Ravi Konuru
Chet Murthy
Mauricio Serrano
IBM T.J. Watson Research Center
David F. Bacon
10/11/98
IBM TJ Watson Research Center
Introduction
u
It’s the same old sad story:
J Java has threads and synchronized methods
L But synchronization is dog-slow
N So synchronization is optional
u
Shoot the foot of your choice:
6 Get bad performance, or
M Get bug-prone code
David F. Bacon
10/11/98
IBM TJ Watson Research Center
The Library Dilemma
u
Libraries must be thread-safe
– All non-trivial methods are synchronized
– Library call to set a bit in a bit vector:
» ~50 instructions to lock and unlock the object
» ~10 instructions method call overhead
» ~5 instructions to actually set the bit
u
Locking overhead frequently 25-50%
– even in single-threaded applications!
David F. Bacon
10/11/98
IBM TJ Watson Research Center
Java Synchronization Features
u
Thread can lock an object repeatedly
– lock nesting count must be kept
u
On exception, thread must release locks
– call stack implicitly names locked objects
– therefore, list of locked objects not needed
– exceptions:
» JNI
» non-nested monitorenter/monitorexit
David F. Bacon
10/11/98
IBM TJ Watson Research Center
Locking Scenarios by Frequency
Ÿ
Object is unlocked.
Ÿ We already locked the object a few times
Ÿ We already locked the object a lot of times
Ÿ
Object is locked and we are the first to queue up
Ÿ Object is locked and other threads are waiting
David F. Bacon
10/11/98
IBM TJ Watson Research Center
David F. Bacon
Benchmark
10/11/98
wingdis
pizza
mocha
javap
javadoc
jaNet
crema
HashJava
Espresso
NetRexx
javacup
jax
javalex
toba
jobe
jolt
jacorb
jgl
javac
trans
Lock Operations (%)
100%
90%
80%
70%
60%
Fourth
50%
Third
Second
40%
First
30%
20%
10%
0%
IBM TJ Watson Research Center
Unified Locking Design
u
Key innovations for most common cases:
– 24-bit lock in object
– Hash code compressed to 2 bits
u
“Thin locks” implemented as a veneer
– pre-existing heavy-weight locks for contention
– simple, portable solution
David F. Bacon
10/11/98
IBM TJ Watson Research Center
Hash Code Compression
Handle-based systems use address of object
u IBM’s JVM has no handles; objects move
u Apply same idea optimistically:
u
– use address of object as its hash code
– save the address when object is moved by GC
– 3 states: unhashed, hashed, hashed & moved
David F. Bacon
10/11/98
IBM TJ Watson Research Center
Thin Locks
24-bit field in object (using hash code space)
u Contains either
u
– Thread index of owner, and lock count, or
– Monitor index of heavy-weight monitor
Compare-and-swap used for lock acquisition
u No atomic operations required for
u
– unlocking
– nested locking
David F. Bacon
10/11/98
IBM TJ Watson Research Center
Locking
S
Thread Index
Count
atomic_t unlocked = obj->monword & ~LOCK_MASK
atomic_t locked = unlocked | thread;
if (CMP&SWP(obj->monword, unlocked, locked))
return;
unsigned xored = obj->monword ^ thread;
if (xored < COUNT_MASK) {
obj->monword += COUNT_INCREMENT;
return;
}
David F. Bacon
10/11/98
IBM TJ Watson Research Center
33222222222211111111110000000000
10987654321098765432109876543210
object size (doublewords)
JHandle*
GP
class pointer or array size
ThreadThread
Pointer
Index
S
Count
Monitor Index
H Type A
DATA
hash code
P - Pinned
A - Array
David F. Bacon
G - Garbage collection (2 bits)
H - Hash code state (2 bits)
S - Monitor Shape Bit
10/11/98
IBM TJ Watson Research Center
David F. Bacon
Benchmark
10/11/98
Threads 80 10^4
Threads 40 10^4
Threads 20 10^4
Threads 10 10^4
NestedCallSync 10^6
CallSync 10^6
Call 10^6
MultiSync 80 10^4
MultiSync 40 10^4
MultiSync 20 10^4
MultiSync 10 10^4
NestedSync 10^6
8000
Sync 10^6
NoSync 10^6
Time (ms)
9000
JDK111
7000
IBM112
ThinLock
6000
5000
4000
3000
2000
1000
0
IBM TJ Watson Research Center
David F. Bacon
Benchmark
10/11/98
wingdis
pizza
1.6
mocha
javap
javadoc
jaNet
crema
HashJava
Espresso
NetRexx
javacup
jax
javalex
toba
jobe
jolt
jacorb
jgl
javac
trans
Speedup
1.8
JDK111
IBM112
ThinLock
1.4
1.2
1
0.8
0.6
0.4
0.2
0
IBM TJ Watson Research Center
500
450
NOP
Inline
400
FnCall
ThinLock
350
Time (ms)
300
UnlkC&S
MP Sync
IBM112
250
200
150
100
50
0
Sync 10^5
CallSync 10^5
MixedSync 10^5
Threads 20 5000
Micro-Benchmark
David F. Bacon
10/11/98
IBM TJ Watson Research Center
Status
u
Implemented in IBM JDK 1.1.2
– ported forward to 1.1.4 and 1.1.6
u
Platforms:
–
–
–
–
–
David F. Bacon
AIX POWER, PowerPC, PowerPC MP
MVS 370/390 MP
Windows 95
WindowsNT x86
IBM Network Computer (NC)
10/11/98
IBM TJ Watson Research Center
Conclusions
u
Java monitors can be efficient
– 5-10 instructions to lock/unlock an object
– no increase in object size
Median speedups of 1.22, maximum 1.7
u Minimal changes to run-time system
u
– portable, maintainable solution
u
Scalability over a much wider range
– larger “working sets” of locked objects
– multiprocessor systems
David F. Bacon
10/11/98
IBM TJ Watson Research Center