Healing Data Races On-The-Fly Bohuslav Křena, Zdeněk Letko, Tomáš Vojnar ( Brno university of Technology, Czech Republic ) Rachel Tzoref, Shmuel Ur ( IBM, Haifa Research Lab, Israel ) PADTAD, London 07/2007 1 Agenda ● Introduction ● Self-Healing steps: – Problem detection – Problem localisation – Problem healing – Healing assurance ● Preliminary results and experiments ● Discussion PADTAD, London 07/2007 2 What is a race? ● ● ● A data race occurs when two concurrent threads access a shared variable and when: – at least one access is a write and – the threads use no explicit mechanism to prevent the accesses from being simultaneous. Usually a data race is a serious error caused by failure to synchronize properly. Can cause wrong results, deadlocks, exceptions... PADTAD, London 07/2007 3 Atomicity races ● ● Races caused by violation of wrong assumptions that some blocks of code will be executed atomically. Example: Thread 1 void someMethod(){ long local = shared; local = update(local); shared = local; } Thread 2 void someMethod(){ long local = shared; local = update(local); shared = local; } PADTAD, London 07/2007 4 Atomicity races ● ● Races caused by violation of wrong assumptions that some blocks of code will be executed atomically. Example: Thread 1 void someMethod(){ shared=update(shared); } Thread 2 void someMethod(){ shared=update(shared); } PADTAD, London 07/2007 5 Inherent races ● Races not related to atomicity. ● Data race if the following holds: – Executing any segment of code in each thread atomically does not determine an order of accesses to shared variable. – The different orders in which the shared variable is acessed can be classified as “good” and “bad” according to the expected behaviour of the program. PADTAD, London 07/2007 6 Inherent races ● Example: Thread 1 void synchronized someMethod(){ long local = shared; local = update(local); shared = local; } Thread2 void synchronized otherMethod(){ shared = null; } PADTAD, London 07/2007 7 Agenda ● Introduction ● Self-Healing steps: – Problem detection – Problem localisation – Problem healing – Healing assurance ● Preliminary results and experiments ● Discussion PADTAD, London 07/2007 8 Problem detection ● Eraser algorithm – ● Detects so called apparent data races Principle: – For each variable maintains its state and the set of candidate locks – Race is detected whenever: ● ● the variable is in state Shared and the set of candidates locks becomes empty PADTAD, London 07/2007 9 Demonstration of the detection static class Flight { private int soldSeats; ... Flight(){ soldSeats = 0; ... } ... boolean bookTicket(){ soldSeats++; } ... } Thread_main new Flight(); (state = Exclusive, C(v)={}) Thread 1 synchronized(lock){ bookTicket();} (state = Shared, C(v)={lock}) Thread 2 bookTicket(); (state = Race, C(v)={}) PADTAD, London 07/2007 Time 10 Agenda ● Introduction ● Self-Healing steps: – Problem detection – Problem localisation – Problem healing – Healing assurance ● Preliminary results and experiments ● Discussion PADTAD, London 07/2007 11 Problem localisation ● ● ● Often hard task even for humans. Oracle based on looking for pre-specified data race bug patterns in the code with the aid of information collected by race detector. Use formal methods to reduce the number of false alarms but with reasonable overhead. PADTAD, London 07/2007 12 Atomicity Violation Bug Patterns ● Load-store bug pattern x++; ● Test-and-use bug pattern if (p != null) p = p.next; ● Repeated test-and-use pattern while (p != null) p = p.next; PADTAD, London 07/2007 2: 5: 6: 7: getfield iconst_1 iadd putfield #2 #2 0: aload_0 1: getfield #2 4: ifnull 18 7: aload_0 8: aload_0 9: getfield #2 12:getfield #3 15:putfield #2 18:... 13 Demonstration of the localisation static class Flight { private int soldSeats; ... Flight(){ soldSeats = 0; ... } ... boolean bookTicket(){ soldSeats++; } ... } getfield iconst_1 iadd putfield PADTAD, London 07/2007 #2 #2 14 Agenda ● Introduction ● Self-Healing steps: – Problem detection – Problem localisation – Problem healing – Healing assurance ● Preliminary results and experiments ● Discussion PADTAD, London 07/2007 15 Healing atomicity races ● Influencing the scheduler – Forcing a context switch Thread.yield(); ● Idea: – ● Pros – ● To receive full time window from the scheduler. Safe and legal solution. Cons – Only decrease the probability of race manifestation. PADTAD, London 07/2007 16 Healing atomicity races ● Influencing scheduler – Temporary changes of the priorities Thread.setPriority(MAXPRIORITY); ... Thread.setPriority(originalPriority); ● Idea: – ● Pros – ● To receive full time window from the scheduler. Safe and legal solution. Cons – Only decrease the probability of race manifestation. – Strongly JVM and OS dependent. PADTAD, London 07/2007 17 Healing atomicity races ● Adding synchronization actions – Suitable use of mutexes healingMutex.lock(); ... healingMutex.unlock(); ● Idea: – ● Pros – ● To prevent accesses being simultaneous. Heal the race. Cons – Could introduce deadlock. – Exceptions must be covered. PADTAD, London 07/2007 18 Healing inherent races ● Distinguish between good and bad orders ● Enforce good order – ● Change the scheduling of the program. Override bad order – Concentrate on multiple write accesses. – Doesn't prevent bad order from occurring. PADTAD, London 07/2007 19 Demonstration of the healing static class Flight { private int soldSeats; ... Flight(){ soldSeats = 0; ... } ... boolean bookTicket(){ soldSeats++; } ... } Healing by influencing scheduler: boolean bookTicket(){ Thread.yield(); soldSeats++; } Healing by synchronization: boolean bookTicket(){ raceLock.lock(); soldSeats++; raceLock.unlock(); } PADTAD, London 07/2007 20 Agenda ● Introduction ● Self-Healing steps: – Problem detection – Problem localisation – Problem healing – Healing assurance ● Preliminary results and experiments ● Discussion PADTAD, London 07/2007 21 Healing assurance ● ● Static analysis and/or bounded model checking – Reduce false alarms during detection and localisation. – Ensure that a new bug cannot be introduced. – Help to choose suitable healing method. ... future work PADTAD, London 07/2007 22 Agenda ● Introduction ● Self-Healing steps: – Problem detection – Problem localisation – Problem healing – Healing assurance ● Preliminary results and experiments ● Discussion PADTAD, London 07/2007 23 Preliminary results ● ● Implemented race detector is able: – To detect wrong locking policy using Eraser algorithm. – To detect load-store atomicity bug pattern. – To localise the bug and give enough information to the developer. – To heal founded bug by influencing scheduler and also by introducing additional synchronization. Architecture available also as an Eclipse plug-in. PADTAD, London 07/2007 24 Experiments PADTAD, London 07/2007 25 Thank you PADTAD, London 07/2007 26 Eraser algorithm Candidate_locks(x) := Candidate_locks(x) ∩ Locks_held(x); if Candidate_locks(x) = { }, then issue a warning Candidate_locks = set of locks used to protect variable Locks_held = set of locks owned by thread PADTAD, London 07/2007 27 Architecture overview Original bytecode ... load x ... Instrumented bytecode ... BeforeVarRead load x AfterVarRead ... PADTAD, London 07/2007 28