AtomRace: Data Race and Atomicity Violation Detector and Healer Zdeněk Letko Tomáš Vojnar Bohuslav Křena Brno University of Technology, Czech Republic (EU) Zdeněk Letko, PADTAD 2008 1 / 29 Agenda ● AtomRace approach ● Detection abilities ● Problem exhibition technique ● Experimental results Zdeněk Letko, PADTAD 2008 2 / 29 What is a data race? ● A data race occurs when two concurrent threads access a shared variable and: – at least one access is a write and – the threads use no explicit mechanism to prevent the accesses from being simultaneous. Zdeněk Letko, PADTAD 2008 3 / 29 Lockset-based approach ● ● Instructions of interest: – Memory access. – Lock operations. Principle: – ● For each variable maintains a set of candidate locks. Disadvantages: – Support only synchronization based on locks. Zdeněk Letko, PADTAD 2008 4 / 29 Happens-before approach ● ● ● Instructions of interest: – Memory access. – Synchronization (posting, receiving). Principle: – Sync events split execution into segments. – Segments that are not related by the happens-before relation are simultaneous. Disadvantage: – Support only predefined synchronization. Zdeněk Letko, PADTAD 2008 5 / 29 Self-healing requirements ● Support for any kind of synchronization. ● No false alarms. ● Dynamic → minimal performance degradation. ● ● Never mind if race which does not occur within the current execution is not reported. Report problem before it happens. Zdeněk Letko, PADTAD 2008 6 / 29 AtomRace approach ● Instructions of interest: – ● Memory access. Every access to a variable is enclosed by meta instructions. beforeAccessEvent(p, loc6); p = null; afterAccessEvent(p, loc6); } Zdeněk Letko, PADTAD 2008 Primitive atomic section related to p 7 / 29 Data race detection ● ● A race is detected if two primitive atomic sections related to the same variable instance overlap and at least one access is for writing. Reports simultaneous accesses. Zdeněk Letko, PADTAD 2008 8 / 29 An example Thread 1 Thread 2 . . if (p != null){ p.doSomeWork(); } . . Time . . // set p to null p = null; . . . . . Zdeněk Letko, PADTAD 2008 9 / 29 An example Thread 1 Thread 2 . . if (p != null){ p.doSomeWork(); } . . Time . Problem . detected . p = null; . . . . . Zdeněk Letko, PADTAD 2008 10 / 29 An example Thread 1 Thread 2 . . if (p != null){ p.doSomeWork(); } . Time . . . . p = null; . . . . . Zdeněk Letko, PADTAD 2008 11 / 29 Atomicity Violation Thread 1 . . if (p != null){ . . . p.doSomeWork(); More general atomic section related to p } Zdeněk Letko, PADTAD 2008 12 / 29 More general atomic section ● Atomic section related to a variable: – ● If a thread t starts executing within the atomic section over a variable v, no other thread should access v in an unserializable way before the thread t reaches the end point of the atomic section. Serializability: – Each atomic region is executed atomically. – M. Xu, R. Bodík, and M. Hill. A serializability violation detector for shared-memory server programs. In SIGPLAN. 2005. Zdeněk Letko, PADTAD 2008 13 / 29 More general atomic section ● Serializability expressed by the set of access modes that are only allowed: A⊆{read , write} ● Associated with each end point of each atomic section. Zdeněk Letko, PADTAD 2008 14 / 29 Atomicity violation detection ● Within an atomic section gather a set of suspected accesses to a variable: beforeAccessEvent(p, loc3) if (p != null){ p.doSomeWork(); afterAccessEvent(p,loc4) } atomExitEvent(p, loc5) A = {read} A = {read, write} Zdeněk Letko, PADTAD 2008 15 / 29 Atomicity violation detection ● Within an atomic section gather a set of suspected accesses to a variable: beforeAccessEvent(p, loc3) if (p != null){ p.doSomeWork(); afterAccessEvent(p,loc4) } atomExitEvent(p, loc5) p = null; A = {read} A = {read, write} Zdeněk Letko, PADTAD 2008 16 / 29 Atomicity violation detection ● Within an atomic section gather a set of suspected accesses to a variable: beforeAccessEvent(p, loc3) if (p != null){ p = null; p.doSomeWork(); (Thread 2, write, loc6) afterAccessEvent(p,loc4) A = {read} } atomExitEvent(p, loc5) A = {read, write} Zdeněk Letko, PADTAD 2008 17 / 29 When not to detect - Java ● Safe to skip detection: – Variable declared as final ● – Cannot change the value. Variable declared as volatile ● Designed to tolerate simultaneous accesses. Zdeněk Letko, PADTAD 2008 18 / 29 When not to detect - Java ● Avoid false alarms: – Variable set within a <clinit> method ● ● – Initializes static variables – executed on demand. Synchronization ensured by the JVM. Example: static int p = 10; ... beforeAccessEvent(var, loc9); System.out.println(“Value of p is:” + p); afterAccessEvent(var, loc9); ... Zdeněk Letko, PADTAD 2008 19 / 29 Special situations ● Time Overlapping atomic sections − Multiple threads execute multiple atomic sections over a single variable. Example: Thread 1 . local = p; . Time Thread 2 . . local2 = p+10; . Zdeněk Letko, PADTAD 2008 20 / 29 Special situations Circular atomic sections − Section starts and ends at the same location within a loop. Example: . for (int i=0; i<5; i++){ . local += p; . } Zdeněk Letko, PADTAD 2008 21 / 29 Special situations Recursive atomic sections − Atomic section is entered recursively. Example: int method(int param){ if (param < limit) p = p + method(param+1); return param; } Zdeněk Letko, PADTAD 2008 ... 22 / 29 Obtaining atomicity ● ● Pattern based static analysis – Load and store bug patterns x++; – Test and use bug patterns if (p != null) p = p.next; Access Interleaving Invariants – S. Lu, J. Tucek, F. Qin, and Y. Zhou. Avio: detecting atomicity violations via access interleaving invariants. In ASPLOS-XII. 2006. ACM Press. Zdeněk Letko, PADTAD 2008 23 / 29 Problem manifestation ● How to make data races and atomicity violations manifest – so they can be detected? Zdeněk Letko, PADTAD 2008 24 / 29 Problem manifestation ● ● How to make data races and atomicity violations manifest – so they can be detected? Get use of noise injection: – ConTest tool (IBM Haifa Research Labs) ● – Synchronization focused Y. Ben-Asher, Y. Eytani, E. Farchi, and S. Ur. Noise makers need to know where to be silent - producing schedules that find bugs. ISOLA. 2006. Zdeněk Letko, PADTAD 2008 25 / 29 Noise injection ● ● ● Noise injected inside atomic sections. The noise is put inside atomic sections using before/afterAccessEvents → makes them longer. Noise injection policy: – Random – Variable focused – Location focused Zdeněk Letko, PADTAD 2008 26 / 29 AtomRace with Noise Injection ● 4 processors, TIDorb Java (1400+ classes) Conflicts Coverage of Execution time detected problematic variables Only AtomRace 1.58 0.37 5.76 sec Random 24.19 2.27 106.17 sec Location based 11.38 1.71 7.40 sec Variable based 261.38 3.63 34.44 sec Average in 100 runs, noise 50ms, noise frequency 50%. Zdeněk Letko, PADTAD 2008 27 / 29 Conclusions AtomRace − Supports any kind of synchronization. − Meets the requirements for dynamic self-healing. Problem manifestation technique − Increases the efficiency of dynamic detection technique – better if noise injected wisely. Future work − Performance, obtaining more general atomicity, healing the first occurrence of atomicity violation. Zdeněk Letko, PADTAD 2008 28 / 29 Thank you. Zdeněk Letko, PADTAD 2008 29 / 29 Proposed Architecture Zdeněk Letko, PADTAD 2008 30 / 29 AI-invariants: unserializable interleavings Interleaving scenario readlocal writeremote readlocal writelocal writeremote readlocal writelocal readremote writelocal readlocal writeremote writelocal Description The interleaving write makes the two reads have different views of the same memory locations. The local read does not get the local result it expetcts. Intermediate result that assumed to be invisible to other threads is read by a remote access. The local variable relies on a value from the preceding local read that is then overwritten by the remote write. Zdeněk Letko, PADTAD 2008 31 / 29