Amiram Yehudai, Shmuel Tyszberowicz, Dor Nir

Outline
1. Problem introduction
2. Regression bug: definition
3. Proposed solution
4. Experimental results
5. Future work

Problem introduction
Given that a bug exists, we want to locate the place in the source code that causes it.
[Figure: bug found → locating → bug location.]

A test case is a sequence of commands and checkpoints.
Commands, for example: click a button; enter text in an edit box; insert a record into a database; …
Checkpoints, for example: check that the button is enabled; check that a certain text is shown in the edit box; check the value of a record in a database; …

Regression bug
[Figure: Version 1 meets specifications 1. X, 2. Y, 3. Z. After the release, changes in the code produce Version 2, with specifications 1. X, 2. Y, 3. Z, 4. A, 5. B. Annotation: "Bug… but no regression".]
Regression bugs occur whenever software functionality that previously worked as desired stops working, or no longer works as planned. Typically, regression bugs are an unintended consequence of program changes.

What causes the regression bug?
[Figure: Version 1 → changes in code → Version 2.]
Which change causes the regression bug?

Input
Checkpoint C: a checkpoint that failed when running a test case.
V: the last version of the AUT (application under test) in which checkpoint C still passed when running the same test case.
Goal: find the locations p1, p2, …, pn in the source code of the AUT that caused C to fail.

The code psychologist tool
[Figure: the input (the failed checkpoint and the last version in which it passed) goes to the SCT, which produces the changes (Change 1, Change 2, Change 3, …). First phase: the CSF (Change Sound Filter) retrieves the relevant changes. Second phase: the heuristics rank them. Output: the relevant changes, in order: 1. Change n1, 2. Change n2, 3. Change n3, …]

SCT
A database of source code, very common in software development: check-in and check-out operations, history of versions, and differences between versions.
The tool retrieves the changes submitted after version V. The number of retrieved changes can be large.
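A minimal Python sketch of the SCT step, retrieving every change submitted after version V. The `Change` record, its fields, and the `changes_after` function are hypothetical illustrations, not the tool's actual interface; the file names are taken from the slides, while the check-in comments are invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    """One check-in retrieved from the SCT (hypothetical record layout)."""
    version: int
    file: str
    comment: str


def changes_after(history: list[Change], v: int) -> list[Change]:
    """Return every change submitted after version v, oldest first.

    Checkpoint C still passed at version v, so only these later
    changes are candidates for causing it to fail.
    """
    return sorted((c for c in history if c.version > v), key=lambda c: c.version)


# Hypothetical history; comments are illustrative only.
history = [
    Change(1, "IO.cs", "initial import"),
    Change(2, "File.cs", "add xml export"),
    Change(3, "Windows.cpp", "rename button handler"),
]
print([c.version for c in changes_after(history, 1)])
```

Everything this returns is passed on to the CSF; since the list can be long, the later phases exist to shrink and order it.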
CSF: soundness
The output of the CSF must contain the changes that cause the regression bug.
[Figure: tests such as "Check text in message box", "File t.xml was created successfully", and "SELECT NAMES from Table1 is not empty" mapped to source code such as Windows.cpp, errMessages.cpp, File.cs, IO.cs, C:\code\windows, and a DB project.]
Filters:
Filtering out refactoring changes.
Filtering out changes in comments.
Using profiler information to filter out irrelevant changes: code that was not executed could not have caused the regression.

Heuristics
The heuristics rank the changes. Unlike the CSF, they are not conservative. Each heuristic has a different weight:

$$\mathrm{Rank}(p) = \sum_{i=1}^{|H|} \alpha_i \cdot \mathrm{HeuristicRank}_i(p)$$

where H is the set of heuristics and α_i is the weight of heuristic i.

Affinity: "a close connection marked by similarity in nature or character".
We measure the affinity between words: affinity(chair, house) < affinity(chair, table) < affinity(chair, chair).
[Figure: a word hierarchy rooted at "object": artifact → instrumentality → conveyance, transport → vehicle → wheeled vehicle → bike, bicycle (the figure also shows automotive, motor); and artifact → article → ware → tableware → cutlery, eating utensil → fork.]

$$\mathrm{WrdAff}(a,b) = \frac{1}{\mathrm{Distance}(a,b)}, \qquad \text{e.g. } \mathrm{WrdAff}(\text{bike},\text{fork}) = \frac{1}{10}$$

For groups of words A = {a_1, …, a_n} and B = {b_1, …, b_m}:

$$\mathrm{AsyGrpAff}(A,B) = \frac{1}{n}\sum_{i=1}^{n} \max\{\mathrm{WrdAff}(a_i,b_j) \mid 1 \le j \le m\}$$

$$\mathrm{GrpAff}(A,B) = \frac{\mathrm{AsyGrpAff}(A,B) + \mathrm{AsyGrpAff}(B,A)}{2}$$

Affinity heuristics: code lines affinity, check-in comment affinity, file affinity, function affinity.
Human factor: the programmer's history; the time of the change (late at night, close to a release deadline).
Code complexity: number of branches; concurrency.

Experimental setup: C++, MFC framework, 891 files in 29 folders, 3 million lines of code, Visual SourceSafe, 3984 check-ins.
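The word and group affinity measures and the weighted ranking defined above can be sketched in Python as follows. The toy hierarchy `PARENT` is a hypothetical stand-in for a real lexical database and contains only the nodes needed for the bike/fork example; treating identical words as affinity 1 (the formula would divide by zero) and unrelated words as affinity 0 are assumptions of this sketch, not rules stated in the slides.

```python
# Hypothetical toy hypernym tree (child -> parent), modelled on the
# bike/fork figure; a real tool would query a lexical database.
PARENT = {
    "artifact": "object",
    "instrumentality": "artifact",
    "article": "artifact",
    "conveyance": "instrumentality",
    "vehicle": "conveyance",
    "wheeled vehicle": "vehicle",
    "bike": "wheeled vehicle",
    "ware": "article",
    "tableware": "ware",
    "cutlery": "tableware",
    "fork": "cutlery",
}


def ancestors(word):
    """Return [word, parent, grandparent, ...] up to the root."""
    path = [word]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def distance(a, b):
    """Edges between a and b through their lowest common ancestor, or None."""
    pa, pb = ancestors(a), ancestors(b)
    depth_in_b = {w: i for i, w in enumerate(pb)}
    for i, w in enumerate(pa):
        if w in depth_in_b:
            return i + depth_in_b[w]
    return None  # no common ancestor in the toy tree


def wrd_aff(a, b):
    """WrdAff(a,b) = 1/Distance(a,b); 1 for identical words (assumption)."""
    d = distance(a, b)
    if d is None:
        return 0.0  # assumption: unrelated words have no affinity
    return 1.0 if d == 0 else 1.0 / d


def asy_grp_aff(A, B):
    """Mean over a in A of the best WrdAff(a, b), b in B (A, B non-empty)."""
    return sum(max(wrd_aff(a, b) for b in B) for a in A) / len(A)


def grp_aff(A, B):
    """Symmetric group affinity: average of both asymmetric directions."""
    return (asy_grp_aff(A, B) + asy_grp_aff(B, A)) / 2


def rank(p, heuristics, weights):
    """Rank(p) = sum_i alpha_i * HeuristicRank_i(p), weights = alpha_i."""
    return sum(w * h(p) for h, w in zip(heuristics, weights))


print(wrd_aff("bike", "fork"))  # path of 10 edges via "artifact"
```

The path from "bike" to "fork" passes five edges up to "artifact" and five down, which reproduces WrdAff(bike, fork) = 1/10 from the slide.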
Results (rank per heuristic):

Bug   Code lines   Check-in   File affinity   Functions   Average (simple)   Average (weighted)
1     5            3          9               1           1                  1
2     -            1          24              -           7                  3
3     5            3          3               1           1                  1
4     -            -          -               6           6                  5
5     2            1          4               1           4                  1

Results with file grouping:

Bug   Code lines   Check-in   File affinity   Functions   Average (simple)   Average (weighted)
1     1            3          9               1           1                  1
2     9            1          24              -           3                  2
3     3            3          3               2           1                  1
4     -            -          -               3           9                  4
5     1            1          4               1           1                  1

Locating the bug took 20 hours of strenuous work by two experienced programmers. Fixing the bug took less than an hour.

Heuristic                    Rank (grouped by file)
Code line affinity           7
Check-in comment affinity    -
File affinity                22
Function affinity            8
Average                      4

Future work
Implementing the human factor and code complexity heuristics.
A learning mechanism: automatic tuning of the heuristics.
More experiments on "real world" regression bugs.

Heuristic definitions
W(C) denotes the group of words of checkpoint C.

Code line affinity:

$$\mathrm{Rank}_1(C,P) = \lambda\,\mathrm{GrpAff}(W(C),W(P)) + \frac{1}{L}\sum_{l=1}^{L}\mathrm{GrpAff}(W(C),W(P,l))$$

W(P,l) is the group of words in the source code located l lines from the change P; λ is a coefficient that gives a different weight to the lines inside the change.

Check-in comment affinity:

$$\mathrm{Rank}_2(C,P) = \mathrm{GrpAff}(W(C),W(\mathrm{Checkin}(P)))$$

File affinity:

$$\mathrm{MaxAff}(a,B,map) = \max\{\mathrm{WrdAff}(a,b_j)\cdot map[b_j] \mid 1 \le j \le m\}$$

$$\mathrm{HstAff}(A,B,map) = \frac{\sum_{i=1}^{n}\mathrm{MaxAff}(a_i,B,map)}{\max\{map[b_j] \mid 1 \le j \le m\}}$$

$$\mathrm{Rank}_3(C,P) = \mathrm{HstAff}(W(C),W(F),\mathrm{Hstg}(F))$$

where F is the file containing the change P and Hstg(F) is the word histogram of F.

Function affinity:
FuncAff(C,f) combines three terms: GrpAff(W(C), W(f)), GrpAff(W(C), Bdy(f)) for the body of f, and the average of FuncAff(C, FncCall(f,i)) over the k functions FncCall(f,1), …, FncCall(f,k) called by f.

$$\mathrm{Rank}_4(C,P) = \mathrm{FuncAff}(C,\mathrm{func}(P))$$

where func(P) is the function containing the change P.

Example checkpoint: after selecting "clerk 1" from the clerk tree (clerk number 2) and going to the next clerk, check that the next clerk is "clerk 3".
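The file affinity heuristic Rank_3 can be sketched in Python as below, following the MaxAff and HstAff formulas for file affinity. The function names are hypothetical, `collections.Counter` plays the role of Hstg(F), and `exact_match` is a deliberately crude stand-in for WrdAff used only so the example runs on its own; any word affinity function (for instance one based on a word hierarchy) can be passed in its place.

```python
from collections import Counter


def max_aff(a, B, hist, wrd_aff):
    """MaxAff(a,B,map) = max_j WrdAff(a, b_j) * map[b_j]."""
    return max(wrd_aff(a, b) * hist[b] for b in B)


def hst_aff(A, B, hist, wrd_aff):
    """HstAff(A,B,map) = sum_i MaxAff(a_i,B,map) / max_j map[b_j]."""
    return sum(max_aff(a, B, hist, wrd_aff) for a in A) / max(hist[b] for b in B)


def file_rank(checkpoint_words, file_words, wrd_aff):
    """Rank_3(C,P) = HstAff(W(C), W(F), Hstg(F)) for the file F containing P.

    file_words must be non-empty.
    """
    hist = Counter(file_words)  # Hstg(F): occurrences of each word in F
    B = list(hist)              # W(F): the distinct words of F
    return hst_aff(checkpoint_words, B, hist, wrd_aff)


def exact_match(a, b):
    """Hypothetical stand-in for WrdAff: 1 for identical words, else 0."""
    return 1.0 if a == b else 0.0


score = file_rank(["file", "created"], ["file", "file", "open", "created"], exact_match)
print(score)
```

Weighting each match by its count in the histogram favors files in which the checkpoint's words occur often, while dividing by the largest count keeps very long files from dominating purely by size.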