security analysis with WALA Omer Tripp IBM TJ Watson Workshop on WALA (WoW), 13 June 2015 collaborators • Marco Pistoia • Patrick Cousot • Radhia Cousot • Julian Dolby • Manu Sridharan • Pietro Ferrrara • Steve Fink • Sal Guarnieri • Takaaki Tateishi • Omri Weisman • Steve Teilhet • Ryan Berg • Shay Artzi • Lotem Guy • Roee Hay motivation threat evolution 1960 Program installation only by experts 1970 1980 Program sharing 1990 Program downloading 2000 Mobile code Web applications 2010 Mobile applications threats: XSS Attacker’s evil script <SCRIPT>...</SCRIPT> Web Application Attacker’s evil script executed using victim’s credentials Attacker Victim threats: SQLi String query = “SELECT * FROM users WHERE name=‘” + userName + “’ AND pwd=‘” + pwd + “’”; SELECT * FROM users WHERE name='jsmith' AND pwd='Demo1234' Ouch! SELECT * FROM users WHERE name='foo';drop table custid;--' AND pwd='' threats: info leakage threats: JS 1515 %% of Fortune 500 websites have exploitable security issues in JavaScript. According to an IBM study performed in 2010 DOM-based XSS document.write(document.URL.substring( pos,document.URL.length)); Open Redirect var pos = document.location.href.indexOf("name="); var val = document.location.href.substring(pos); document.location.href = "http://" + val; threats: mobile * http://blackdiamondsolutions.com/wp-content/uploads/2013/04/state-of-software-security-report-volume5.pdf owasp top ten 1. Cross-site scripting (XSS) 2. Injection flaws 3. Malicious file executions 4. Insecure direct object reference 5. Cross site request forgery (CSRF) 6. Information leakage and improper error handling 7. Broken authentication and improper session management 8. Unsecure cryptographic storage 9. Unsecure communications 10. Failure to restrict URL accesses outline of our work the high-level pieces dynamically loaded client-side code statically loaded client-side code server-side code security defenses: sanitizers and validators libraries and frameworks the high-level progression 2009 TAJ 2010 2011 PISA 2012 F4F 2013 Andromeda 2014 JSA server-side security verification TAJ (PLDI’09): Java taint analysis sti Store statement Hybridcomputed SDG based on li Load statement st1 ski Sink-dispatch statement ci Call statement ri Return statement si Other statement l1 l2 st3 st4 Store-to-load direct edge computed using graph c1 reachability over a noheap SDG c5 Slice in the no-heap SDG r5 r1 l3 l4 l5 sk1 st5 c4 r4 c2 c3 s1 r3 s2 r7 r2 r8 st4 Load-to-store or load- to-sink summary edge No-heap SDG edge l2 preliminary pointer analysis st2 sk2 TAJ: design choices ● ● ● ● pointer analysis: variant of Andersen’s analysis custom context-sensitivity policy: ● unlimited-depth object sensitivity for Java collections (up to recursion) ● one level of call-string context for factory methods ● one level of call-string context for taint APIs ● one-level receiver-object context sensitivity as default field sensitivity flow insensitive, though intraprocedurally flow sensitive thanks to SSA TAJ: WALA support ● ● ● ● ● ● ● pointer analysis / context sensitivity field sensitivity synthetic IR for taint APIs SSA representation IFDS enhancements [contributed to WALA] ICFGSupergraph [contributed to WALA] .NET support [contributed to WALA] IBM AppScan Developer Edition Andromeda (FASE’13): demand-driven taint analysis Andromeda: WALA support ● ● PA-free CG construction algs for Java (CHA, RTA, …) efficient infrastructure for fixpoint iteration IBM AppScan Source Edition the high-level pieces dynamically loaded client-side code statically loaded client-side code server-side code security defenses: sanitizers and validators libraries and frameworks the high-level pieces dynamically loaded client-side code statically loaded client-side code server-side code security defenses: sanitizers and validators libraries and frameworks framework modeling F4F (OOPSLA’13): modeling Java web frameworks effect of modeling F4F: WALA support ● ● ● synthetic IR for call-graph methods customization of call-graph target resolution (CustomContextSelector) various knobs to tweak in pointer analysis IBM AppScan Source Edition the high-level pieces dynamically loaded client-side code statically loaded client-side code server-side code security defenses: sanitizers and validators libraries and frameworks the high-level pieces dynamically loaded client-side code statically loaded client-side code server-side code security defenses: sanitizers and validators libraries and frameworks sanitizer verification PISA (ISSTA’11, TOSEM’13): encoding into M2L PISA: sanitizers from the wild PISA: WALA support ● ● ● uniform IR: JS, Java, … pointer analysis: aliasing SSA IBM AppScan Source Edition the high-level pieces dynamically loaded client-side code statically loaded client-side code server-side code security defenses: sanitizers and validators libraries and frameworks the high-level pieces dynamically loaded client-side code statically loaded client-side code server-side code security defenses: sanitizers and validators libraries and frameworks client-side security verification Actarus (ISSTA’11): JS taint analysis NY times WordPress Actarus: handling aliases Actarus: WALA support ● ● JS IR, including ● property accesses ● arguments array ● lexical scoping ● function pointers novel CG construction algs for JS [contributed to WALA] IBM AppScan Source Edition the high-level pieces dynamically loaded client-side code statically loaded client-side code server-side code security defenses: sanitizers and validators libraries and frameworks the high-level pieces dynamically loaded client-side code statically loaded client-side code server-side code security defenses: sanitizers and validators libraries and frameworks handling dynamic aspects of JS JSA (ISSTA’14): HTML/JavaScript, concrete URLs, … black-box scanner issues DOM modeling reduce scope taint analysis find issues string analysis eliminate false positives JSA (ISSTA’14): ▪specialized string analysis using dynamic pieces of information (e.g., concrete URL) "https://some-site/release/jsp/sso/login.html?..." var str = document.URL; var url_check = str.indexOf('login.html'); if (url_check > -1) { result = str.substring(0,url_check); result = result + 'login.jsp' + str.substring((url_check+search_term.length), str.length); document.URL = result; } ▪part controlled by attacker is unknown, but known prefix modeled precisely URL as Source http://www.mysite.com/folder/page?a=1&b=2#anchor NOT CONTROLLED BY ATTACKER CONTROLLED BY ATTACKER JSA: WALA support ● ● call-graph construction IFDS / RHS (proved general enough!) ● ability to leverage domain knowledge ● CallNoneToReturnFlow as a means to plug in models IBM AppScan Standard Edition the high-level pieces dynamically loaded client-side code statically loaded client-side code server-side code security defenses: sanitizers and validators libraries and frameworks the high-level pieces dynamically loaded client-side code statically loaded client-side code server-side code security defenses: sanitizers and validators libraries and frameworks nowadays PL / ML PL / ML PL / ML PL / ML PL / ML http://malwaredetection.mybluemix.net/analyzer/ thoughts into the future • query-based analyses • more WALA support around IFDS / IDE (already underway!!) • integration with machine-learning engines like WEKA • integration with dynamic analyses thank you!