A"General"Approach"for"Efficiently" Accelera3ng"So6ware8based"Dynamic"Data" Flow"Tracking"on"Commodity"Hardware" Kangkook"Jee"" Columbia"University" Joint"work"with" Georgios"Portokalidis1","Vasileios"Kemerlis1","" Soumyadeep"Ghosh2,"David"August2,"Angelos"Keromy3s1" " 1Columbia"University,"2Princeton"University" 1" Talk"Outline" • Data"Flow"Tracking"[VEE"12]" – Popular"Subject"of"Security"Research" – Tagging"and"tracking"of"Interes3ng"data" • Taint"Flow"Algebra"(TFA)"[NDSS"12]" – Intermediate"Representa3on(IR)"for"DFT" – Compiler"op3miza3on"+"DFT"specific"op3miza3on" • ShadowReplica" – Decoupling"of"execu3on"and"monitoring" – Running"each"from"different"cores" 2" Talk"Outline" • Data"Flow"Tracking"[VEE"12]" – Popular"Subject"of"Security"Research" – Tagging"and"tracking"of"Interes3ng"data" • Taint"Flow"Algebra"(TFA)"[NDSS"12]" – Intermediate"Representa3on(IR)"for"DFT" – Compiler"op3miza3on"+"DFT"specific"op3miza3on" • ShadowReplica" – Decoupling"of"execu3on"and"monitoring" – Running"each"from"different"cores" 3" Data"Flow"Tracking"(DFT)" • A"great"security"tool"with"many"applica3ons" – Tag"input"data"and"track"them" – So6ware"exploits,"Informa3on"misuse"or"leakage" malware"analysis"…" • Implementa3on"approaches" – Hardware"assisted:"Raksha,"RIFLE"…" – Source"code"based:"GIFT"…" – Binary"only:"TaintCheck,"Dytan,"Minemu,"Libd6"…" "Binary"only"DFT:"Most"promising,"but"too"slow!" 4" DFT:"Basic"Aspects" • DFT"is"characterized"by"three"aspects"" (1) Data'Sources:"program"or"memory"loca3ons"where"data" of"interest"enter"the"system"and"is"subsequently"tagged" (2) Data'tracking:"process"of"propaga3ng"data"tags" according"to"the"program’s"seman3cs" (3) Data'Sinks:"program"or"memory"loca3ons"where"checks" for"“tagged”"data"can"be"made" Shadow Memory (3) (1) Input File Network Keyboard Source mov eax, [ebx] (2) mov [esi], eax Data Tracking Output File Network 5" DFT"Opera3on" Real Memory • Real"Memory"="Address"space"+"register"context" 6" DFT"Opera3on" Real Memory Shadow Memory • Real"Memory"="Address"space"+"register"context" • Shadow"memory"to"track"metadata"update""" 7" DFT"Opera3on" dst[idx1] = src[idx0]; Real Memory Shadow Memory • Memory"copy"statement"from"the"original"execu3on" 8" DFT"Opera3on" dst[idx1] = src[idx0]; Real Memory t(dst[idx1]) = t(src[idx0]); Shadow Memory • Memory"copy"statement"from"the"original"execu3on" • Corresponding"shadow"memory"update" 9" DFT"Opera3on" dst[idx1] = src[idx0]; mov reg0 ← [src+idx0] Real Memory t(dst[idx1]) = t(src[idx0]); mov [dst+idx1] ← reg0 Shadow Memory • Original"opera3on"translated"into"machine"code"" • It"requires"intermediate"register"repository"(reg0)" 10" DFT"Opera3on" dst[idx1] = src[idx0]; mov reg0 ← [src+idx0] mov reg0 ← [t(src+idx0)] mov [t(reg0)] ← reg0 Real Memory t(dst[idx1]) = t(src[idx0]); mov [dst+idx1] ← reg0 Shadow Memory • Instruc3on"level"instrumenta3on"to"implement" shadow"update" 11" DFT"Opera3on" dst[idx1] = src[idx0]; mov reg0 ← [src+idx0] mov reg0 ← [t(src+idx0)] mov [t(reg0)] ← reg0 Real Memory t(dst[idx1]) = t(src[idx0]); mov [dst+idx1] ← reg0 mov reg0 ← [t(reg0)] mov [t(dst+idx1)] ← mov reg0 Shadow Memory • 2"original"instruc3ons"+"4"tracking"instruc3ons" • 2"instrumenta3on"units" 12" Why"So"Slow?" • Framework"cost"(virtualiza3on"cost)" – DBI,"Hypervisor"instrumenta3on" • DFT"cost" – Accesses"to"shadow"storage" • Naïve"Implementa3on" – No"understanding"of"global"context" – No"understanding"of"DFT"seman3cs" 13" Talk"Outline" • Data"Flow"Tracking" – Popular"Subject"of"Security"Research" – Tagging"and"tracking"of"Interes3ng"data" • Taint"Flow"Algebra"(TFA)" – Intermediate"Representa3on(IR)"for"DFT" – Compiler"op3miza3on"+"DFT"specific"op3miza3on" • ShadowReplica" – Decoupling"of"execu3on"and"monitoring" – Running"each"from"different"cores" 14" Taint"Flow"Algebra" • Applica3on"specific"analysis" • DFT"specific"analysis" • Integrated"with"libd)* – High"performance"DFT"tool""[VEE"2012]"" • 1.46x"~"8x"slowdown"(over"na3ve"execu3on)" – Designed"for"use"with"Pin*DBI*framework*" – Open"source" • hsp://www.cs.columbia.edu/~vpk/research/libd6" 15" Op3mizing"DFT" Original" mov reg0 ← [src+idx0] mov reg0 ← [t(src+idx0)] mov [t(reg0)] ← reg0 Instrumenta3on" mov [dst+idx1] ← reg0 mov reg0 ← [t(reg0)] mov [t(dst+idx1)] ← mov reg0 • Each"Instrumenta3on"unit"requires"head/tail"instruc3ons" • t(*)*:"shadow"memory"access"cost"" 16" Op3mizing"DFT" Original" mov reg0 ← [src+idx0] mov reg0 ← [t(src+idx0)] mov [t(reg0)] ← reg0 Instrumenta3on" mov [dst+idx1] ← reg0 mov reg0 ← [t(reg0)] mov [t(dst+idx1)] ← mov reg0 • Re8locatable" 17" Op3mizing"DFT" mov reg0 ← [src+idx0] mov [dst+idx1] ← reg0 mov reg0 ← [t(src+idx0)] mov [t(reg0)] ← reg0 mov reg0 ← [t(reg0)] mov [t(dst+idx1)] ← mov reg0 • Less"instrumenta3on"units"(2!1)" 18" Op3mizing"DFT" mov reg0 ← [src+idx0] mov [dst+idx1] ← reg0 mov reg0 ← [t(src+idx0)] mov [t(reg0)] ← reg0 mov reg0 ← [t(reg0)] mov [t(dst+idx1)] ← mov reg0 • Less"instrumenta3on"units"(2!1)" • Less"tracking"instruc3ons"(4!2)" 19" Execu3on"Model" • 3"Components" – "Profiler,"Analyzer,"DFT"Run3me" • Sta3c/offline"analysis"+"Dynamic"run3me" – Feedback"loop" Optimized data tracking DFT Runtime Control flow information Dynamic profiler Analyzer Basic blocks Unprocessed basic blocks Static Profiler 20" Analyzer" • Taint"Flow"Algebra" – Represent"binary"analysis"result" – IR"tailored"to"capture"DFT"seman3cs" • Compiler"op3miza3on"to"TFA" – Inner"(intra)"basic"block:" "Dead"code"elimina3on,"Algebraic"simplifica3on,"…" – Outer"(inter)"basic"block:"" "Data"flow"analysis" • DFT"specific"considera3ons" – Valid"loca3on"for"each"instrumenta3on"unit" – Number"of"instrumenta3on"units" 21" TFA"Op3miza3on" 1:#mov#ecx,#esi 2:#movzxb#eax,#al 3:#shl#ecx,#0x5 4:#add#edx,0x1 5:#lea#esi,#ptr#[ecx+esi] 6:#lea#esi,#ptr#[eax+esi] 7:#movzxb#eax,#ptr#[edx+esi]###### 8:#testb#al,#al 9:#jnzb#0xb7890200 (a) x86 instruction • Per"basic"block"analysis" • Gray"instruc3ons:"non8tracking"instruc3ons" 22" TFA"Op3miza3on" 1:#mov#ecx,#esi 2:#movzxb#eax,#al 3:#shl#ecx,#0x5 4:#add#edx,0x1 5:#lea#esi,#ptr#[ecx+esi] 6:#lea#esi,#ptr#[eax+esi] 7:#movzxb#eax,#ptr#[edx+esi]###### 8:#testb#al,#al 9:#jnzb#0xb7890200 (a) x86 instruction 1:#ecx1#:=#esi0 2:#eax1#:=#0x1#&#eax0 3:# 4:# 5:#esi1#:=#ecx1#|#esi0 6:#esi2#:=#eax1#|#esi1 7:#eax2#:=#0x1#&#[edx0+esi2] 8:# 9:# (b) TFA transformation • Translated"into"TFA" • Input"operands,"output"operands" 23" TFA"Op3miza3on" 1:#mov#ecx,#esi 2:#movzxb#eax,#al 3:#shl#ecx,#0x5 4:#add#edx,0x1 5:#lea#esi,#ptr#[ecx+esi] 6:#lea#esi,#ptr#[eax+esi] 7:#movzxb#eax,#ptr#[edx+esi]###### 8:#testb#al,#al 9:#jnzb#0xb7890200 (a) x86 instruction 1:#ecx1#:=#esi0 2:#eax1#:=#0x1#&#eax0 3:# 4:# 5:#esi1#:=#ecx1#|#esi0 6:#esi2#:=#eax1#|#esi1 7:#eax2#:=#0x1#&#[edx0+esi2] 8:# 9:# 1:#ecx1#:=#esi0 2: 3:# 4:# 5:# 6:#esi2#:=#0x1#&#eax0#|esi0 7:#eax2#:=#0x1#&#[edx0+esi2] 8:# 9:# (b) TFA transformation (c) TFA optimization • Output"operands"are"expressed"in"terms"of"input" operands" • Data"flow"analysis"to"remove"irrelevant"outputs" 24" TFA"Op3miza3on" 1:#mov#ecx,#esi 2:#movzxb#eax,#al 3:#shl#ecx,#0x5 4:#add#edx,0x1 5:#lea#esi,#ptr#[ecx+esi] 6:#lea#esi,#ptr#[eax+esi] 7:#movzxb#eax,#ptr#[edx+esi]###### 8:#testb#al,#al 9:#jnzb#0xb7890200 | & eax0 esi2 eax1 0x1 & & esi0 esi1 [edx0+esi2] eax2 0x1 ecx1 (a) x86 instruction DAG Representation • DAG"Representa3on" • Express"root"nodes"in"terms"of"leaf"nodes" 25" DFT"Run3me" • Generate/Inject"op3mized"tracking"code"to" the"baseline"DFT"plaworm" – Translate"op3mized"TFA"" • Our"prototype"extends"libd6"" • Code"genera3on"of"libd6/PIN8aware"C"code" – A"func3on"per"each"instrumenta3on"unit" – e.g.,"Firefox:"50K"customized"func3ons"" 26" Evalua3on" • Op3miza3on"schemes" – Code"reduc3on:"Simple"dead"code"elimina3ons" • Inner,"Outer" – Code"genera3on:"Op3mized"tracking"codes" – TFA"Scaser,"TFA"Aggrega3on" Category' CFG' Considera9on' TFA' Op9miza9on' Aggrega9on' Code"reduc3on" Inner" No" No" No" Outer" Yes" No" No" Scaser" Yes" Yes" No" Aggrega3on" Yes" Yes" Yes" Code" genera3on" Op9miza9on' schemes' 27" Evalua3on:"SPEC"CPU2000" 12 libdft Inner Outer TFA scatter TFA aggr 10.5 Slowdown (Normalized) 9 7.5 6 4.5 3 1.5 0 crafty eon gap gcc mcf parserperlbmk twolf vortex vpr average • CPU"intensive"workloads" • TFA’s"speedup"over"libd6:""on"average"1.90x"(the"largest"2.23x)" • ~3x"slowdown"over"the"na3ve"execu3on" " 28" Evalua3on:"Server"applica3ons" (a) MySQL libdft Inner Outer TFA scatter TFA aggr 7 24 6 5 4 3 20 16 12 8 2 1 libdft Inner Outer TFA scatter TFA aggr 28 Slowdown (normalized) Slowdown (normalized) (b) PHP create alter insert ATIS Test suite 4 casing md5 sha1 average PHPBench Benchmark • Mysql’s"own"benchmark"suite"(sql8bench)"and"PHP"micro"benchmark"suite" (PHPBench)" – Plosed"representa3ve"subsets" 29" Evalua3on:"Client"Applica3ons" (a) Web site rendering 11 libdft Inner Outer TFA scatter TFA aggr 10 8 7 6 5 4 libdft Inner Outer TFA scatter TFA aggr 18 Slowdown (normalized) 9 Slowdown (normalized) (b) Javascript 15 12 9 3 6 2 1 Gmail NDSS Youtube Facebook Web site • firefox chrome Browser Rendering"measurement"for"Alexa’s"Top"500"sites"and"NDSS"2012"site" – For"Firefox"web8browser" • Dromaeo"(hsp://www.dromaeo.com)"Javascript"benchmark"suite" – For"Firefox"and"Google"Chrome"web8browser" 30" Talk"Outline" • Data"Flow"Tracking"[VEE"12]" – Popular"Subject"of"Security"Research" – Tagging"and"tracking"of"Interes3ng"data" • Taint"Flow"Algebra"(TFA)"[NDSS"12]" – Intermediate"Representa3on(IR)"for"DFT" – Compiler"op3miza3on"+"DFT"specific"op3miza3on" • ShadowReplica" – Decoupling"of"execu3on"and"monitoring" – Running"each"from"different"cores" 31" ShadowReplica:"Parallelized"DFT" • Basic"idea" – Offload"tracking"overhead"to" the"separate"core" Application • Challenge" – Communica3on"cost"is"too" high" – Synchroniza3on" • Approach" – Minimize"the"overhead"to"the" primary" – Maximize"the"u3liza3on"of"the" secondary" Collect Events Ring"buffer" Enqueue Dequeue Analyze 32" ShadowReplica:"Execu3on"Model" • The"block"0x1234"" The"Primary" Block"ID:" 0x1234" – Executed"from"the"primary"" – Monitored"from"the" secondary" Application Collect Events Ring"buffer" Enqueue Dequeue Analyze The"Secondary" =="Block"ID:"0x1234==" ecx1":="esi0" esi1":="[ebp0"–"0x4"]" edx1":="[ebp0"–"0x14]" esi2":="0x1"&"eax0"|"esi1" eax2":="0x1"&"[edx1"+"esi2]" 33" ShadowReplica:"Run3me"Informa3on" The"Primary" Block"ID:" 0x1234,"" [ebp0"8"0x4], [ebp0"–"0x14],"" [edx1"+"esi2]" • The"Primary"enqueues" – Effec3ve"Memory" Addresses" – Block"Iden3fiers" Application Collect Events Ring"buffer" Enqueue Dequeue Analyze The"Secondary" =="Block"ID:"0x1234"==" ecx1":="esi0" esi1":="[ebp0'–'0x4']' edx1":="[ebp0'–'0x14]' esi2":="0x1"&"eax0"|"esi1" eax2":="0x1"&"[edx1'+'esi2]' 34" ShadowReplica:"Run3me"Informa3on" The"Primary" Block"ID:" 0x1234,"" [ebp0"8"0x4], [ebp0"–"0x14],"" [edx1"+"esi2]" • The"Primary"enqueues" – Effec3ve"Memory" Addresses" – Block"Iden3fiers" Application Collect Events Ring"buffer" Enqueue Dequeue Analyze The"Secondary" =="Block"ID:"0x1234"==" ecx1":="esi0" esi1":="[ebp0'–'0x4']' edx1":="[ebp0'–'0x14]' esi2":="0x1"&"eax0"|"esi1" eax2":="0x1"&"[edx1'+'esi2]' 35" Op3mizing"Effec3ve"Addresses" • Intra"Block"Op3miza3on" – Linear"Lower"Bound" • 3"Register"Variables:"" eax0,"eax1,"ebx0" • 4"Memory"Operands:"" One"redundancy" Block"ID:"0x1234" ![eax0!+!1!×!ebx0] ![eax1!+!2!×!ebx0] ![ebx0!+!4!×!eax1] ![eax0!+!2!×!ebx0]! 36" Op3mizing"the"Primary:"" Effec3ve"Addresses" Block"ID:"0x1234" • Intra"Block"Op3miza3on" ![eax0!+!1!×!ebx0] ![eax1!+!2!×!ebx0] ![ebx0!+!4!×!eax1] ![eax0!+!2!×!ebx0]! – Linear"Lower"Bound" • 3"Register"Variables:"" eax0,"eax1,"ebx0" • 4"Memory"Operands:"" One"redundancy" • Inter"Block"Op3miza3on" – Data"Flow"Analysis" – Use8Def"equa3on"based"on" Inferablity* BB1 [eax^ + ebx^] [ebx^] [eax^] BB2 [eax0 + ebx0] *"‘^’"indicates"top8most"version"for"the"register"variable"in"the"block" BB0 37" Op3mizing"the"Primary":" Block"Iden3fiers" • Minimize"Block"ID"trace" leveraging" Block 0 10000 Block 1 10000 Block 2 9999 Block 3 1 Block 4 – Control"Flow"Graph"(CFG)""" – Execu3on"count" • Only"Block"4"require" instrumenta3on" – 1"instrumenta3on"&"1" execu3on" – Rest"can"be"restored" from"the"secondary" 38" Op3mizing"Block"Iden3fiers" • Minimize"Block"ID"trace" leveraging" Block 0 10000 Block 1 10000 Block 2 9999 Block 3 1 Block 4 – Control"Flow"Graph"(CFG)""" – Execu3on"count" • Only"Block"4"require" instrumenta3on" – 1"instrumenta3on"&"1" execu3on" – Rest"can"be"restored" from"the"secondary" 39" Preliminary"Evalua3on:"" Bzip2"Compression" 7.73x 7 Slowdown 6 5 4.11x 4 3.45x 2.5x 3 2.2x 2 1.27x 1 libdft TFA NO_OPT INTRA INTER • The"primary"performs"Bzip2" compression"against"a"Linux" kernel" • The"cost"of"dumping"EAes" and"Block"trace"from"the" primary" • Instrumenta3on"for"the" primary"is"implemented"" using"PIN"DBI" NULL_PIN Implementations • NO_OPT:"All"EAes"and"BB"traces" • INTRA:"Intra"Block"op3miza3on" • INTER:"Intra"+"Inter"Block"op3miza3on" 40" Candidate"Analyses"for"ShadowReplica" Analyses" Shadow"Memory" Data"Dependency" Checking" Data"Flow"Tracking" ✓" ✓" ✓" MemCheck" ✓" ✓" ✓" LockCheck" ✓" ✗" ✓" Method"Coun3ng" ✗" ✗" ✗" Call"Graph"Profiling" ✗' ✗' ✗' Path"Profiling" ✗" ✗" ✗" Cache"Simula3on" ✓" ✗" ✗" • Each"analysis"have"different"performance"implica3ons" – Instrumenta3on"frequencies,"size"of"analysis"rou3nes"" • Data"dependency"across"mul3ple"updates"inhibits"the"paralleliza3on"of" analysis" • Checking"opera3ons"requires"synchroniza3on"between"an"applica3on"and"an" 41" analysis" Conclusion" • Current"binary8only"DFT"implementa3ons"are"sub" op3mal" – No"considera3on"for"DFT"seman3cs" – No"considera3on"for"global"context" • Proposed"a"novel"approaches"scheme"that" – Combines"sta3c"and"dynamic"analysis" – Segregates"execu3on"and"tracking"logic" • Huge"Speedups"for"real8world"applica3ons" – TFA:"~2x" – Shadow"Replica:"Constant"upper"bound"to"the" overhead" 42"