Security Analysis of Malicious Socialbots on the Web
Living in the (malicious) social web: Beyond friendships
Yazan Boshmaf, Konstantin Beznosov, Matei Ripeanu, Dionysios Logothetis, Georgios Siganos, Jose Lorenzo
Dissertation presented in partial fulfillment of degree requirements of PhD in ECE, UBC

Social bots
Automated fake accounts in online social networks (OSNs)
Designed to deceive and appear human
Hwang et al. Socialbots: Voices from the fronts. ACM Interactions 19, 2 (March 2012), 38-45.

The threat of malicious social bots
What is at stake?

Fake accounts are bad for business
“… If advertisers, developers, or investors do not perceive our user metrics to be accurate representations of our user base, or if we discover material inaccuracies in our user metrics, our reputation may be harmed and advertisers and developers may be less willing to allocate their budgets or resources to Facebook, which could negatively affect our business and financial results…”

Fake accounts are bad for users
OSNs are an attractive medium for abusive users
• Social infiltration: connecting with many benign users (friend request spam)
  Bilge et al. All your contacts are belong to us: Automated identity theft attacks on social networks. Proc. of WWW, 2009
• Data collection: online surveillance, profiling, and data commoditization
  Nolan et al. Hacking human: Data-archaeology and surveillance in social networks. ACM SIGGROUP Bulletin 25.2, 2005
• Misinformation: influencing users, biasing public opinion, propaganda
  Ratkiewicz et al. Detecting and tracking political abuse in social media. Proc. of ICWSM, 2011
• Malware infection: infecting computers and using them for DDoS, spamming, and fraud
  Thomas et al. The Koobface botnet and the rise of social malware. Proc. of MALWARE, 2010

Our work
• Threat characterization
• Countermeasure design

Questions
Two themes: Threat Characterization (attack side) and Countermeasure Design (defense side)
• How vulnerable are OSNs to social infiltration?
  • Vulnerability analysis of OSN platforms
  • Characterization of user behavior
• What are the security and privacy implications of social infiltration?
  • Quantification of privacy breaches
  • Effectiveness of security defenses
• What is the economic rationale behind infiltrating OSNs at scale?
  • Scalability from economic context
  • Profit-maximizing infiltration strategy
• How can OSNs detect fakes or social bots that infiltrate on a large scale?
  • Victim prediction for robust detection (Is victim prediction feasible? Can it enable robust detection?)
  • Framework for evaluation

Attack side: Social infiltration in OSNs (Threat Characterization)
• The socialbot network: When bots socialize for fame and money. Boshmaf, Beznosov, Ripeanu. ACSAC, Dec 2011
• Key challenges in defending against malicious socialbots. Boshmaf, Beznosov, Ripeanu. USENIX LEET, April 2012
• Design and analysis of a social botnet. Boshmaf, Beznosov, Ripeanu. J. Comp. Net., 57(2), Feb 2013

Social botnet: Experiment
• Operated 100 socialbots on Facebook under a single botmaster
• The bots sent 9.6K friend requests in 8 weeks
• 35.7% of the requests were accepted (the accepting users are the victims)

Main findings (Platform-level vulnerability)
How vulnerable are OSNs to social infiltration?
It is feasible to automate social infiltration by exploiting platform and user vulnerabilities

Main findings (Data breaches)
What are the security and privacy implications of social infiltration?
Social infiltration results in serious privacy breaches, where personally identifiable information is compromised
Victims are highly affected
2.62 times more private data collected after infiltration

Table 2.3: Percentage of users with accessible private data

Profile info      Direct (%)         Extended (%)
                  Before   After     Before   After
Birth date          3.5    72.4        4.5    53.8
Email address       2.4    71.8        2.6     4.1
Gender             69.1    69.2       84.2    84.2
Home city          26.5    46.2       29.2    45.2
Current city       25.4    42.9       27.8    41.6
Phone number        0.9    21.1        1.0     1.5
School name        10.8    19.7       12.0    20.4
Postal address      0.9    19.0        0.7     1.3
IM account ID       0.6    10.9        0.5     0.8
Married to          2.9     6.4        3.9     4.9
Worked at           2.8     4.0        2.8     3.2
Average            13.3    34.9       15.4    23.7

[Figure 2.7: Users with accessible private data. Number of users (thousands), before and after infiltration, for IM account ID, postal address, phone number, and e-mail address.]

Thesis excerpt shown on the slide (truncated): "[...]s infiltrated Facebook over 55 days starting January 28, 2011. Dur[...] the bots established 3,439 friendships with victim users, where each [...]"

Friends of victims are affected too
• 1.54 times more private data, with more than 1 million affected users
• From 49K birth dates to 584K
Acquisti et al. Predicting social security numbers from public data. Proc. of Nat. Acad. of Sc. 106(27), 2009

Vulnerabilities exploited to automate infiltration
• Fake accounts and profiles
• Ineffective abuse mitigation
• Large-scale network crawls
• Exploitable platforms and APIs
Main finding (User behavior characterization): Some users are more susceptible to social infiltration, which partly depends on factors related to their social structure

User susceptibility to become a victim correlates with social structure
[Figure: acceptance rate (%) vs. number of friends; Pearson's r = 0.85]
More friends, more susceptible to infiltration
[Figure: acceptance rate (%) vs. number of mutual friends (1 to 10, ≥11); Pearson's r = 0.85; plot annotations: "Without mutual friends", 20%, 60%, 80%]
More mutual friends, more susceptible to infiltration

Fake accounts mimic real accounts
• Only 20% of fakes were "detected"
• All of them were manually flagged by concerned users
Main finding (Feature-based detection is ineffective): Socialbots lead to an arms race and render feature-based fake account detection ineffective

Defense side: Infiltration-resilient fake account detection (Countermeasure Design)
• Graph-based Sybil detection in social and information systems. In Proc. of ASONAM, Aug 2013
• Integro: Leveraging victim prediction for robust fake account detection in OSNs. NDSS, Feb 2015
• Thwarting fake accounts by predicting their victims. Submitted to TISSEC, Feb 2015

Main finding (Graph-based detection): Social infiltration invalidates the assumption behind graph-based fake account detection

Graph-based detection
• Assumes social infiltration on a large scale is infeasible
• Finds a (provably) sparse cut between the real region and the fake region by ranking
[Figure: real region and fake region connected by attack edges]
Alvisi et al. The evolution of Sybil defense via social networks. IEEE Security and Privacy, 2013.

Graph-based detection
• Ranks are computed from the landing probability of a short random walk
• Most real accounts rank higher than fakes
[Figure: real region and fake region, cut size = 3]
Alvisi et al. The evolution of Sybil defense via social networks. IEEE Security and Privacy, 2013.
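The ranking idea on the graph-based detection slides can be illustrated with a minimal sketch. It assumes a SybilRank-style procedure: a random walk started from a trusted seed, stopped after about log2(n) steps, with landing probabilities normalized by degree. The function name, the toy graph, and the choice of seed are hypothetical and only serve the illustration; this is not the exact algorithm of any cited system.

```python
import math
from collections import defaultdict
from itertools import combinations

def rank_by_short_random_walk(edges, seeds, num_steps=None):
    """Score nodes by the degree-normalized landing probability of a short walk."""
    # Build an undirected adjacency list.
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    nodes = sorted(adj)
    # Assumption: stop after about log2(n) steps, so the walk stays "short".
    if num_steps is None:
        num_steps = max(1, math.ceil(math.log2(len(nodes))))
    # All probability mass starts on the trusted seeds.
    prob = {u: 0.0 for u in nodes}
    for s in seeds:
        prob[s] = 1.0 / len(seeds)
    # Each step spreads a node's mass evenly over its neighbors.
    for _ in range(num_steps):
        nxt = {u: 0.0 for u in nodes}
        for u in nodes:
            share = prob[u] / len(adj[u])
            for v in adj[u]:
                nxt[v] += share
        prob = nxt
    # Normalize by degree so well-connected nodes are not favored.
    score = {u: prob[u] / len(adj[u]) for u in nodes}
    ranking = sorted(nodes, key=lambda u: score[u], reverse=True)
    return ranking, score

# Toy graph (hypothetical): a real clique, a fake clique, one attack edge.
real = ["r0", "r1", "r2", "r3", "r4"]
fake = ["f0", "f1", "f2", "f3"]
edges = list(combinations(real, 2)) + list(combinations(fake, 2))
edges.append(("r4", "f0"))  # the single attack edge

ranking, score = rank_by_short_random_walk(edges, seeds=["r0"])
print(ranking)
```

With a sparse cut, little probability mass crosses into the fake region before the walk stops, so the fake accounts end up at the bottom of the ranking.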
Graph-based detection is not resilient to social infiltration
• 50% of bots had more than 35 attack edges
[Figure: real region and fake region, cut size = 10 (densest)]

Premise: Regions can be tightly connected
[Figure: real region and fake region, cut size = 10 (densest)]

Key idea: Identify potential victims with some probability
[Figure: a real account marked as a potential victim with probability 0.9]

Key idea: Leverage victim prediction to reduce cut size
• Assign lower weight to edges incident to potential victims
• Edge weights: High = 1, Medium < 1, Low = 0.1
• Cut size = 1.9 << 10

Delimit the real region by ranking accounts
• Ranks are computed from the landing probability of a short random walk
• Most real accounts are ranked higher than fake accounts

Result 1: Bound on ranking quality
The number of fake accounts that rank equal to or higher than real accounts is O(vol(E_A) log n), where E_A is the set of attack edges and vol(E_A) ≤ |E_A|
Assuming a fast-mixing real region and an attacker who establishes attack edges at random

Result 2: Victim classification is feasible (even using low-cost features)
[Figure: ROC curves, true positive rate vs. false positive rate. Tuenti: AUC = 0.76; Facebook: AUC = 0.7; Random: AUC = 0.5. Annotation: 40K vectors]
• Random Forests (RF) achieves up to 52% better than random
• No need to train on more than 40K feature vectors on Tuenti
Integro: Leveraging victim prediction for robust fake account detection in OSNs. NDSS, Feb 2015
Thwarting fake accounts by predicting their victims. Submitted to TISSEC, Feb 2015.
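The key idea above (predict victims, lower the weight of edges incident to them, then rank with a short random walk) can be sketched as follows. The weight function `edge_weight` is a hypothetical choice made for this illustration: it maps a victim probability of 0.9 to the "Low = 0.1" weight on the slide, but the slides do not state the actual formula. The toy graph, the victim probabilities, and the seed are likewise assumptions.

```python
import math
from collections import defaultdict
from itertools import combinations

def edge_weight(p_u, p_v, low=0.1):
    """Hypothetical weight: 1 between unlikely victims, never below `low`."""
    return max(low, 1.0 - max(p_u, p_v))

def weighted_rank(edges, victim_prob, seeds, num_steps=None):
    """Score nodes by a short weighted random walk, normalized by weighted degree."""
    w = defaultdict(dict)
    for u, v in edges:
        wt = edge_weight(victim_prob.get(u, 0.0), victim_prob.get(v, 0.0))
        w[u][v] = wt
        w[v][u] = wt
    nodes = sorted(w)
    wdeg = {u: sum(w[u].values()) for u in nodes}
    # Assumption: about log2(n) steps, as in the unweighted short walk.
    if num_steps is None:
        num_steps = max(1, math.ceil(math.log2(len(nodes))))
    prob = {u: 0.0 for u in nodes}
    for s in seeds:
        prob[s] = 1.0 / len(seeds)
    # A step moves mass along each edge in proportion to its weight.
    for _ in range(num_steps):
        nxt = {u: 0.0 for u in nodes}
        for u in nodes:
            for v, wt in w[u].items():
                nxt[v] += prob[u] * wt / wdeg[u]
        prob = nxt
    return {u: prob[u] / wdeg[u] for u in nodes}

# Toy graph (hypothetical): r3 is a predicted victim with probability 0.9.
real = ["r0", "r1", "r2", "r3"]
fake = ["f0", "f1", "f2"]
attack = [("r3", "f0"), ("r3", "f1")]
edges = list(combinations(real, 2)) + list(combinations(fake, 2)) + attack
victim_prob = {"r3": 0.9}

# Weighted cut: sum of weights on attack edges, versus the raw edge count.
unweighted_cut = len(attack)
weighted_cut = sum(
    edge_weight(victim_prob.get(u, 0.0), victim_prob.get(v, 0.0)) for u, v in attack
)
scores = weighted_rank(edges, victim_prob, seeds=["r0"])
print(unweighted_cut, weighted_cut)
```

Because every weight is at most 1, the weighted volume of the attack edges never exceeds their count, which is the sense in which vol(E_A) ≤ |E_A| in Result 1. In this toy graph the walk leaks very little mass through the down-weighted edges, so the fake accounts receive the lowest scores.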
Result 3: Ranking is resilient to infiltration
• Integro delivers up to 30% higher AUC, and its AUC is always > 0.92
[Figure: mean area under ROC curve vs. number of attack edges, for Integro-Best, Integro-RF, Integro-Random, and SybilRank. Panels: targeted-victim attack, random-victim attack. Annotation: infiltration resilience]
Cao et al. Aiding the Detection of Fake Accounts in Large Scale Social Online Services, NSDI'12

Deployment at Tuenti confirms results
• Integro delivers up to an order of magnitude better precision
[Figure: precision at lower and higher intervals, from low ranks to higher ranks. Annotation: highly-infiltrating fakes]

Research Questions and Contributions
Threat Characterization and Countermeasure Design
• How vulnerable are OSNs to social infiltration? Vulnerability analysis of OSN platforms; characterization of user behavior
• What are the security and privacy implications of social infiltration? Quantification of privacy breaches; effectiveness of security defenses
• What is the economic rationale behind infiltrating OSNs at scale? Scalability from economic context; profit-maximizing infiltration strategy
• How can OSNs detect fakes or social bots that infiltrate on a large scale? Victim prediction for robust detection; framework for evaluation

Impact
• Public education & further studies
• Production-class deployment
• Open-source, public release

Research Publications
Primary:
1. Boshmaf et al. The socialbot network: When bots socialize for fame and money. Proc. of ACSAC, Dec 2011 (20% acceptance rate, best paper award)
2. Boshmaf et al. Key challenges in defending against malicious socialbots. In Proc. of USENIX LEET, April 2012 (18% acceptance rate)
3. Boshmaf et al. Design and analysis of a social botnet. J. Comp. Net., 57(2), Feb 2013 (1.9 impact factor)
4. Boshmaf et al. Graph-based Sybil detection in social and information systems. In Proc. of ASONAM, Aug 2013 (13% acceptance rate, best paper award)
Related:
1. Boshmaf et al. The socialbot network: are social botnets possible? ACM Interactions, March-April, 2012
2. Sun et al. A billion keys, but few locks: The crisis of web single sign-on. In Proc. of NSPW, Sept 2010
3. Rashtian et al. To befriend or not? A model for friend request acceptance on Facebook. In Proc. of SOUPS, July 2014