
Security Analysis of Malicious Socialbots on the Web
Living in the (malicious) social web: Beyond friendships
Yazan Boshmaf
Yazan Boshmaf, Konstantin Beznosov, Matei Ripeanu,
Dionysios Logothetis, Georgios Siganos, Jose Lorenzo
Dissertation presented in partial fulfillment of degree requirements of PhD in ECE, UBC
1
Social bots
Automated fake accounts in online social networks (OSNs)
Designed to deceive and appear human
Hwang et al. Socialbots: Voices from the fronts. ACM Interactions 19, 2 (March 2012), 38-45.
2
The threat of malicious social bots
Automated fake accounts in online social networks (OSNs)
What is at stake?
Designed to deceive and appear human
Hwang et al. Socialbots: Voices from the fronts. ACM Interactions 19, 2 (March 2012), 38-45.
3
Fake accounts are bad for business
“… If advertisers, developers, or investors do not perceive
our user metrics to be accurate representations of our user
base, or if we discover material inaccuracies in our user
metrics, our reputation may be harmed and advertisers
and developers may be less willing to allocate their
budgets or resources to Facebook, which could negatively
affect our business and financial results…”
4
Fake accounts are bad for users
OSNs are an attractive medium for abusive users
Social Infiltration
Connecting with many benign users (friend request spam)
Bilge et al. All your contacts are belong to us: Automated identity theft attacks on social networks. Proc. of WWW, 2009
5
Fake accounts are bad for users
OSNs are an attractive medium for abusive users
Social Infiltration
Data collection
Online surveillance, profiling, and data commoditization
Nolan et al. Hacking human: Data-archaeology and surveillance in social networks. ACM SIGGROUP Bulletin 25.2, 2005
6
Fake accounts are bad for users
OSNs are an attractive medium for abusive users
Social Infiltration
Data collection
Misinformation
Influencing users, biasing public opinion, propaganda
Ratkiewicz et al. Detecting and tracking political abuse in social media. Proc. of ICWSM. 2011
7
Fake accounts are bad for users
OSNs are an attractive medium for abusive users
Social Infiltration
Data collection
Misinformation
Malware Infection
Infecting computers and using them for DDoS, spamming, and fraud
Thomas et al. The Koobface botnet and the rise of social malware. Proc. of MALWARE, 2010
8
Fake accounts are bad for users
OSNs are an attractive medium for abusive content
Social Infiltration
Data collection
Misinformation
Malware Infection
Infecting computers and using them for DDoS, spamming, and fraud [1]
Our work
Threat characterization
Countermeasure design
[1] Thomas et al. The Koobface botnet and the rise of social malware. Proc. of MALWARE, 2010.
9
Questions
How vulnerable are OSNs to social infiltration?
• Vulnerability analysis
• Characterization of user behavior
What are the security and privacy implications of social infiltration?
• Quantification of privacy breaches
• Effectiveness of security defenses
What is the economic rationale behind infiltrating OSNs at scale?
• Scalability from economic context
• Profit-maximizing infiltration strategy
How can OSNs detect fakes or social bots that infiltrate on a large scale?
• Victim prediction for robust detection
• Framework for evaluation
10
Questions
Threat Characterization
How vulnerable are OSNs to social infiltration?
• Vulnerability analysis of OSN platforms
• Characterization of user behavior
What are the security and privacy implications of social infiltration?
• Quantification of privacy breaches
• Effectiveness of security defenses
What is the economic rationale behind infiltrating OSNs at scale?
• Scalability from economic context
• Profit-maximizing infiltration strategy
Countermeasure Design
How to detect social bots that infiltrate on a large scale?
• Is victim prediction feasible?
• Can victim prediction enable robust detection?
13
Attack side: Social infiltration in OSNs
Threat Characterization
How vulnerable are OSNs to social infiltration?
• Vulnerability analysis of OSN platforms
• Characterization of user behavior
What are the security and privacy implications of social infiltration?
• Quantification of privacy breaches
• Effectiveness of security defenses
What is the economic rationale behind infiltrating OSNs at scale?
• Scalability from economic context
• Profit-maximizing infiltration strategy
How can OSNs detect fakes or social bots that infiltrate on a large scale?
• Victim prediction for robust detection
• Framework for evaluation
The socialbot network: When bots socialize for fame and money, Boshmaf, Beznosov, Ripeanu, ACSAC, Dec 2011
Key challenges in defending against malicious socialbots, Boshmaf, Beznosov, Ripeanu, USENIX LEET, April 2012
Design and analysis of a social botnet, Boshmaf, Beznosov, Ripeanu, J. Comp. Net., 57(2), Feb 2013
14
Social botnet: Experiment
Operated 100 socialbots on Facebook under a single botmaster
Bots sent 9.6K friend requests in 8 weeks
35.7% of the bots' requests were accepted (by victims)
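A quick consistency check on these figures, as a sketch only: the deck also quotes 3,439 established friendships in a thesis excerpt, and the rounded "9.6K" and "35.7%" should roughly reproduce it. The variable names are illustrative, and the small gap is expected because 9.6K is a rounded count.

```python
# Figures quoted in the deck; 9.6K is rounded, so the estimate is approximate.
requests_sent = 9_600          # "9.6K friend requests"
acceptance_rate = 0.357        # "35.7%" accepted
reported_friendships = 3_439   # from the thesis excerpt shown on later slides

# Expected number of accepted requests under the rounded inputs.
estimated_friendships = requests_sent * acceptance_rate

# Relative gap between the estimate and the reported count.
relative_gap = abs(estimated_friendships - reported_friendships) / reported_friendships

print(round(estimated_friendships), f"{relative_gap:.2%}")
```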
15
Main findings
(Platform-level vulnerability)
It is feasible to automate social infiltration by exploiting platform and user vulnerabilities
Threat Characterization
How vulnerable are OSNs to social infiltration?
• Vulnerability analysis of OSN platforms
• Characterization of user behavior
What are the security and privacy implications of social infiltration?
• Effectiveness of security defenses
• Quantification of privacy breaches
What is the economic rationale behind infiltrating OSNs at scale?
• Scalability from economic context
• Profit-maximizing infiltration strategy
How can OSNs detect fakes or social bots that infiltrate on a large scale?
• Systematic evaluation
• Robust detection technique
16
Main findings
(Data breaches)
Social infiltration results in serious privacy breaches, where personally identifiable information is compromised
Threat Characterization
How vulnerable are OSNs to social infiltration?
• Vulnerability analysis of OSN platforms
• Characterization of user behavior
What are the security and privacy implications of social infiltration?
• Effectiveness of security defenses
• Quantification of privacy breaches
What is the economic rationale behind infiltrating OSNs at scale?
• Scalability from economic context
• Profit-maximizing infiltration strategy
How can OSNs detect fakes or social bots that infiltrate on a large scale?
• Systematic evaluation
• Robust detection technique
17
Victims are highly affected
Table 2.3: Percentage of users with accessible private data

Profile Info       Direct (%)         Extended (%)
                   Before    After    Before    After
Birth Date            3.5     72.4       4.5     53.8
Email Address         2.4     71.8       2.6      4.1
Gender               69.1     69.2      84.2     84.2
Home City            26.5     46.2      29.2     45.2
Current City         25.4     42.9      27.8     41.6
Phone Number          0.9     21.1       1.0      1.5
School Name          10.8     19.7      12.0     20.4
Postal Address        0.9     19.0       0.7      1.3
IM Account ID         0.6     10.9       0.5      0.8
Married To            2.9      6.4       3.9      4.9
Worked At             2.8      4.0       2.8      3.2
Average              13.3     34.9      15.4     23.7

[Figure 2.7: Users with accessible private data. Number of users (thousands, 0 to 50), before and after, for IM account ID, postal address, phone number, and e-mail address.]
2.62 times more private data collected after infiltration
Thesis text visible behind the figure (truncated): "…s infiltrated Facebook over 55 days starting January 28, 2011. Dur-" and "the bots established 3,439 friendships with victim users, where each"
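The "2.62 times" callout here, and the "1.54 times" callout on the following slide, can be recomputed from the Table 2.3 columns. The sketch below assumes the four runs of eleven values in the table are, in order, Direct before, Direct after, Extended before, and Extended after; the variable and function names are illustrative. The ratios are taken over the one-decimal averages, which is what reproduces the quoted figures.

```python
from statistics import mean

# Table 2.3 columns (percent of users with the field accessible), in table row order.
direct_before = [3.5, 2.4, 69.1, 26.5, 25.4, 0.9, 10.8, 0.9, 0.6, 2.9, 2.8]
direct_after = [72.4, 71.8, 69.2, 46.2, 42.9, 21.1, 19.7, 19.0, 10.9, 6.4, 4.0]
extended_before = [4.5, 2.6, 84.2, 29.2, 27.8, 1.0, 12.0, 0.7, 0.5, 3.9, 2.8]
extended_after = [53.8, 4.1, 84.2, 45.2, 41.6, 1.5, 20.4, 1.3, 0.8, 4.9, 3.2]


def column_average(values):
    """Average of a table column, rounded to one decimal as in the table."""
    return round(mean(values), 1)


averages = {
    "direct_before": column_average(direct_before),
    "direct_after": column_average(direct_after),
    "extended_before": column_average(extended_before),
    "extended_after": column_average(extended_after),
}

# Growth factors computed from the rounded averages.
direct_ratio = round(averages["direct_after"] / averages["direct_before"], 2)
extended_ratio = round(averages["extended_after"] / averages["extended_before"], 2)

print(averages)
print(direct_ratio, extended_ratio)
```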
18
Friends of victims are affected too
[Table 2.3: Percentage of users with accessible private data (direct and extended, before and after infiltration)]
[Figure 2.7: Users with accessible private data]
1.54 times more, with more than 1 million affected users
19
Friends of victims are affected too
[Table 2.3: Percentage of users with accessible private data (direct and extended, before and after infiltration)]
[Figure 2.7: Users with accessible private data]
1.54 times more, with more than 1 million affected users
From 49K birthdates to 584K
Acquisti et al. Predicting social security numbers from public data. Proc. of Nat. Acad. of Sc. 106(27), 2009
20
Vulnerabilities exploited to automate infiltration
Fake accounts and profiles
Ineffective abuse mitigation
Large scale network crawls
Exploitable platforms and APIs
(User behavior characterization)
Some users are more susceptible to social infiltration, which partly depends on factors related to their social structure
21
User susceptibility to become a victim
correlates with social structure
[Figure: two plots of acceptance rate (%). First plot: against number of friends, for users without mutual friends (Pearson's r = 0.85). Second plot: against number of mutual friends (1 to ≥11). Annotated rates: 20%, 60%, 80%.]
More friends, more susceptible to infiltration
More mutual friends, more susceptible to infiltration
22
Fake accounts mimic real accounts
Only 20% of fakes were “detected”
All manually flagged by concerned users
23
(Feature-based detection is ineffective)
Socialbots lead to an arms race and render feature-based fake account detection ineffective
24
Defense side: Infiltration-resilient fake
account detection
Countermeasure Design
How vulnerable are OSNs to social infiltration?
• Vulnerability analysis of OSN platforms
• Characterization of user behavior
What are the security and privacy implications of social infiltration?
• Quantification of privacy breaches
• Effectiveness of security defenses
What is the economic rationale behind infiltrating OSNs at scale?
• Scalability from economic context
• Profit-maximizing infiltration strategy
How can OSNs detect fakes or social bots that infiltrate on a large scale?
• Victim prediction for robust detection
• Framework for evaluation
Graph-based Sybil detection in social and information systems. In Proc. of ASONAM, Aug 2013
Integro: Leveraging victim prediction for robust fake account detection in OSNs. NDSS, Feb 2015
Thwarting fake accounts by predicting their victims. Submitted to TISSEC, Feb 2015
25
Feature-based detection is ineffective
Only 20% of fakes were “detected”
All manually flagged by concerned users
(Graph-based detection)
Social infiltration invalidates the assumption behind graph-based fake account detection
26
Graph-based detection
Assumes social infiltration on a large scale is infeasible
Attack edges
Real region
Fake region
Finds a (provably) sparse cut between the regions by ranking
Alvisi et al. The evolution of Sybil defense via social networks. IEEE Security and Privacy, 2013.
27
Graph-based detection
Ranks computed from landing probability of a short random walk
Cut size = 3
Real region
Fake region
Most real accounts rank higher than fakes
Alvisi et al. The evolution of Sybil defense via social networks. IEEE Security and Privacy, 2013.
28
Graph-based detection is not resilient to
social infiltration
Cut size = 10 (densest)
Real region
Fake region
50% of bots had more than 35 attack edges
29
Premise: Regions can be tightly connected
Cut size = 10 (densest)
Real region
Fake region
30
Key idea: Identify potential victims with some
probability
Potential victim with
probability 0.9
Real region
Fake region
31
Key idea: Leverage victim prediction to reduce
cut size
Cut size = 1.9 << 10
High = 1
Medium < 1
Low = 0.1
Real region
Fake region
Assign lower weight to edges incident to potential victims
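A minimal sketch of how weighting can shrink the cut, under stated assumptions: the two-level weights (0.1 and 1) come from the slide legend, but the toy graph, the account names, and the rule "low weight if either endpoint is a predicted victim" are illustrative simplifications, not the weighting scheme used by Integro. With 10 attack edges, of which 9 touch a correctly predicted victim, the weighted cut is 9 × 0.1 + 1 × 1 = 1.9, which matches the number on the slide.

```python
LOW, HIGH = 0.1, 1.0  # legend values shown on the slide


def edge_weight(u, v, predicted_victims):
    """Low weight if either endpoint is a predicted victim, otherwise high."""
    return LOW if u in predicted_victims or v in predicted_victims else HIGH


def cut_size(attack_edges, predicted_victims):
    """Sum of weights over the edges that cross from the real to the fake region."""
    return sum(edge_weight(u, v, predicted_victims) for u, v in attack_edges)


# Hypothetical toy instance: 10 attack edges of the form (real account, fake account).
attack_edges = [(f"r{i}", f"f{i % 3}") for i in range(10)]

# Assume the classifier flags nine of the ten victims and misses r9.
predicted_victims = {f"r{i}" for i in range(9)}

unweighted = cut_size(attack_edges, set())
weighted = cut_size(attack_edges, predicted_victims)

print(unweighted, round(weighted, 1))
```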
32
Delimit the real region by ranking accounts
Ranks computed from landing probability of a short random walk
High = 1
Medium < 1
Low = 0.1
Real region
Fake region
Most real accounts are ranked higher than fake accounts
33
Delimit the real region by ranking accounts
Ranks computed from landing probability of a short random walk
Result 1: Bound on ranking quality
Number of fake accounts that rank equal to or higher than real accounts
is O(vol(E_A) log n), where vol(E_A) ≤ |E_A|
High = 1
Medium < 1
Low = 0.1
Real region
Fake region
Most real accounts are ranked higher than fake accounts
Assuming a fast mixing real region and an attacker who establishes attack edges at random
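The ranking step can be sketched as a short weighted random walk started from trusted accounts, run for about log n steps and then normalized by weighted degree. This is a simplified illustration in the spirit of the slides, not the authors' implementation: the function name, the seed choice, the degree normalization, and the toy graph (a real clique, a fake triangle, and one attack edge through a predicted victim whose edges all get weight 0.1) are assumptions.

```python
import math


def rank_accounts(adj, seeds, steps=None):
    """Rank nodes by the degree-normalized landing probability of a short walk.

    adj maps each node to {neighbor: weight} for an undirected weighted graph.
    Every node is assumed to have at least one edge.
    """
    steps = steps or max(1, math.ceil(math.log2(len(adj))))
    strength = {u: sum(nbrs.values()) for u, nbrs in adj.items()}
    trust = {u: 0.0 for u in adj}
    for s in seeds:
        trust[s] = 1.0 / len(seeds)
    for _ in range(steps):
        nxt = {u: 0.0 for u in adj}
        for u, nbrs in adj.items():
            for v, w in nbrs.items():
                # Each node passes its trust to neighbors in proportion to edge weight.
                nxt[v] += trust[u] * w / strength[u]
        trust = nxt
    return {u: trust[u] / strength[u] for u in adj}


def build_graph(edges):
    """Build a symmetric adjacency map from (u, v, weight) triples."""
    adj = {}
    for u, v, w in edges:
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})[u] = w
    return adj


# Hypothetical toy graph: r3 is a predicted victim, so its edges get weight 0.1.
edges = [
    ("r0", "r1", 1.0), ("r0", "r2", 1.0), ("r1", "r2", 1.0),
    ("r0", "r3", 0.1), ("r1", "r3", 0.1), ("r2", "r3", 0.1),
    ("r3", "f0", 0.1),  # the single attack edge
    ("f0", "f1", 1.0), ("f0", "f2", 1.0), ("f1", "f2", 1.0),
]
ranks = rank_accounts(build_graph(edges), seeds=["r0"])

lowest_real = min(r for u, r in ranks.items() if u.startswith("r"))
highest_fake = max(r for u, r in ranks.items() if u.startswith("f"))
print(lowest_real > highest_fake)
```

In this toy instance every real account ends up ranked above every fake account, because little trust crosses the low-weight attack edge within the short walk.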
34
Result 2: Victim classification is feasible
(even using low-cost features)
[Figure: ROC curves, true positive rate vs. false positive rate (both 0 to 1). Tuenti: AUC = 0.76; Facebook: AUC = 0.7; Random: AUC = 0.5. Annotation: 40K vectors.]
Random Forests (RF) performs up to 52% better than random
No need to train on more than 40K feature vectors on Tuenti
Integro: Leveraging victim prediction for robust fake account detection in OSNs. NDSS, Feb 2015
Thwarting fake accounts by predicting their victims. Submitted to TISSEC, Feb 2015.
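The "52% better than random" figure is consistent with reading it as the relative AUC gain over the 0.5 baseline of a random classifier. The sketch below makes that reading explicit; the function name is illustrative and the interpretation is an assumption, since the slide does not spell out the formula.

```python
def relative_gain(auc, baseline=0.5):
    """Relative improvement of an AUC over the random-classifier baseline."""
    return (auc - baseline) / baseline


# AUC values quoted on the slide.
best_gain = relative_gain(0.76)
other_gain = relative_gain(0.70)

print(f"{best_gain:.0%}", f"{other_gain:.0%}")
```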
35
Result 3: Ranking is resilient to infiltration
Integro delivers up to 30% higher AUC, and AUC is always > 0.92
[Figure: mean area under ROC curve (0.5 to 1.0) vs. number of attack edges, for Integro-Best, Integro-RF, Integro-Random, and SybilRank. Panels: targeted-victim attack and random-victim attack. Annotation: infiltration resilience.]
Cao et al. Aiding the Detection of Fake Accounts in Large Scale Social Online Services, NSDI’12
36
Deployment at Tuenti confirms results
Integro delivers up to an order of magnitude better precision
Low ranks to higher ranks
Precision at lower intervals
Highly-infiltrating fakes
Precision at higher intervals
37
Research Questions and Contributions
Threat Characterization
How vulnerable are OSNs to social infiltration?
• Vulnerability analysis of OSN platforms
• Characterization of user behavior
What are the security and privacy implications of social infiltration?
• Quantification of privacy breaches
• Effectiveness of security defenses
What is the economic rationale behind infiltrating OSNs at scale?
• Scalability from economic context
• Profit-maximizing infiltration strategy
Countermeasure Design
How can OSNs detect fakes or social bots that infiltrate on a large scale?
• Victim prediction for robust detection
• Framework for evaluation
38
Impact
Research Questions and Contributions
Threat Characterization: public education & further studies
How vulnerable are OSNs to social infiltration?
• Vulnerability analysis of OSN platforms
• Characterization of user behavior
What are the security and privacy implications of social infiltration?
• Quantification of privacy breaches
• Effectiveness of security defenses
What is the economic rationale behind infiltrating OSNs at scale?
• Scalability from economic context
• Profit-maximizing infiltration strategy
Countermeasure Design: production-class deployment; open-source, public release
How can OSNs detect fakes or social bots that infiltrate on a large scale?
• Victim prediction for robust detection
• Framework for evaluation
39
Publications
Primary:
1. Boshmaf et al. The socialbot network: When bots socialize for fame and money.
Proc. of ACSAC, Dec 2011 (20% acceptance rate, best paper award)
2. Boshmaf et al. Key challenges in defending against malicious socialbots.
In Proc. of USENIX LEET, April 2012 (18% acceptance rate)
3. Boshmaf et al. Design and analysis of a social botnet.
J. Comp. Net., 57(2), Feb 2013 (1.9 impact factor)
4. Boshmaf et al. Graph-based Sybil detection in social and information systems.
In Proc. of ASONAM, Aug 2013 (13% acceptance rate, best paper award)
Related:
1. Boshmaf et al. The socialbot network: are social botnets possible?
ACM Interactions, March-April, 2012
2. Sun et al. A billion keys, but few locks: The crisis of web single sign-on.
In Proc. of NSPW, Sept 2010
3. Rashtian et al. To befriend or not? A model for friend request acceptance on Facebook.
In Proc. of SOUPS, July 2014
40