Sample Complexity Bounds on
Differentially Private Learning via
Communication Complexity
Vitaly Feldman
IBM Research – Almaden
David Xiao
CNRS, Université Paris 7
ITA, 2015
Learning model
Learner has n i.i.d. examples:
(x_1, y_1), …, (x_n, y_n) over X × {0,1}
PAC model [V 84]:
each x_i ∼ D and y_i = f(x_i) for unknown D and f ∈ C
For every f ∈ C, D, ε > 0, given examples, with
prob. ≥ 3/4 output h: err_{f,D}(h) = Pr_{x∼D}[h(x) ≠ f(x)] ≤ ε
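As a concrete toy illustration of the model (our own example, not from the talk): draw i.i.d. examples labeled by an unknown threshold, output a consistent hypothesis, and measure err_{f,D} under a uniform D.

```python
import random

def sample_examples(n, target, domain, rng):
    """Draw n i.i.d. examples x ~ D (uniform over domain) labeled y = target(x)."""
    return [(x, target(x)) for x in (rng.choice(domain) for _ in range(n))]

def erm_threshold(examples):
    """Output a threshold consistent with the sample: smallest x labeled 1."""
    ones = [x for x, y in examples if y == 1]
    j = min(ones) if ones else float("inf")
    return lambda x: int(x >= j)

def error(h, target, domain):
    """err_{f,D}(h) = Pr_{x~D}[h(x) != target(x)] for uniform D."""
    return sum(h(x) != target(x) for x in domain) / len(domain)

rng = random.Random(0)
domain = range(1, 1025)                  # X = {1, ..., 2^10}
f = lambda x: int(x >= 300)              # unknown target in Thr_10
h = erm_threshold(sample_examples(200, f, domain, rng))
print(error(h, f, domain))               # small with high probability
```

With 200 uniform samples the consistent threshold lands close to the true one, so the PAC error requirement is met with high probability.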
Privacy
Each example is created from
personal data of an individual
(x_i, y_i) = (GTTCACG…TC, "YES")
Differential Privacy [DMNS 06]
(Randomized) algorithm A is α-differentially private if
for any two data sets S, S′ such that Δ(S, S′) = 1:
∀Z ⊆ range(A),
Pr[A(S) ∈ Z] ≤ e^α · Pr[A(S′) ∈ Z]
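A minimal sanity check of the definition (our own toy example, not from the talk): randomized response on a single bit, which meets the α-DP inequality with equality in the worst case.

```python
import math

def rr_output_dist(bit, alpha):
    """Randomized response: report the true bit w.p. e^a/(e^a + 1),
    the flipped bit otherwise; returns {output: probability}."""
    keep = math.exp(alpha) / (math.exp(alpha) + 1)
    return {bit: keep, 1 - bit: 1 - keep}

alpha = 1.0
p = rr_output_dist(0, alpha)   # dataset S:  the individual's bit is 0
q = rr_output_dist(1, alpha)   # dataset S': the neighboring bit is 1
for z in (0, 1):               # check Pr[A(S) = z] <= e^alpha * Pr[A(S') = z]
    assert p[z] <= math.exp(alpha) * q[z] + 1e-12
```

The likelihood ratio p[z]/q[z] is exactly e^α (resp. e^{−α}) on the two outputs, so the bound is tight.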
What is the cost of privacy?
SCDP(C) = sample complexity of PAC learning C with ε = 1/4 and 1-differential privacy
For every class C: VCDIM(C) ≤ SCDP(C) ≤ O(log|C|) [KLNRS 08]
Where on this scale does SCDP(C) sit? Two extremes:
Points: {IND_j ∣ j ∈ X}, where IND_j(x) = 1 iff x = j
SCDP(Points) = O(1) [F 09, BKN 10]
Thr_n: X = {1, …, 2^n}; THR_j(x) = 1 iff x ≥ j; Thr_n = {THR_j ∣ j ∈ X};
VCDIM(Thr_n) = 1; log|Thr_n| = n
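The Thr_n gap between the two bounds can be verified by brute force on a small domain (our own sketch): thresholds over {1, …, 2³} form 8 concepts (log|C| = 3) yet have VC dimension 1.

```python
from itertools import combinations

def vcdim(funcs, num_points):
    """Largest k such that some k-subset of points is shattered.
    funcs: tuples of 0/1 labels, one entry per point."""
    best = 0
    for k in range(1, num_points + 1):
        if any(len({tuple(f[i] for i in s) for f in funcs}) == 2 ** k
               for s in combinations(range(num_points), k)):
            best = k
        else:
            break
    return best

n = 3
domain = range(1, 2 ** n + 1)
thr = [tuple(int(x >= j) for x in domain) for j in domain]  # THR_j(x) = 1 iff x >= j
print(len(thr), vcdim(thr, len(thr)))   # 8 concepts, VC dimension 1
```

No pair of points is shattered because thresholds are monotone: the labeling (1, 0) on x < x′ is impossible.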
Our results: lower bounds
SCDP(C) = Ω(LDIM(C)); together with VCDIM(C) ≤ LDIM(C) and SCDP(C) ≤ O(log|C|) [KLNRS 08], this places SCDP(C) between LDIM(C) and log|C| (Point_n sits at the VCDIM end, Line_p strictly in between, Thr_n at the log|C| end).
LDIM(C): Littlestone's dimension. Number of mistakes in online learning
Corollaries:
• SCDP(Thr_n) = Ω(n), since LDIM(Thr_n) = n [L 87]
• For HS_n^d = linear separators over {1, …, 2^n}^d:
  SCDP(HS_n^d) = Ω(d²n) [MT 94]
• Line_p: X = Z_p², Line_p = {ℓ_{a,b} ∣ a, b ∈ Z_p}, where ℓ_{a,b}(x, y) = 1 iff y ≡ ax + b (mod p);
  LDIM(Line_p) = 2; SCDP(Line_p) = Θ(log p)
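Both LDIM values can be checked by brute force on tiny instances (our own sketch). The recursion mirrors the mistake-tree definition: a complete depth-d tree rooted at point x needs complete depth-(d−1) trees on both label restrictions.

```python
def ldim(funcs, num_points):
    """Littlestone dimension via the mistake-tree recursion:
    LDIM(C) = max over points x that split C of
              1 + min(LDIM(C|x=1), LDIM(C|x=0)); 0 if no point splits C."""
    def rec(fs):
        best = 0
        for i in range(num_points):
            f1 = frozenset(f for f in fs if f[i] == 1)
            f0 = fs - f1
            if f1 and f0:
                best = max(best, 1 + min(rec(f1), rec(f0)))
        return best
    return rec(frozenset(funcs))

# Thresholds over {1,...,8}: LDIM = log2(8) = 3
dom = range(1, 9)
thr = [tuple(int(x >= j) for x in dom) for j in dom]
# Line_3: X = Z_3^2, l_{a,b}(x,y) = 1 iff y = a*x + b (mod 3): LDIM = 2
pts = [(x, y) for x in range(3) for y in range(3)]
lines = [tuple(int(y == (a * x + b) % 3) for (x, y) in pts)
         for a in range(3) for b in range(3)]
print(ldim(thr, 8), ldim(lines, 9))   # 3 2
```

For Line_3 the value 2 reflects that two points (with distinct x-coordinates) determine at most one line, so no branch of a mistake tree can be forced past depth 2.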
Our results: characterization
One-way protocol for Eval-C: Alicia holds f ∈ C, Roberto holds x ∈ X;
Alicia sends a (randomized) message, Roberto outputs z with
∀x ∈ X, f ∈ C, Pr[z ≠ f(x)] ≤ 1/4
Eval-C: C × X → {0,1} with Eval-C(f, x) = f(x)
Private coins: CC^→(Eval-C)
Public coins: CC^{→,pub}(Eval-C)
Thm: SCDP(C) = Θ(CC^{→,pub}(Eval-C))
Related results
With distributional assumptions / label privacy only / counting only labeled examples:
• Θ(VCDIM(C)) [CH 11, BNS 15]
Characterization in terms of distribution-independent covers:
• SCDP(C) = Θ(RCOVER(C)) [BNS 13a]
Distribution-independent covers
H ε-covers f over distr. D if ∃h ∈ H s.t. err_{f,D}(h) ≤ ε
H is a distribution-independent (DI) ε-cover for C if
∀f ∈ C and distr. D, H ε-covers f over D
COVER(C) = min{log|H| : H is a DI 1/4-cover for C}
Thm: SCDP(C) = O(COVER(C)) [KLNRS 08, BKN 10]
Proof: exponential mechanism [MT 07]
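A sketch of that learner on our own toy instantiation (cover, class, and parameter values are ours): given a finite cover H, the exponential mechanism samples h with probability proportional to exp(−α · mistakes(h)/2); since one example changes the mistake count by at most 1, the output is α-DP.

```python
import math, random

def exponential_mechanism_learner(H, sample, alpha, rng):
    """Exponential mechanism: sample h from H with probability
    proportional to exp(-alpha * mistakes(h) / 2). The mistake count
    has sensitivity 1 under changing one example, giving alpha-DP."""
    scores = [sum(h(x) != y for x, y in sample) for h in H]
    m = min(scores)                                  # shift for numerical stability
    weights = [math.exp(-alpha * (s - m) / 2) for s in scores]
    r = rng.uniform(0, sum(weights))
    acc = 0.0
    for h, w in zip(H, weights):
        acc += w
        if r <= acc:
            return h
    return H[-1]

rng = random.Random(1)
domain = range(1, 17)
H = [(lambda j: (lambda x: int(x >= j)))(j) for j in domain]  # cover: all 16 thresholds
sample = [(x, int(x >= 5)) for x in domain]                   # labeled by THR_5
h = exponential_mechanism_learner(H, sample, alpha=50.0, rng=rng)
print(sum(h(x) != int(x >= 5) for x in domain))               # 0 mistakes (w.h.p.)
```

The demo uses a deliberately large α so the sampler concentrates on the zero-error hypothesis; with a small, genuinely private α the guarantee is instead that the sampled h has low error once the sample is large enough.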
Randomized DI covers
Let 𝒟 be a distribution over sets of hypotheses
𝒟 is a DI (ε, δ)-cover for C if ∀f ∈ C and distr. D,
Pr_{H∼𝒟}[H ε-covers f over D] ≥ 1 − δ
size(𝒟) = max_{H ∈ supp(𝒟)} |H|
RCOVER(C) = min{log size(𝒟) : 𝒟 is a DI (1/4, 1/4)-cover for C}
RCOVER(C) = Θ(SCDP(C)) [BNS 13a]
From covers to CC
∀f ∈ C and distr. D, ∃h ∈ H s.t. err_{f,D}(h) ≤ 1/4
⇓ (von Neumann minimax)
∀f ∈ C, ∃ distribution t_f over H s.t. ∀x ∈ X,
Pr_{h∼t_f}[h(x) ≠ f(x)] ≤ 1/4
Protocol: Alicia (holding f ∈ C) samples h ∼ t_f and sends it;
Roberto (holding x ∈ X) outputs h(x)
CC = log|H|, and ∀f ∈ C, x ∈ X, Pr[h(x) ≠ f(x)] ≤ 1/4
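As a toy simulation of this protocol (our own instantiation; for simplicity t_f is the degenerate distribution on f itself rather than an actual minimax solution):

```python
import math, random

def run_protocol(f, x, H, t_f, rng):
    """One-way protocol from a cover H: Alicia samples h ~ t_f and sends
    its index (ceil(log2 |H|) bits); Roberto evaluates that h at x."""
    idx = rng.choices(range(len(H)), weights=t_f)[0]
    message_bits = math.ceil(math.log2(len(H)))   # CC = log |H|
    return H[idx](x), message_bits

# Toy instance: H = all 8 thresholds over {1,...,8}.
domain = range(1, 9)
H = [(lambda j: (lambda x: int(x >= j)))(j) for j in domain]
f = lambda x: int(x >= 3)                         # Alicia's concept: THR_3
t_f = [1.0 if j == 3 else 0.0 for j in domain]    # degenerate: f itself lies in H
rng = random.Random(0)
out, bits = run_protocol(f, 4, H, t_f, rng)
print(out, bits)                                  # 1 3
```

The communication cost is log|H| regardless of which h is drawn, which is exactly the quantity COVER/RCOVER bound.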
From covers to CC
COVER(C) = Θ(CC^→(Eval-C))
RCOVER(C) = Θ(CC^{→,pub}(Eval-C))
CC^→(Eval-C) ≤ CC^{→,pub}(Eval-C) + O(log log|C × X|)
[N 91]
Lower bound tools
Information theory [BJKS 02]
1. Find a hard distribution over inputs to Eval-C
2. Low communication ⇒ low (mutual) information
3. Low information ⇒ large error
Augmented Index
[BJKK 04, BIPW 10]
[Figure: a complete mistake tree of depth LDIM(C) — internal nodes labeled by points x_v (root x_∅, children x_{v0}, x_{v1}, …), leaves labeled by concepts f_v; the hard distribution embeds Augmented Index along root-to-leaf paths of this tree.]
Our results: upper bounds
Relaxed (α, β)-differential privacy
A is (α, β)-differentially private if for any two data sets S, S′
such that Δ(S, S′) = 1:
∀Z ⊆ range(A), Pr[A(S) ∈ Z] ≤ e^α · Pr[A(S′) ∈ Z] + β

SCDP_β(Thr_n) = O(16^{log* n} · log(1/β)) [BNS 13b]
SCDP_β(Line_p) = O(log(1/β))
An efficient (α, β)-DP algorithm that learns Line_p using
O(log(1/β)/(αε)) examples
Conclusions and open problems
1. Characterization of SCDP in terms of communication complexity
   1. Tools from information theory
   2. Additional applications
• Is the sample complexity of (α, β)-diff. private learning different from
  VCDIM?
• What is the sample complexity of efficient DP learning of HS_n^d?