
Sample Complexity Bounds on
Differentially Private Learning via
Communication Complexity
Vitaly Feldman
IBM Research – Almaden
David Xiao
CNRS, Université Paris 7
ITA, 2015
Learning model
Learner has n i.i.d. examples (x_1, y_1), …, (x_n, y_n) over X × {0,1}

PAC model [V 84]:
each x_i ~ D and y_i = f(x_i) for unknown distribution D and unknown f ∈ C

For every f ∈ C, every D, and every ε > 0: given the examples, with
prob. ≥ 3/4 output h such that err_{f,D}(h) = Pr_{x~D}[f(x) ≠ h(x)] ≤ ε
Privacy
Each example is created from
personal data of an individual
(π‘₯𝑖 , 𝑦𝑖 ) = (GTTCACG…TC, β€œYES”)
Differential Privacy [DMNS 06]
(Randomized) algorithm A is α-differentially private if
for any two data sets S, S' such that Δ(S, S') = 1:

∀T ⊆ range(A), Pr_A[A(S) ∈ T] ≤ e^α · Pr_A[A(S') ∈ T]
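The definition can be made concrete with randomized response, the textbook α-DP mechanism for a single bit. This is not part of the talk; it is a minimal sketch added for illustration:

```python
import math
import random

def randomized_response(bit, alpha):
    """Report the true bit with prob e^alpha / (e^alpha + 1), else flip it.

    A classic alpha-differentially private mechanism for one bit.
    """
    p_true = math.exp(alpha) / (math.exp(alpha) + 1)
    return bit if random.random() < p_true else 1 - bit

# Privacy check: for the neighboring data sets {0} and {1} and any output t,
# the ratio Pr[A({0}) = t] / Pr[A({1}) = t] is at most e^alpha; the worst
# case (t equal to the reported bit) attains it exactly.
alpha = 1.0
p_true = math.exp(alpha) / (math.exp(alpha) + 1)
worst_ratio = p_true / (1 - p_true)
assert abs(worst_ratio - math.exp(alpha)) < 1e-9
```

The single-bit case already shows the shape of the definition: neighboring data sets (Δ = 1) induce output distributions whose likelihood ratio is bounded by e^α everywhere.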
What is the cost of privacy?
SCDP(C) = sample complexity of PAC learning C with ε = 1/4 and
1-differential privacy

VCDIM(C) ≤ SCDP(C) ≤ O(log|C|) [KLNRS 08]

[Diagram: SCDP(C) shown on a scale between VCDIM(C) and log|C|;
Points sits at the VCDIM end, the position of Thr_b is in question]

Points: {IND_a | a ∈ X}; IND_a(x) = 1 iff x = a
SCDP(Points) = O(1) [F 09, BKN 10]

Thr_b: X = {1, …, 2^b}; GT_a(x) = 1 iff x ≥ a; Thr_b = {GT_a | a ∈ X};
VCDIM(Thr_b) = 1; log|Thr_b| = b
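The gap between the two endpoints can be checked directly for small b: a brute-force VC-dimension computation (a toy sketch added here, not from the talk; `vcdim` is a hypothetical helper) confirms that Thr_b has VC dimension 1 even though it contains 2^b distinct functions:

```python
from itertools import combinations

def vcdim(X, C):
    """Brute-force VC dimension of a finite class C over a finite domain X."""
    d = 0
    for k in range(1, len(X) + 1):
        # Shattering is downward monotone: if no k-set is shattered, stop.
        if not any(
            len({tuple(f(x) for x in S) for f in C}) == 2 ** k
            for S in combinations(X, k)
        ):
            break
        d = k
    return d

b = 4
X = list(range(1, 2 ** b + 1))
thr = [lambda x, a=a: int(x >= a) for a in X]   # GT_a(x) = 1 iff x >= a
assert vcdim(X, thr) == 1      # VC dimension 1 ...
assert len(thr) == 2 ** b      # ... yet 2^b threshold functions
```

No pair x1 < x2 can be shattered: thresholds are monotone, so the labeling (1, 0) never occurs, which is exactly why VCDIM(Thr_b) = 1.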
Our results: lower bounds
SCDP(C) = Ω(LDIM(C))

[Diagram: VCDIM(C) ≤ LDIM(C) ≤ SCDP(C) ≤ log|C| [KLNRS 08], with
Point_b, Thr_b, and Line_p marked along the scale]

LDIM(C): Littlestone's dimension = number of mistakes that can be forced
in online learning of C

Corollaries:
SCDP(Thr_b) = Θ(b) [L 87]
For HS_b^d = linear separators over {1, …, 2^b}^d:
SCDP(HS_b^d) = Ω(d^2 b) [MT 94]

Line_p: X = Z_p^2; Line_p = {f | ∃a,b ∈ Z_p such that f(x,y) = 1 iff
y ≡ ax + b mod p}
LDIM(Line_p) = 2; SCDP(Line_p) = Θ(log p)
Our results: characterization
π‘“βˆˆπΆ
π‘₯βˆˆπ‘‹
𝜎
Alicia
Roberto
𝑧
βˆ€π‘₯ ∈ 𝑋, 𝑓 ∈ 𝐢, Pr 𝑧 β‰  𝑓 π‘₯
Eval-𝐢: 𝐢 × π‘‹ β†’ 0,1 with Evalβˆ’πΆ 𝑓, π‘₯ = 𝑓(π‘₯)
Private coins: 𝐢𝐢 β†’ Evalβˆ’πΆ
Public coins: 𝐢𝐢 β†’,pub Evalβˆ’πΆ
SCDP 𝐢 = Θ 𝐢𝐢 β†’,pub Evalβˆ’πΆ
≀ 1/4
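For the Points class, Eval-Points(IND_a, x) is just the equality function "a = x", so the classic public-coin equality protocol gives a 2-bit one-way protocol with error ≤ 1/4. A sketch built on that observation (not code from the talk):

```python
import random

def bits(v, n):
    """Low-order n bits of integer v, as a list."""
    return [(v >> i) & 1 for i in range(n)]

def dot2(u, r):
    """Inner product mod 2."""
    return sum(ui * ri for ui, ri in zip(u, r)) % 2

def eval_points_protocol(a, x, n, rng):
    """One-way public-coin protocol for Eval-Points, i.e. deciding a == x.

    Shared randomness: two random n-bit vectors. Alicia sends the two inner
    products of a with them (2 bits); Roberto compares against his own.
    Always correct when x == a; errs with probability 1/4 when x != a.
    """
    r1 = [rng.randrange(2) for _ in range(n)]
    r2 = [rng.randrange(2) for _ in range(n)]
    sigma = (dot2(bits(a, n), r1), dot2(bits(a, n), r2))   # Alicia's 2-bit message
    z = int(sigma == (dot2(bits(x, n), r1), dot2(bits(x, n), r2)))
    return z                                               # Roberto's output

rng = random.Random(0)
assert all(eval_points_protocol(5, 5, 4, rng) == 1 for _ in range(100))
errs = sum(eval_points_protocol(5, 9, 4, rng) for _ in range(4000))
assert errs / 4000 < 0.30   # empirically close to 1/4
```

The communication is O(1) regardless of |X|, matching SCDP(Points) = O(1) on the earlier slide; a nonzero difference vector has inner product 0 with a random vector with probability 1/2, so two independent tests leave error 1/4.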
Related results
Distributional assumptions / label privacy only / count only labeled examples:
• Θ(VCDIM(C)) [CH 11, BNS 15]

Characterization in terms of distribution-independent covers:
• SCDP(C) = Θ(RCOVER(C)) [BNS 13a]
Distribution-independent covers
𝐻 πœ–-covers 𝑓 over distr. 𝐷 if βˆƒβ„Ž ∈ 𝐻 s.t. π‘’π‘Ÿπ‘Ÿπ‘“,𝐷 β„Ž ≀ πœ–
𝐻 is a distribution-independent (DI) πœ–-cover for 𝐢 if
βˆ€π‘“ ∈ 𝐢 and distr. 𝐷, 𝐻 πœ–-covers 𝑓 over distr. 𝐷
COVER 𝐢 = min log 𝐻
𝐻 is DI 1/4-cover for 𝐢}
Thm: SCDP 𝐢 = O(COVER 𝐢 ) [KLNRS 08, BKN 10]
Proof: exponential mechanism [MT 07]
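A minimal sketch of how the exponential mechanism [MT 07] selects a hypothesis from a cover: utility = minus the number of mistakes on the private sample, whose sensitivity to changing one example is 1, yielding α-DP. The cover H and the sample below are toy stand-ins, not from the talk:

```python
import math
import random

def exponential_mechanism(H, sample, alpha, rng):
    """Select h in H with prob proportional to exp(-alpha * mistakes(h) / 2).

    mistakes(h) = number of sample points h mislabels; changing one example
    changes it by at most 1 (sensitivity 1), so selection is alpha-DP.
    """
    scores = [math.exp(-alpha * sum(1 for x, y in sample if h(x) != y) / 2)
              for h in H]
    r = rng.random() * sum(scores)
    acc = 0.0
    for h, s in zip(H, scores):
        acc += s
        if r <= acc:
            return h
    return H[-1]

# Toy run: H = all thresholds over {1..8} (a stand-in cover), target GT_4.
H = [lambda x, a=a: int(x >= a) for a in range(1, 9)]
sample = [(x, int(x >= 4)) for x in range(1, 9)] * 10
h = exponential_mechanism(H, sample, alpha=1.0, rng=random.Random(1))
```

With enough examples the zero-mistake hypothesis dominates the score mass, which is the accuracy half of the SCDP(C) = O(COVER(C)) argument: the sample size needed scales with log|H|.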
Randomized DI covers
Let Π be a distribution over sets of hypotheses

Π is a DI (ε, δ)-cover for C if ∀f ∈ C and D,
Pr_{H~Π}[H ε-covers f over D] ≥ 1 − δ

size(Π) = max_{H ∈ supp(Π)} |H|

RCOVER(C) = min{log size(Π) : Π is a DI (1/4, 1/4)-cover for C}

RCOVER(C) = Θ(SCDP(C)) [BNS 13a]
From covers to CC
βˆ€π‘“ ∈ 𝐢 and distr. 𝐷, βˆƒβ„Ž ∈ 𝐻 s.t. π‘’π‘Ÿπ‘Ÿπ‘“,𝐷 β„Ž ≀ 1/4
von Neumann minimax
βˆ€π‘“ ∈ 𝐢, βˆƒ distribution 𝐑𝑓 over 𝐻 s.t. βˆ€π‘₯ ∈ 𝑋
Pr β„Ž π‘₯ β‰  𝑓 π‘₯ ≀ 1/4
β„ŽβˆΌπ‘π‘“
π‘“βˆˆπΆ
β„Ž ∼ 𝐑𝑓
π‘₯βˆˆπ‘‹
β„Ž
β„Ž(π‘₯)
CC πœ‹ = log |𝐻|
βˆ€π‘“ ∈ 𝐢, π‘₯ ∈ 𝑋, Pr β„Ž π‘₯ β‰  𝑓 π‘₯
≀ 1/4
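The minimax step can be spelled out: fix f ∈ C and consider the zero-sum game in which the learner picks h ∈ H, the adversary picks x ∈ X, and the payoff is the indicator of h(x) ≠ f(x). A sketch of the identity being used:

```latex
\[
\min_{\mathbf{R}} \max_{x \in X} \Pr_{h \sim \mathbf{R}}\!\left[h(x) \neq f(x)\right]
 \;=\; \max_{D} \min_{h \in H} \Pr_{x \sim D}\!\left[h(x) \neq f(x)\right]
 \;\le\; \frac{1}{4},
\]
```

where the left side is attained by some mixed strategy R_f, and the bound on the right side is exactly the DI-cover property, applied to the adversary's distribution D over X.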
From covers to CC
COVER(C) = Θ(CC^→(Eval-C))
RCOVER(C) = Θ(CC^{→,pub}(Eval-C))

CC^→(Eval-C) ≤ CC^{→,pub}(Eval-C) + O(loglog(|C|·|X|)) [N 91]
Lower bound tools
Information theory [BJKS 02]
1. Find hard distribution over inputs to Eval-𝐢
2. Low communication β‡’ low (mutual) information
3. Low information β‡’ large error
π‘₯βˆ…
Augmented Index
[BJKK 04, BIPW 10]
π‘₯0
0 1 0 0 0 1 0 1 0 1
0 1 0 0 0
π‘₯0..0
𝑓0..00
𝑓0..01
π‘₯0..1
𝑓0..10
𝑓0..11
π‘₯1
π‘₯1..0
𝑓1..00 𝑓1..01
LDIM(𝐢)
mistake tree
π‘₯1..1
𝑓1..10 𝑓1..11
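LDIM can be computed for toy classes directly from the mistake-tree definition. An exponential-time sketch (added here, not from the talk) confirms LDIM(Thr_b) = b and LDIM(Line_p) = 2 for small parameters:

```python
def ldim(X, C):
    """Littlestone dimension of a finite class C over a finite domain X,
    by directly searching for the deepest mistake tree (exponential time)."""
    best = 0
    for x in X:
        C0 = [f for f in C if f(x) == 0]
        C1 = [f for f in C if f(x) == 1]
        if C0 and C1:   # x splits C: either label forces a mistake
            best = max(best, 1 + min(ldim(X, C0), ldim(X, C1)))
    return best

# LDIM(Thr_b) = b: check b = 2 and b = 3.
for b in (2, 3):
    X = list(range(1, 2 ** b + 1))
    thr = [lambda x, a=a: int(x >= a) for a in X]
    assert ldim(X, thr) == b

# LDIM(Line_p) = 2: check p = 3.
p = 3
points = [(x, y) for x in range(p) for y in range(p)]
lines = [lambda xy, a=a, b=b: int(xy[1] == (a * xy[0] + b) % p)
         for a in range(p) for b in range(p)]
assert ldim(points, lines) == 2
```

The Line_p case shows why LDIM stays at 2 for every p: any two distinct lines agree on at most one point, so after two forced mistakes no split with both sides nonempty survives, even though log|Line_p| = 2 log p grows.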
Our results: upper bounds
Relaxed (α, β)-differential privacy:
A is (α, β)-differentially private if for any two data sets S, S'
such that Δ(S, S') = 1:
∀T ⊆ range(A), Pr_A[A(S) ∈ T] ≤ e^α · Pr_A[A(S') ∈ T] + β

SCDP_β(Thr_b) = O(16^{log* b} · log(1/β)) [BNS 13b]

SCDP_β(Line_p) = O(log(1/β))
An efficient (α, β)-DP algorithm that learns Line_p using
O(log(1/β)/(αε)) examples
Conclusions and open problems
1. Characterization of SCDP in terms of communication complexity
   • Tools from information theory
   • Additional applications
2. Is the sample complexity of (α, β)-diff. private learning different
   from VCDIM?
3. What is the sample complexity of efficient DP learning of HS_b^d?