Sample Complexity Bounds on
Differentially Private Learning via
Communication Complexity
Vitaly Feldman
IBM Research – Almaden
David Xiao
CNRS, Université Paris 7
ITA, 2015
Learning model
Learner has n i.i.d. examples:
(x_1, y_1), …, (x_n, y_n) over X × {0,1}
PAC model [V 84]:
each x_i ∼ D and y_i = f(x_i) for unknown D and f ∈ C
For every f ∈ C, D, ε > 0, given examples, with
prob. ≥ 3/4 output h: err_{f,D}(h) = Pr_{x∼D}[h(x) ≠ f(x)] ≤ ε
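As a concrete toy illustration of the model (our own example, not from the talk): draw i.i.d. examples labeled by an unknown threshold, output a consistent hypothesis, and measure err_{f,D} under a uniform D.

```python
import random

def sample_examples(n, target, domain, rng):
    """Draw n i.i.d. examples x ~ D (uniform over domain) labeled y = target(x)."""
    return [(x, target(x)) for x in (rng.choice(domain) for _ in range(n))]

def erm_threshold(examples):
    """Output a threshold consistent with the sample: smallest x labeled 1."""
    ones = [x for x, y in examples if y == 1]
    j = min(ones) if ones else float("inf")
    return lambda x: int(x >= j)

def error(h, target, domain):
    """err_{f,D}(h) = Pr_{x~D}[h(x) != target(x)] for uniform D."""
    return sum(h(x) != target(x) for x in domain) / len(domain)

rng = random.Random(0)
domain = range(1, 1025)                  # X = {1, ..., 2^10}
f = lambda x: int(x >= 300)              # unknown target in Thr_10
h = erm_threshold(sample_examples(200, f, domain, rng))
print(error(h, f, domain))               # small with high probability
```

With 200 uniform samples the consistent threshold lands close to the true one, so the PAC error requirement is met with high probability.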
Privacy
Each example is created from
personal data of an individual
(x_i, y_i) = (GTTCACG…TC, "YES")
Differential Privacy [DMNS 06]
(Randomized) algorithm A is α-differentially private if
for any two data sets S, S′ such that Δ(S, S′) = 1:
∀Z ⊆ range(A),
Pr[A(S) ∈ Z] ≤ e^α · Pr[A(S′) ∈ Z]
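A minimal sanity check of the definition (our own toy example, not from the talk): randomized response on a single bit, which meets the α-DP inequality with equality in the worst case.

```python
import math

def rr_output_dist(bit, alpha):
    """Randomized response: report the true bit w.p. e^a/(e^a + 1),
    the flipped bit otherwise; returns {output: probability}."""
    keep = math.exp(alpha) / (math.exp(alpha) + 1)
    return {bit: keep, 1 - bit: 1 - keep}

alpha = 1.0
p = rr_output_dist(0, alpha)   # dataset S:  the individual's bit is 0
q = rr_output_dist(1, alpha)   # dataset S': the neighboring bit is 1
for z in (0, 1):               # check Pr[A(S) = z] <= e^alpha * Pr[A(S') = z]
    assert p[z] <= math.exp(alpha) * q[z] + 1e-12
```

The likelihood ratio p[z]/q[z] is exactly e^α (resp. e^{−α}) on the two outputs, so the bound is tight.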
What is the cost of privacy?
SCDP(C) = sample complexity of PAC learning C with ε = 1/4 and 1-differential privacy
For every class C: VCDIM(C) ≤ SCDP(C) ≤ O(log|C|) [KLNRS 08]
Where on this scale does SCDP(C) sit? Two extremes:
Points: {IND_j ∣ j ∈ X}, where IND_j(x) = 1 iff x = j
SCDP(Points) = O(1) [F 09, BKN 10]
Thr_n: X = {1, …, 2^n}; THR_j(x) = 1 iff x ≥ j; Thr_n = {THR_j ∣ j ∈ X};
VCDIM(Thr_n) = 1; log|Thr_n| = n
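The Thr_n gap between the two bounds can be verified by brute force on a small domain (our own sketch): thresholds over {1, …, 2³} form 8 concepts (log|C| = 3) yet have VC dimension 1.

```python
from itertools import combinations

def vcdim(funcs, num_points):
    """Largest k such that some k-subset of points is shattered.
    funcs: tuples of 0/1 labels, one entry per point."""
    best = 0
    for k in range(1, num_points + 1):
        if any(len({tuple(f[i] for i in s) for f in funcs}) == 2 ** k
               for s in combinations(range(num_points), k)):
            best = k
        else:
            break
    return best

n = 3
domain = range(1, 2 ** n + 1)
thr = [tuple(int(x >= j) for x in domain) for j in domain]  # THR_j(x) = 1 iff x >= j
print(len(thr), vcdim(thr, len(thr)))   # 8 concepts, VC dimension 1
```

No pair of points is shattered because thresholds are monotone: the labeling (1, 0) on x < x′ is impossible.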
Our results: lower bounds
SCDP(C) = Ω(LDIM(C)); together with VCDIM(C) ≤ LDIM(C) and SCDP(C) ≤ O(log|C|) [KLNRS 08], this places SCDP(C) between LDIM(C) and log|C| (Point_n sits at the VCDIM end, Line_p strictly in between, Thr_n at the log|C| end).
LDIM(C): Littlestone's dimension. Number of mistakes in online learning
Corollaries:
• SCDP(Thr_n) = Ω(n), since LDIM(Thr_n) = n [L 87]
• For HS_n^d = linear separators over {1, …, 2^n}^d:
  SCDP(HS_n^d) = Ω(d²n) [MT 94]
• Line_p: X = Z_p², Line_p = {ℓ_{a,b} ∣ a, b ∈ Z_p}, where ℓ_{a,b}(x, y) = 1 iff y ≡ ax + b (mod p);
  LDIM(Line_p) = 2; SCDP(Line_p) = Θ(log p)
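Both LDIM values can be checked by brute force on tiny instances (our own sketch). The recursion mirrors the mistake-tree definition: a complete depth-d tree rooted at point x needs complete depth-(d−1) trees on both label restrictions.

```python
def ldim(funcs, num_points):
    """Littlestone dimension via the mistake-tree recursion:
    LDIM(C) = max over points x that split C of
              1 + min(LDIM(C|x=1), LDIM(C|x=0)); 0 if no point splits C."""
    def rec(fs):
        best = 0
        for i in range(num_points):
            f1 = frozenset(f for f in fs if f[i] == 1)
            f0 = fs - f1
            if f1 and f0:
                best = max(best, 1 + min(rec(f1), rec(f0)))
        return best
    return rec(frozenset(funcs))

# Thresholds over {1,...,8}: LDIM = log2(8) = 3
dom = range(1, 9)
thr = [tuple(int(x >= j) for x in dom) for j in dom]
# Line_3: X = Z_3^2, l_{a,b}(x,y) = 1 iff y = a*x + b (mod 3): LDIM = 2
pts = [(x, y) for x in range(3) for y in range(3)]
lines = [tuple(int(y == (a * x + b) % 3) for (x, y) in pts)
         for a in range(3) for b in range(3)]
print(ldim(thr, 8), ldim(lines, 9))   # 3 2
```

For Line_3 the value 2 reflects that two points (with distinct x-coordinates) determine at most one line, so no branch of a mistake tree can be forced past depth 2.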
Our results: characterization
One-way protocol for Eval-C: Alicia holds f ∈ C, Roberto holds x ∈ X;
Alicia sends a (randomized) message, Roberto outputs z with
∀x ∈ X, f ∈ C, Pr[z ≠ f(x)] ≤ 1/4
Eval-C: C × X → {0,1} with Eval-C(f, x) = f(x)
Private coins: CC^→(Eval-C)
Public coins: CC^{→,pub}(Eval-C)
Thm: SCDP(C) = Θ(CC^{→,pub}(Eval-C))
Related results
With distributional assumptions / label privacy only / counting only labeled examples:
• Θ(VCDIM(C)) [CH 11, BNS 15]
Characterization in terms of distribution-independent covers:
• SCDP(C) = Θ(RCOVER(C)) [BNS 13a]
Distribution-independent covers
H ε-covers f over distr. D if ∃h ∈ H s.t. err_{f,D}(h) ≤ ε
H is a distribution-independent (DI) ε-cover for C if
∀f ∈ C and distr. D, H ε-covers f over D
COVER(C) = min{log|H| : H is a DI 1/4-cover for C}
Thm: SCDP(C) = O(COVER(C)) [KLNRS 08, BKN 10]
Proof: exponential mechanism [MT 07]
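A sketch of that learner on our own toy instantiation (cover, class, and parameter values are ours): given a finite cover H, the exponential mechanism samples h with probability proportional to exp(−α · mistakes(h)/2); since one example changes the mistake count by at most 1, the output is α-DP.

```python
import math, random

def exponential_mechanism_learner(H, sample, alpha, rng):
    """Exponential mechanism: sample h from H with probability
    proportional to exp(-alpha * mistakes(h) / 2). The mistake count
    has sensitivity 1 under changing one example, giving alpha-DP."""
    scores = [sum(h(x) != y for x, y in sample) for h in H]
    m = min(scores)                                  # shift for numerical stability
    weights = [math.exp(-alpha * (s - m) / 2) for s in scores]
    r = rng.uniform(0, sum(weights))
    acc = 0.0
    for h, w in zip(H, weights):
        acc += w
        if r <= acc:
            return h
    return H[-1]

rng = random.Random(1)
domain = range(1, 17)
H = [(lambda j: (lambda x: int(x >= j)))(j) for j in domain]  # cover: all 16 thresholds
sample = [(x, int(x >= 5)) for x in domain]                   # labeled by THR_5
h = exponential_mechanism_learner(H, sample, alpha=50.0, rng=rng)
print(sum(h(x) != int(x >= 5) for x in domain))               # 0 mistakes (w.h.p.)
```

The demo uses a deliberately large α so the sampler concentrates on the zero-error hypothesis; with a small, genuinely private α the guarantee is instead that the sampled h has low error once the sample is large enough.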
Randomized DI covers
Let 𝒟 be a distribution over sets of hypotheses
𝒟 is a DI (ε, δ)-cover for C if ∀f ∈ C and distr. D,
Pr_{H∼𝒟}[H ε-covers f over D] ≥ 1 − δ
size(𝒟) = max_{H ∈ supp(𝒟)} |H|
RCOVER(C) = min{log size(𝒟) : 𝒟 is a DI (1/4, 1/4)-cover for C}
RCOVER(C) = Θ(SCDP(C)) [BNS 13a]
From covers to CC
∀f ∈ C and distr. D, ∃h ∈ H s.t. err_{f,D}(h) ≤ 1/4
⇓ (von Neumann minimax)
∀f ∈ C, ∃ distribution t_f over H s.t. ∀x ∈ X,
Pr_{h∼t_f}[h(x) ≠ f(x)] ≤ 1/4
Protocol: Alicia (holding f ∈ C) samples h ∼ t_f and sends it;
Roberto (holding x ∈ X) outputs h(x)
CC = log|H|, and ∀f ∈ C, x ∈ X, Pr[h(x) ≠ f(x)] ≤ 1/4
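As a toy simulation of this protocol (our own instantiation; for simplicity t_f is the degenerate distribution on f itself rather than an actual minimax solution):

```python
import math, random

def run_protocol(f, x, H, t_f, rng):
    """One-way protocol from a cover H: Alicia samples h ~ t_f and sends
    its index (ceil(log2 |H|) bits); Roberto evaluates that h at x."""
    idx = rng.choices(range(len(H)), weights=t_f)[0]
    message_bits = math.ceil(math.log2(len(H)))   # CC = log |H|
    return H[idx](x), message_bits

# Toy instance: H = all 8 thresholds over {1,...,8}.
domain = range(1, 9)
H = [(lambda j: (lambda x: int(x >= j)))(j) for j in domain]
f = lambda x: int(x >= 3)                         # Alicia's concept: THR_3
t_f = [1.0 if j == 3 else 0.0 for j in domain]    # degenerate: f itself lies in H
rng = random.Random(0)
out, bits = run_protocol(f, 4, H, t_f, rng)
print(out, bits)                                  # 1 3
```

The communication cost is log|H| regardless of which h is drawn, which is exactly the quantity COVER/RCOVER bound.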
From covers to CC
COVER(C) = Θ(CC^→(Eval-C))
RCOVER(C) = Θ(CC^{→,pub}(Eval-C))
CC^→(Eval-C) ≤ CC^{→,pub}(Eval-C) + O(log log|C × X|)
[N 91]
Lower bound tools
Information theory [BJKS 02]
1. Find a hard distribution over inputs to Eval-C
2. Low communication ⇒ low (mutual) information
3. Low information ⇒ large error
Augmented Index
[BJKK 04, BIPW 10]
[Figure: a complete mistake tree of depth LDIM(C) — internal nodes labeled by points x_v (root x_∅, children x_{v0}, x_{v1}, …), leaves labeled by concepts f_v; the hard distribution embeds Augmented Index along root-to-leaf paths of this tree.]
Our results: upper bounds
Relaxed (α, β)-differential privacy
A is (α, β)-differentially private if for any two data sets S, S′
such that Δ(S, S′) = 1:
∀Z ⊆ range(A), Pr[A(S) ∈ Z] ≤ e^α · Pr[A(S′) ∈ Z] + β

SCDP_β(Thr_n) = O(16^{log* n} · log(1/β)) [BNS 13b]
SCDP_β(Line_p) = O(log(1/β))
An efficient (α, β)-DP algorithm that learns Line_p using
O(log(1/β)/(αε)) examples
Conclusions and open problems
1. Characterization of SCDP in terms of communication complexity
   1. Tools from information theory
   2. Additional applications
• Is the sample complexity of (α, β)-diff. private learning different from
  VCDIM?
• What is the sample complexity of efficient DP learning of HS_n^d?