Sample Complexity Bounds on Differentially Private Learning via Communication Complexity
Vitaly Feldman (IBM Research - Almaden) and David Xiao (CNRS, Universite Paris 7)
ITA, 2015

Learning model

The learner gets n i.i.d. examples (x_1, y_1), …, (x_n, y_n) over X × {0,1}.
PAC model [V 84]: each x_i ∼ D and y_i = f(x_i) for an unknown distribution D and an unknown f ∈ C.
For every f ∈ C, every D, and every ε > 0: given the examples, with probability ≥ 3/4 output h such that
  err_{f,D}(h) = Pr_{x∼D}[h(x) ≠ f(x)] ≤ ε.

Privacy

Each example is created from the personal data of an individual, e.g.
  (x_i, y_i) = (GTTCACG…TC, "YES").

Differential privacy [DMNS 06]

A (randomized) algorithm A is α-differentially private if for any two data sets S, S′ with Δ(S, S′) = 1 (i.e., differing in a single element):
  ∀T ⊆ range(A), Pr[A(S) ∈ T] ≤ e^α · Pr[A(S′) ∈ T].

What is the cost of privacy?

SCDP(C) = sample complexity of PAC learning C with ε = 1/4 and 1-differential privacy.

Two baseline classes over X = {1, …, 2^n}:
  Points = {IND_t | t ∈ X}, where IND_t(x) = 1 iff x = t.
    SCDP(Points) = O(1) [F 09, BKN 10].
  Thr_n = {GE_t | t ∈ X}, where GE_t(x) = 1 iff x ≥ t.
    VCDIM(Thr_n) = 1; log|Thr_n| = n.
For every C: SCDP(C) = O(log|C|) [KLNRS 08].

Our results: lower bounds

  VCDIM(C) ≤ LDIM(C) ≤ O(SCDP(C)) ≤ O(log|C|) [KLNRS 08],
with Thr_n separating VCDIM from LDIM, Line_p separating LDIM from SCDP, and Points separating SCDP from log|C|.
Here LDIM(C) is Littlestone's dimension:
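As a concrete illustration of the α-differential-privacy definition above (not part of the talk), here is a minimal randomized-response sketch for a single bit. The retention probability e^α/(1 + e^α) is the standard choice; the worst-case ratio of output probabilities on neighboring inputs is then exactly e^α, matching the definition. All function names here are my own.

```python
import math
import random

def randomized_response(bit, alpha):
    """Report the true bit with probability e^alpha / (1 + e^alpha),
    otherwise flip it; this is alpha-differentially private for one bit."""
    keep = math.exp(alpha) / (1 + math.exp(alpha))
    return bit if random.random() < keep else 1 - bit

def worst_case_ratio(alpha):
    """Max over outputs of Pr[output | bit = b] / Pr[output | bit = 1 - b];
    for randomized response this equals e^alpha exactly."""
    keep = math.exp(alpha) / (1 + math.exp(alpha))
    return keep / (1 - keep)
```

For example, worst_case_ratio(1.0) evaluates to e ≈ 2.718, so the mechanism meets the 1-DP guarantee with equality.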
the optimal number of mistakes in online (mistake-bound) learning of C.

Corollaries:
  SCDP(Thr_n) = Ω(n), since LDIM(Thr_n) = n [L 87].
  For HS_n^d = linear separators over {1, …, 2^n}^d: SCDP(HS_n^d) = Ω(d²n), via the mistake bound of [MT 94].

Line_p (p prime): X = Z_p², Line_p = {f_{a,b} | a, b ∈ Z_p}, where f_{a,b}(x, y) = 1 iff y ≡ ax + b (mod p).
  LDIM(Line_p) = 2; SCDP(Line_p) = Θ(log p).

Our results: characterization

Two-party setting: Alice holds f ∈ C, Bob holds x ∈ X; after communicating they output z such that
  ∀x ∈ X, f ∈ C: Pr[z ≠ f(x)] ≤ 1/4.
Eval-C: C × X → {0,1} is the function Eval-C(f, x) = f(x).
  Private coins: CC→(Eval-C); public coins: CC→,pub(Eval-C) (one-way protocols, error ≤ 1/4).
Theorem: SCDP(C) = Θ(CC→,pub(Eval-C)).

Related results

Under distributional assumptions, with label privacy only, or counting only labeled examples: Θ(VCDIM(C)) [CH 11, BNS 15].
Characterization in terms of distribution-independent covers: SCDP(C) = Θ(RCOVER(C)) [BNS 13a].

Distribution-independent covers

H γ-covers f over distribution D if ∃h ∈ H s.t. err_{f,D}(h) ≤ γ.
H is a distribution-independent (DI) γ-cover for C if for all f ∈ C and all distributions D, H γ-covers f over D.
COVER(C) = min{ log|H| : H is a DI 1/4-cover for C }.
Thm: SCDP(C) = O(COVER(C)) [KLNRS 08, BKN 10]. Proof: exponential mechanism [MT 07].

Randomized DI covers

Let 𝒟 be a distribution over sets of hypotheses.
𝒟 is a DI (γ, δ)-cover for C if for all f ∈ C and all D, Pr_{H∼𝒟}[H γ-covers f over D] ≥ 1 − δ.
size(𝒟) = max_{H ∈ supp(𝒟)} |H|.
RCOVER(C) = min{ log size(𝒟) : 𝒟 is a DI (1/4, 1/4)-cover for C }.
RCOVER(C) = Θ(SCDP(C)) [BNS 13a].

From covers to CC

A DI cover says: ∀f ∈ C and distribution D, ∃h ∈ H s.t. err_{f,D}(h) ≤ 1/4.
By von Neumann's minimax theorem: ∀f ∈ C there is a distribution t_f over H s.t.
  ∀x ∈ X, Pr_{h∼t_f}[h(x) ≠ f(x)] ≤ 1/4.
Protocol: Alice samples h ∼ t_f and sends its index (log|H| bits); Bob outputs h(x). Then
  ∀f ∈ C, x ∈ X: Pr[h(x) ≠ f(x)] ≤ 1/4.
Hence COVER(C) = Θ(CC→(Eval-C)) and RCOVER(C) = Θ(CC→,pub(Eval-C)).
CC→(Eval-C) ≤ CC→,pub(Eval-C) + O(log log(|C|·|X|)) [N 91].

Lower bound tools

Information theory [BJKS 02]:
1. Find a hard distribution over inputs to Eval-C.
2. Low communication ⇒ low (mutual) information.
3.
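The SCDP(C) = O(COVER(C)) upper bound mentioned above is proved with the exponential mechanism: score each hypothesis in a finite cover H by its number of mistakes on the sample and output one with probability proportional to exp(−α·mistakes/2). A minimal sketch (my own toy cover of threshold functions, not from the talk):

```python
import math
import random

def exponential_mechanism(sample, hypotheses, alpha):
    """Output h with probability proportional to exp(-alpha * mistakes(h) / 2).
    Changing one example changes mistakes(h) by at most 1 for every h,
    so this selection is alpha-differentially private (McSherry-Talwar)."""
    def mistakes(h):
        return sum(1 for x, y in sample if h(x) != y)
    weights = [math.exp(-alpha * mistakes(h) / 2) for h in hypotheses]
    r = random.uniform(0, sum(weights))
    acc = 0.0
    for h, w in zip(hypotheses, weights):
        acc += w
        if r <= acc:
            return h
    return hypotheses[-1]

# Toy cover for thresholds on {0, ..., 7}: all nine threshold functions.
cover = [(lambda x, t=t: 1 if x >= t else 0) for t in range(9)]
sample = [(x, 1 if x >= 5 else 0) for x in range(8)]
h = exponential_mechanism(sample, cover, alpha=8.0)
```

With log|H| small, the utility analysis of the exponential mechanism gives the O(COVER(C)) sample bound; this sketch only shows the selection step.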
Low information ⇒ large error.
Hard instance: Augmented Index [BJKK 04, BIPW 10].
[Figure: an LDIM(C) mistake tree — a binary tree with internal nodes labeled by points x_v and leaves labeled by functions f_v.]

Our results: upper bounds

Relaxed (α, β)-differential privacy: A is (α, β)-differentially private if for any two data sets S, S′ with Δ(S, S′) = 1:
  ∀T ⊆ range(A), Pr[A(S) ∈ T] ≤ e^α · Pr[A(S′) ∈ T] + β.
SCDP_β(Thr_n) = O(16^{log* n} · log(1/β)) [BNS 13b].
SCDP_β(Line_p) = O(log(1/β)): an efficient (α, β)-DP algorithm learns Line_p using O(log(1/β)/(αε)) examples.

Conclusions and open problems

1. Characterization of SCDP in terms of communication complexity:
   - tools from information theory;
   - additional applications.
2. Is the sample complexity of (α, β)-differentially private learning different from VCDIM?
3. What is the sample complexity of efficient DP learning of HS_n^d?
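The efficient Line_p learner exploits a simple structural fact (the same one behind LDIM(Line_p) = 2): any two positive examples with distinct x-coordinates determine the concept f_{a,b} exactly. A minimal non-private sketch of that interpolation step, with helper names of my own:

```python
def recover_line(p, pt1, pt2):
    """Given two positive examples (x1, y1), (x2, y2) of f_{a,b} with
    x1 != x2 (mod p), p prime, recover (a, b) with y = a*x + b (mod p)."""
    (x1, y1), (x2, y2) = pt1, pt2
    inv = pow((x1 - x2) % p, p - 2, p)  # modular inverse via Fermat's little theorem
    a = ((y1 - y2) * inv) % p
    b = (y1 - a * x1) % p
    return a, b

# f_{7,13} over Z_101: two positive examples pin down the concept.
p = 101
examples = [(2, (7 * 2 + 13) % p), (5, (7 * 5 + 13) % p)]
a, b = recover_line(p, examples[0], examples[1])  # recovers (7, 13)
```

The private algorithm must of course do more than interpolate, but this is why so little information about the sample needs to be released.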