Preserving Statistical Validity in Adaptive Data Analysis

Cynthia Dwork (Microsoft Research), Vitaly Feldman (IBM Research - Almaden), Moritz Hardt (Google Research), Toni Pitassi (U. of Toronto), Omer Reingold (Samsung Research), Aaron Roth (Penn, CS)

The setup
Population: a distribution P over a domain X. Dataset S = x_1, ..., x_n ∼ P^n. An analysis of S produces findings: parameter estimates, correlations, a predictive model, a classifier, a clustering, etc.

Data Science 101
Does student nutrition affect academic performance?
[Data table: normalized grade plus 50 food-consumption variables for 100 students; omitted.]

Check correlations
[Bar chart: empirical correlation of each of the 50 foods with grade, ranging roughly from -0.4 to 0.3; omitted.]

Pick candidate foods
[Same bar chart with the three most correlated foods highlighted; omitted.]

Fit linear function of 3 selected foods
Regression summary (true vs. predicted grade):
  Multiple R 0.445, R² 0.198, Adjusted R² 0.173, Standard Error 1.004, Observations 100
  ANOVA: Regression df 3, SS 23.951, MS 7.984, F 7.917, Significance F 8.99e-05
         Residual df 96, SS 96.806, MS 1.008; Total df 99, SS 120.757
  Coefficients (estimate, std. error, t stat, p-value):
    Intercept  -0.044  0.101  -0.440  0.661
    Mushroom   -0.296  0.102  -2.905  0.0046
    Pumpkin     0.256  0.108   2.359  0.0204
    Nutella     0.267  0.095   2.806  0.0061
  Significance F ≈ 0.0001 < 0.05
But under the true P, every x_i and y is drawn i.i.d. from N(0,1): all variables are uncorrelated.

Freedman's Paradox: "Such practices can distort the significance levels of conventional statistical tests.
The existence of this effect is well known, but its magnitude may come as a surprise, even to a hardened statistician." (1983)

Statistical inference
"Fresh" data → Procedure (hypothesis tests, regression, learning) → Result with statistical guarantees (p-values, confidence intervals, prediction intervals).

Data analysis is adaptive
• Exploratory data analysis
• Variable selection
• Hyper-parameter tuning
• Shared data - findings inform others

Is this a real problem?
"Why Most Published Research Findings Are False" [Ioannidis 05]: 1,000,000+ downloads, 1400+ citations.
"Irreproducible preclinical research exceeds 50%, resulting in approximately US$28B/year loss" [Freedman, Cockburn, Simcoe 15].
Adaptive data analysis is one of the causes:
"In the course of collecting and analyzing data, researchers have many decisions to make [...] It is rare, and sometimes impractical, for researchers to make all these decisions beforehand. Rather, it is common (and accepted practice) for researchers to explore various analytic alternatives, to search for a combination that yields 'statistical significance', and to then report only what 'worked'." [Simmons, Nelson, Simonsohn 11]

Evaluating adaptive queries
Statistical query (SQ) oracle [Kearns 93]: given a dataset S = x_1, ..., x_n ∼ P^n, the data analyst(s) adaptively issue queries q_1, ..., q_m with each q_i : X → [0,1], and receive answers v_1, ..., v_m satisfying |v_i − E_{x∼P}[q_i(x)]| ≤ τ with probability 1 − β. Statistical queries can measure correlations, moments, accuracy/error, parameters — and can run any SQ-based algorithm!

Answering non-adaptive SQs
Given m non-adaptive query functions q_1, ..., q_m and n i.i.d. samples from P, estimate E_{x∼P}[q_i(x)] by the empirical mean E_S[q_i] = (1/n) Σ_{x∈S} q_i(x). Then n = O(log(m/β) / τ²) samples suffice.

Answering adaptive SQs
How do we estimate the expectations of m adaptively chosen functions? What if we just use E_S[q_i]? For some constants β > 0, τ > 0 this can fail unless n ≥ m — e.g.
Freedman's paradox. Safe: use fresh samples for each query (data splitting), n = O(m · log(m/β) / τ²).

Our results
• There exists an algorithm that can answer m adaptively chosen SQs with accuracy τ for n₁ = O(log^{3/2}|X| · √(log m) / τ^{3.5}), running in time O(n₁ · |X|).
• There exists an algorithm that can answer m adaptively chosen SQs with accuracy τ for n₂ = O(√m · log^{3/2} m / τ^{2.5}), running in time O(n₂ · log|X|).
Previously: n = O(m · log m / τ²).

Tool: differential privacy

Differential privacy [Dwork, McSherry, Nissim, Smith 06]
A (randomized) algorithm A is (ε, δ)-differentially private if for any two datasets S, S' that differ in a single element (Δ(S, S') = 1):
  for all Z ⊆ range(A), Pr[A(S) ∈ Z] ≤ e^ε · Pr[A(S') ∈ Z] + δ.
[Figure: the output distributions of A on S and on S' have pointwise bounded ratio; omitted.]

Why DP? DP composes adaptively
If A is (ε₁, δ₁)-DP and B, run adaptively on A's output, is (ε₂, δ₂)-DP, then the composition is (ε₁ + ε₂, δ₁ + δ₂)-DP.

Why DP? DP implies generalization
Composition of k ε-DP algorithms is, for every δ > 0, (ε√(2k log(1/δ)), δ)-DP [Dwork, Rothblum, Vadhan 10].
If an ε-DP algorithm A outputs a function q : X → [0,1], then
  Pr_{A, S∼P^n}[ E_P[A(S)] − E_S[A(S)] ≥ ε ] ≤ 6e^{−ε²n}.
Similarly for (ε, δ)-DP algorithms.
Folklore: for ε-DP algorithms, | E_{A, S∼P^n}[ E_S[A(S)] − E_P[A(S)] ] | ≤ e^ε − 1 ≈ ε.

Back to queries
If a DP algorithm A outputs a function, then E_S[A(S)] ≈ E_P[A(S)] with high probability.
1. How to make the analyst differentially private? Answer the queries with DP — post-processing preserves DP.
2. How to answer queries for E_S[q_i] with DP?
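A minimal sketch of the DP query-answering idea just described: each statistical query has sensitivity 1/n (changing one record moves the empirical mean by at most 1/n), so adding Laplace noise of scale 1/(εn) gives ε-DP per query, and advanced composition [Dwork, Rothblum, Vadhan 10] tracks the total privacy cost over m queries. All parameter values below are illustrative, not from the slides.

```python
import numpy as np

rng = np.random.default_rng(1)

def laplace_sq_answer(sample, q, eps, rng):
    """Answer SQ q : X -> [0,1] with the empirical mean plus
    Laplace noise of scale 1/(eps*n); eps-DP per query since the
    empirical mean has sensitivity 1/len(sample)."""
    emp = np.mean([q(x) for x in sample])
    return emp + rng.laplace(scale=1.0 / (eps * len(sample)))

def total_epsilon(eps, m, delta):
    """Rough advanced-composition bound: m eps-DP answers are
    about (eps*sqrt(2*m*log(1/delta)), delta)-DP overall."""
    return eps * np.sqrt(2 * m * np.log(1.0 / delta))

sample = rng.standard_normal(10_000)
q = lambda x: float(x > 0)          # example query: Pr[x > 0] = 1/2
ans = laplace_sq_answer(sample, q, eps=0.1, rng=rng)
print(ans, total_epsilon(0.1, m=100, delta=1e-6))
```

With n = 10,000 the noise scale is 0.001, so the noisy answer stays close to the true value 1/2; the composition bound shows why per-query ε must shrink like 1/√m, which is where the √m in the sample-complexity bounds comes from.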
Counting queries
• Laplace noise mechanism [DMNS 06, DRV 10]: n₂ = O(√m · log^{3/2} m / τ^{2.5})
• Multiplicative weights based mechanism [Hardt, Rothblum 10]: n₁ = O(log^{3/2}|X| · √(log m) / τ^{3.5})

Further developments
• n = Ω̃(√m / τ) for poly-time algorithms, assuming one-way functions exist [Hardt, Ullman 14; Steinke, Ullman 14]
• n = O(√m · log m / τ²), and extensions to more general low-sensitivity queries [Bassily, Smith, Steinke, Ullman 15]
• Stronger (tight) generalization for (ε, δ)-DP algorithms [Nissim, Stemmer 15]
• General adaptive setting and other techniques: description length and max-information [DFHPRR 15]
• Application: algorithms for reusing a holdout set in ML [DFHPRR 15]
• Application: maintaining an accurate leaderboard in ML competitions [Blum, Hardt 15]

Ask the authors! Watch the TCS+ talk by Aaron.
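The "Data Science 101" demo from the opening slides (significant-looking nutrition findings from pure noise) can be reproduced in a few lines. A minimal sketch: the sizes (100 students, 50 foods, 3 selected) mirror the slides; the seed and variable names are illustrative.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# All variables i.i.d. N(0,1): foods and grade are truly uncorrelated.
n_students, n_foods, n_select = 100, 50, 3
foods = rng.standard_normal((n_students, n_foods))
grade = rng.standard_normal(n_students)

# Step 1 (adaptive selection): pick the foods most correlated
# with grade in-sample.
corrs = np.array([np.corrcoef(foods[:, j], grade)[0, 1]
                  for j in range(n_foods)])
selected = np.argsort(-np.abs(corrs))[:n_select]

# Step 2 (inference on the SAME data): OLS of grade on the
# selected foods, with an intercept.
X = np.column_stack([np.ones(n_students), foods[:, selected]])
beta, _, _, _ = np.linalg.lstsq(X, grade, rcond=None)
resid = grade - X @ beta
r2 = 1 - (resid @ resid) / ((grade - grade.mean()) ** 2).sum()

# Overall F-test of the regression; the selection step makes
# this p-value invalid, typically making it look significant.
df1, df2 = n_select, n_students - n_select - 1
f_stat = (r2 / df1) / ((1 - r2) / df2)
p_value = stats.f.sf(f_stat, df1, df2)
print(f"R^2 = {r2:.3f}, F p-value = {p_value:.4g}")
```

Running this with different seeds shows the same pattern as the slides: selecting the top few of 50 noise variables and then fitting on the same 100 samples routinely produces a "significant" F-test, which is exactly the failure mode the DP-based mechanisms above are designed to prevent.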