
Preserving Statistical Validity
in Adaptive Data Analysis
Cynthia Dwork (Microsoft Research)
Vitaly Feldman (IBM Research - Almaden)
Moritz Hardt (Google Research)
Toni Pitassi (U. of Toronto)
Omer Reingold (Samsung Research)
Aaron Roth (Penn, CS)
Population P over X → Dataset S = (x_1, …, x_n) ∼ P^n → Analysis → Findings:
parameter estimates, correlations, predictive models, classifiers, clusterings, etc.
Data Science 101
Does student nutrition affect academic performance?
[Spreadsheet: normalized grade and normalized consumption of 50 foods for each of 100 students]
Check correlations

[Bar chart: correlations of foods 1-50 with grade, ranging from about -0.4 to 0.3]
Pick candidate foods

[Same bar chart: the few foods with the largest |correlation| with grade are highlighted as candidates]
Fit linear function of 3 selected foods

[Scatter plot: true vs. predicted grade]

Regression Statistics:
  Multiple R          0.445
  R Square            0.198
  Adjusted R Square   0.173
  Standard Error      1.004
  Observations        100

ANOVA:
              df    SS        MS      F       Significance F
  Regression   3    23.951    7.984   7.917   8.99E-05
  Residual    96    96.806    1.008
  Total       99   120.757

             Coefficient   Std. Error   t Stat    P-value
  Intercept    -0.0442      0.1005      -0.440    0.661
  Mushroom     -0.2961      0.1019      -2.905    0.0046
  Pumpkin       0.2558      0.1084       2.359    0.0204
  Nutella       0.2671      0.0952       2.806    0.0061

Significance F ≈ 0.0001 < 0.05
True P: x_i, y ∼ N(0,1), uncorrelated
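The pipeline on the preceding slides is easy to reproduce. Below is a minimal sketch (ours, not from the talk) in NumPy: both the 50 "foods" and the grade are independent N(0,1) draws, yet selecting the 3 most correlated foods and fitting them by least squares typically yields a nontrivial R² and seemingly significant coefficients.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 100, 50  # students, foods -- all pure noise, as on the slide

X = rng.standard_normal((n, d))  # food consumption, independent of the grade
y = rng.standard_normal(n)       # normalized grade

# Step 1: check correlations of each food with the grade.
corr = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(d)])

# Step 2: pick the 3 candidate foods with the largest |correlation|.
top3 = np.argsort(-np.abs(corr))[:3]

# Step 3: fit a linear function of the 3 selected foods (OLS via lstsq).
A = np.column_stack([np.ones(n), X[:, top3]])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
resid = y - A @ beta
r2 = 1 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))

print(f"selected foods: {top3}, R^2 = {r2:.3f}")
```

Even though y is independent of every food, the selection step guarantees the chosen regressors look predictive on this particular sample.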
Freedman's Paradox:
"Such practices can distort the significance levels of conventional statistical tests. The existence of this effect is well known, but its magnitude may come as a surprise, even to a hardened statistician." [Freedman 1983]
Statistical inference

"Fresh" data → Procedure (hypothesis tests, regression, learning) → Result and statistical guarantees (p-values, confidence intervals, prediction intervals)
Data analysis is adaptive

• Exploratory data analysis
• Variable selection
• Hyper-parameter tuning
• Shared data: findings inform others

[Diagram: results of earlier analyses feed into later uses of the same data]
Is this a real problem?
"Why Most Published Research Findings Are False" [Ioannidis 05]
1,000,000+ downloads; 1400+ citations
"Irreproducible preclinical research exceeds 50%, resulting in approximately US$28B/year loss" [Freedman, Cockburn, Simcoe 15]
Adaptive data analysis is one of the causes
"In the course of collecting and analyzing data, researchers have many decisions to make [...] It is rare, and sometimes impractical, for researchers to make all these decisions beforehand. Rather, it is common (and accepted practice) for researchers to explore various analytic alternatives, to search for a combination that yields 'statistical significance', and to then report only what 'worked'."
[Simmons, Nelson, Simonsohn 11]
Evaluating adaptive queries

Statistical query oracle [Kearns 93]: the data analyst(s) adaptively issue queries φ_1, …, φ_m, each φ_i: X → [0,1]; the oracle holds S = (x_1, …, x_n) ∼ P^n and returns answers v_1, …, v_m satisfying

  |v_i − E_{x∼P}[φ_i(x)]| ≤ τ  with prob. 1 − β

Can measure correlations, moments, accuracy/error, parameters, and run any SQ-based algorithm!
Answering non-adaptive SQs

Given m non-adaptive query functions φ_1, …, φ_m and n i.i.d. samples from P, estimate E_{x∼P}[φ_i(x)].

Use the empirical mean: E_{x∼S}[φ_i(x)] = (1/n) · Σ_{x∈S} φ_i(x)

  n = O( log(m/β) / τ² )
Answering adaptive SQs

Estimate expectations of m adaptively chosen functions? What if we use the empirical mean E_S[φ_i]? For some constant β > 0, τ > 0 this requires n ≥ m (e.g. Freedman's paradox).

Safe: n = O( m · log(m/β) / τ² )
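To see why the empirical mean can need n on the order of m in the adaptive case, here is a small adversarial sketch (our construction, not from the slides): query m coordinate means of uniform random bits, then adaptively combine the coordinates that looked above average. The combined query's empirical mean overestimates its true value (exactly 1/2) by a margin on the order of 1/√n.

```python
import numpy as np

rng = np.random.default_rng(4)
n, d = 100, 500                      # samples, non-adaptive queries
X = rng.integers(0, 2, size=(n, d))  # population: uniform bits, E[phi_j] = 0.5

emp = X.mean(axis=0)                 # answers to the d non-adaptive queries
sel = emp > 0.5                      # adaptive step: keep "promising" coords

# Adaptively chosen query: average of the selected coordinates.
combined_emp = X[:, sel].mean()      # empirical answer, biased upward
combined_true = 0.5                  # true answer under the population

print(f"empirical {combined_emp:.3f} vs true {combined_true}")
```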
Our results

There is an algorithm that answers m adaptively chosen SQs with accuracy τ for

  n_1 = O( log^(3/2) m · √(log|X|) / τ^3.5 ) = O( (log m / τ²) · (√(log m · log|X|) / τ^1.5) )

in time O(n_1 · |X|).

There is an algorithm that answers m adaptively chosen SQs with accuracy τ for

  n_2 = O( √m · log^(3/2) m / τ^2.5 )

in time O(n_2 · log|X|).

Previously: O( m · log m / τ² )
Tool: differential privacy
Differential Privacy
[Dwork, McSherry, Nissim, Smith 06]

[Illustration: a dataset S of individuals; replacing one individual changes the algorithm's output distribution Pr[r] by at most a bounded ratio]

A (randomized) algorithm A is (ε, δ)-differentially private if for any two datasets S, S' such that Δ(S, S') = 1:

  ∀Z ⊆ range(A):  Pr[A(S) ∈ Z] ≤ e^ε · Pr[A(S') ∈ Z] + δ
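As a concrete (illustrative, not from the talk) instance: the Laplace mechanism answers a statistical query ε-differentially privately, because changing one of the n records moves the empirical mean of a [0,1]-valued φ by at most 1/n.

```python
import numpy as np

def laplace_sq(S, phi, eps, rng):
    """eps-DP answer to the statistical query E_S[phi(x)], phi: X -> [0,1].
    One record changes the empirical mean by at most 1/n (the sensitivity),
    so adding Laplace noise of scale 1/(eps*n) gives eps-DP."""
    n = len(S)
    return float(np.mean([phi(x) for x in S]) + rng.laplace(scale=1.0 / (eps * n)))

rng = np.random.default_rng(2)
S = rng.standard_normal(1000)
ans = laplace_sq(S, lambda x: float(x > 0), eps=0.5, rng=rng)
print(ans)  # close to 0.5 for N(0,1) data
```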
Why DP?

DP composes adaptively: if A is (ε_1, δ_1)-DP on X^n and B is (ε_2, δ_2)-DP, where B's input may depend on A's output Y, then the composed algorithm (A, B) is (ε_1 + ε_2, δ_1 + δ_2)-DP.
Why DP?

DP composes adaptively: composition of m ε-DP algorithms is, for every δ > 0, (ε·√(2m·log(1/δ)), δ)-DP [Dwork, Rothblum, Vadhan 10].

DP implies generalization: if a τ-DP algorithm A outputs a function φ: X → [0,1], then

  Pr_{A, S∼P^n}[ E_S[A(S)] − E_P[A(S)] ≥ τ ] ≤ 6·e^(−τ²·n)

Similarly for (τ, e^(−1/τ))-DP algorithms.

Folklore: E_{A, S∼P^n}[ E_S[A(S)] − E_P[A(S)] ] ≤ τ for τ-DP algorithms.
Back to queries

If a DP algorithm A outputs a function, then E_S[A(S)] ≈ E_P[A(S)] with high prob.

1. How to make the analyst differentially private?
   Answer queries with DP! Post-processing preserves DP.
2. How to answer queries for E_S[A(S)] with DP?
Counting queries

Laplace noise mechanism [DMNS 06, DRV 10]:
• n_2 = O( √m · log^(3/2) m / τ^2.5 )

Multiplicative-weights-based mechanism [Hardt, Rothblum 10]:
• n_1 = O( log^(3/2) m · √(log|X|) / τ^3.5 )
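A sketch of how a Laplace-based oracle for adaptive SQs might be organized (class and parameter names are ours, not from the slides): split an overall privacy budget across m queries using advanced composition, then answer each adaptively chosen query with a noisy empirical mean.

```python
import math
import numpy as np

class LaplaceSQOracle:
    """Illustrative oracle for adaptively chosen SQs. Each query
    phi: X -> [0,1] is answered with the empirical mean plus Laplace noise;
    the per-query eps is set so that m queries compose (via the leading term
    of advanced composition, a simplification) to eps_total overall."""

    def __init__(self, S, m, eps_total=1.0, delta=1e-6, seed=0):
        self.S = np.asarray(S)
        # invert eps_total ~ eps * sqrt(2*m*log(1/delta)) for the per-query eps
        self.eps = eps_total / math.sqrt(2 * m * math.log(1 / delta))
        self.rng = np.random.default_rng(seed)

    def query(self, phi):
        vals = np.clip([phi(x) for x in self.S], 0.0, 1.0)
        noise = self.rng.laplace(scale=1.0 / (self.eps * len(self.S)))
        return float(np.mean(vals) + noise)

oracle = LaplaceSQOracle(np.random.default_rng(3).standard_normal(5000), m=100)
print(oracle.query(lambda x: float(x > 0)))
```

Because the answers are differentially private, the generalization guarantee of the previous slides applies to every query, no matter how adaptively it was chosen.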
Further developments

• n = Ω( √m / τ ) for poly-time algorithms assuming OWFs exist [Hardt, Ullman 14; Steinke, Ullman 14]
• n = O( √(m · log m) / τ² ) and more general low-sensitivity queries [Bassily, Smith, Steinke, Ullman 15]
• Stronger (tight) generalization for (ε, δ)-DP algorithms [Nissim, Stemmer 15]
• General adaptive setting and other techniques: description length and max-information [DFHPRR 15]
• Application: algorithms for reusing a holdout set in ML [DFHPRR 15]
• Application: maintaining an accurate leaderboard in ML competitions [Blum, Hardt 15]

Ask the authors! Watch the TCS+ talk by Aaron.