Preserving Statistical Validity in Adaptive Data Analysis

Cynthia Dwork (Microsoft Research), Vitaly Feldman (IBM Research - Almaden), Moritz Hardt (Google Research), Toni Pitassi (U. of Toronto), Omer Reingold (Samsung Research), Aaron Roth (Penn, CS)

The setup
Population: a distribution P over a domain X. Dataset S = x_1, ..., x_n ∼ P^n. An analysis of S produces findings: parameter estimates, correlations, a predictive model, a classifier, a clustering, etc.

Data Science 101
Does student nutrition affect academic performance?
[Data table: normalized grade plus 50 food-consumption variables for 100 students; omitted.]

Check correlations
[Bar chart: empirical correlation of each of the 50 foods with grade, ranging roughly from -0.4 to 0.3; omitted.]

Pick candidate foods
[Same bar chart with the three most correlated foods highlighted; omitted.]

Fit linear function of 3 selected foods
Regression summary (true vs. predicted grade):
  Multiple R 0.445, R² 0.198, Adjusted R² 0.173, Standard Error 1.004, Observations 100
  ANOVA: Regression df 3, SS 23.951, MS 7.984, F 7.917, Significance F 8.99e-05
         Residual df 96, SS 96.806, MS 1.008; Total df 99, SS 120.757
  Coefficients (estimate, std. error, t stat, p-value):
    Intercept  -0.044  0.101  -0.440  0.661
    Mushroom   -0.296  0.102  -2.905  0.0046
    Pumpkin     0.256  0.108   2.359  0.0204
    Nutella     0.267  0.095   2.806  0.0061
  Significance F ≈ 0.0001 < 0.05
But under the true P, every x_i and y is drawn i.i.d. from N(0,1): all variables are uncorrelated.

Freedman's Paradox: "Such practices can distort the significance levels of conventional statistical tests.
The existence of this effect is well known, but its magnitude may come as a surprise, even to a hardened statistician." (1983)

Statistical inference
"Fresh" data → Procedure (hypothesis tests, regression, learning) → Result with statistical guarantees (p-values, confidence intervals, prediction intervals).

Data analysis is adaptive
• Exploratory data analysis
• Variable selection
• Hyper-parameter tuning
• Shared data - findings inform others

Is this a real problem?
"Why Most Published Research Findings Are False" [Ioannidis 05]: 1,000,000+ downloads, 1400+ citations.
"Irreproducible preclinical research exceeds 50%, resulting in approximately US$28B/year loss" [Freedman, Cockburn, Simcoe 15].
Adaptive data analysis is one of the causes:
"In the course of collecting and analyzing data, researchers have many decisions to make [...] It is rare, and sometimes impractical, for researchers to make all these decisions beforehand. Rather, it is common (and accepted practice) for researchers to explore various analytic alternatives, to search for a combination that yields 'statistical significance', and to then report only what 'worked'." [Simmons, Nelson, Simonsohn 11]

Evaluating adaptive queries
Statistical query (SQ) oracle [Kearns 93]: given a dataset S = x_1, ..., x_n ∼ P^n, the data analyst(s) adaptively issue queries q_1, ..., q_m with each q_i : X → [0,1], and receive answers v_1, ..., v_m satisfying |v_i − E_{x∼P}[q_i(x)]| ≤ τ with probability 1 − β. Statistical queries can measure correlations, moments, accuracy/error, parameters — and can run any SQ-based algorithm!

Answering non-adaptive SQs
Given m non-adaptive query functions q_1, ..., q_m and n i.i.d. samples from P, estimate E_{x∼P}[q_i(x)] by the empirical mean E_S[q_i] = (1/n) Σ_{x∈S} q_i(x). Then n = O(log(m/β) / τ²) samples suffice.

Answering adaptive SQs
How do we estimate the expectations of m adaptively chosen functions? What if we just use E_S[q_i]? For some constants β > 0, τ > 0 this can fail unless n ≥ m — e.g.
Freedman's paradox. Safe: use fresh samples for each query (data splitting), n = O(m · log(m/β) / τ²).

Our results
• There exists an algorithm that can answer m adaptively chosen SQs with accuracy τ for n₁ = O(log^{3/2}|X| · √(log m) / τ^{3.5}), running in time O(n₁ · |X|).
• There exists an algorithm that can answer m adaptively chosen SQs with accuracy τ for n₂ = O(√m · log^{3/2} m / τ^{2.5}), running in time O(n₂ · log|X|).
Previously: n = O(m · log m / τ²).

Tool: differential privacy

Differential privacy [Dwork, McSherry, Nissim, Smith 06]
A (randomized) algorithm A is (ε, δ)-differentially private if for any two datasets S, S' that differ in a single element (Δ(S, S') = 1):
  for all Z ⊆ range(A), Pr[A(S) ∈ Z] ≤ e^ε · Pr[A(S') ∈ Z] + δ.
[Figure: the output distributions of A on S and on S' have pointwise bounded ratio; omitted.]

Why DP? DP composes adaptively
If A is (ε₁, δ₁)-DP and B, run adaptively on A's output, is (ε₂, δ₂)-DP, then the composition is (ε₁ + ε₂, δ₁ + δ₂)-DP.

Why DP? DP implies generalization
Composition of k ε-DP algorithms is, for every δ > 0, (ε√(2k log(1/δ)), δ)-DP [Dwork, Rothblum, Vadhan 10].
If an ε-DP algorithm A outputs a function q : X → [0,1], then
  Pr_{A, S∼P^n}[ E_P[A(S)] − E_S[A(S)] ≥ ε ] ≤ 6e^{−ε²n}.
Similarly for (ε, δ)-DP algorithms.
Folklore: for ε-DP algorithms, | E_{A, S∼P^n}[ E_S[A(S)] − E_P[A(S)] ] | ≤ e^ε − 1 ≈ ε.

Back to queries
If a DP algorithm A outputs a function, then E_S[A(S)] ≈ E_P[A(S)] with high probability.
1. How to make the analyst differentially private? Answer the queries with DP — post-processing preserves DP.
2. How to answer queries for E_S[q_i] with DP?
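A minimal sketch of the DP query-answering idea just described: each statistical query has sensitivity 1/n (changing one record moves the empirical mean by at most 1/n), so adding Laplace noise of scale 1/(εn) gives ε-DP per query, and advanced composition [Dwork, Rothblum, Vadhan 10] tracks the total privacy cost over m queries. All parameter values below are illustrative, not from the slides.

```python
import numpy as np

rng = np.random.default_rng(1)

def laplace_sq_answer(sample, q, eps, rng):
    """Answer SQ q : X -> [0,1] with the empirical mean plus
    Laplace noise of scale 1/(eps*n); eps-DP per query since the
    empirical mean has sensitivity 1/len(sample)."""
    emp = np.mean([q(x) for x in sample])
    return emp + rng.laplace(scale=1.0 / (eps * len(sample)))

def total_epsilon(eps, m, delta):
    """Rough advanced-composition bound: m eps-DP answers are
    about (eps*sqrt(2*m*log(1/delta)), delta)-DP overall."""
    return eps * np.sqrt(2 * m * np.log(1.0 / delta))

sample = rng.standard_normal(10_000)
q = lambda x: float(x > 0)          # example query: Pr[x > 0] = 1/2
ans = laplace_sq_answer(sample, q, eps=0.1, rng=rng)
print(ans, total_epsilon(0.1, m=100, delta=1e-6))
```

With n = 10,000 the noise scale is 0.001, so the noisy answer stays close to the true value 1/2; the composition bound shows why per-query ε must shrink like 1/√m, which is where the √m in the sample-complexity bounds comes from.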
Counting queries
• Laplace noise mechanism [DMNS 06, DRV 10]: n₂ = O(√m · log^{3/2} m / τ^{2.5})
• Multiplicative weights based mechanism [Hardt, Rothblum 10]: n₁ = O(log^{3/2}|X| · √(log m) / τ^{3.5})

Further developments
• n = Ω̃(√m / τ) for poly-time algorithms, assuming one-way functions exist [Hardt, Ullman 14; Steinke, Ullman 14]
• n = O(√m · log m / τ²), and extensions to more general low-sensitivity queries [Bassily, Smith, Steinke, Ullman 15]
• Stronger (tight) generalization for (ε, δ)-DP algorithms [Nissim, Stemmer 15]
• General adaptive setting and other techniques: description length and max-information [DFHPRR 15]
• Application: algorithms for reusing a holdout set in ML [DFHPRR 15]
• Application: maintaining an accurate leaderboard in ML competitions [Blum, Hardt 15]

Ask the authors! Watch the TCS+ talk by Aaron.
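The "Data Science 101" demo from the opening slides (significant-looking nutrition findings from pure noise) can be reproduced in a few lines. A minimal sketch: the sizes (100 students, 50 foods, 3 selected) mirror the slides; the seed and variable names are illustrative.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# All variables i.i.d. N(0,1): foods and grade are truly uncorrelated.
n_students, n_foods, n_select = 100, 50, 3
foods = rng.standard_normal((n_students, n_foods))
grade = rng.standard_normal(n_students)

# Step 1 (adaptive selection): pick the foods most correlated
# with grade in-sample.
corrs = np.array([np.corrcoef(foods[:, j], grade)[0, 1]
                  for j in range(n_foods)])
selected = np.argsort(-np.abs(corrs))[:n_select]

# Step 2 (inference on the SAME data): OLS of grade on the
# selected foods, with an intercept.
X = np.column_stack([np.ones(n_students), foods[:, selected]])
beta, _, _, _ = np.linalg.lstsq(X, grade, rcond=None)
resid = grade - X @ beta
r2 = 1 - (resid @ resid) / ((grade - grade.mean()) ** 2).sum()

# Overall F-test of the regression; the selection step makes
# this p-value invalid, typically making it look significant.
df1, df2 = n_select, n_students - n_select - 1
f_stat = (r2 / df1) / ((1 - r2) / df2)
p_value = stats.f.sf(f_stat, df1, df2)
print(f"R^2 = {r2:.3f}, F p-value = {p_value:.4g}")
```

Running this with different seeds shows the same pattern as the slides: selecting the top few of 50 noise variables and then fitting on the same 100 samples routinely produces a "significant" F-test, which is exactly the failure mode the DP-based mechanisms above are designed to prevent.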