PDF

Scalable
Bayesian
Network
Classifiers
Geoff Webb
Ana Martinez
Nayyar Zaidi
Introduction
Scalable Bayesian Network Classifiers
Bayesian
Network
Classifiers
Selective KDB
Experiments
Conclusions &
Future
Research
Geoff Webb
Ana Martinez
Nayyar Zaidi
Scalable
Bayesian
Network
Classifiers
Geoff Webb
Ana Martinez
Nayyar Zaidi
Introduction
Bayesian
Network
Classifiers
Selective KDB
Experiments
Conclusions &
Future
Research
Introduction
• Learning from large data
Scalable
Bayesian
Network
Classifiers
Geoff Webb
Ana Martinez
Nayyar Zaidi
Introduction
• Learning from large data is not just about scaling-up
existing algorithms.
Introduction
Bayesian
Network
Classifiers
Selective KDB
Experiments
Conclusions &
Future
Research
Scalable
Bayesian
Network
Classifiers
Geoff Webb
Ana Martinez
Nayyar Zaidi
Introduction
• Learning from large data is not just about scaling-up
existing algorithms.
Introduction
Bayesian
Network
Classifiers
Selective KDB
Experiments
Conclusions &
Future
Research
• Large data is best tackled by fundamentally new learning
algorithms.
Scalable
Bayesian
Network
Classifiers
Learning curves
0.8
Geoff Webb
Ana Martinez
Nayyar Zaidi
0.7
0.6
Bayesian
Network
Classifiers
0.5
Selective KDB
Experiments
Conclusions &
Future
Research
RMSE
Introduction
0.4
0.3
0.2
0.1
0
0
100,000
200,000
300,000
400,000
500,000
600,000
Training set size
700,000
800,000
• Error typically reduces as data quantity increases.
900,000
1,000,000
Scalable
Bayesian
Network
Classifiers
Learning curves
0.8
Geoff Webb
Ana Martinez
Nayyar Zaidi
0.7
0.6
Bayesian
Network
Classifiers
0.5
Selective KDB
Experiments
Conclusions &
Future
Research
RMSE
Introduction
0.4
0.3
0.2
0.1
0
0
100,000
200,000
300,000
400,000
500,000
600,000
Training set size
700,000
800,000
• Error typically reduces as data quantity increases.
• Different algorithms have different curves.
900,000
1,000,000
Scalable
Bayesian
Network
Classifiers
Learning curves
0.8
Geoff Webb
Ana Martinez
Nayyar Zaidi
0.7
0.6
Bayesian
Network
Classifiers
0.5
Selective KDB
Experiments
Conclusions &
Future
Research
RMSE
Introduction
Overfitting on small data
0.4
0.3
0.2
0.1
0
0
100,000
200,000
300,000
400,000
500,000
600,000
Training set size
700,000
800,000
• Algorithms that closely fit complex multivariate
distributions will tend to overfit small data
900,000
1,000,000
Scalable
Bayesian
Network
Classifiers
Learning curves
0.8
Geoff Webb
Ana Martinez
Nayyar Zaidi
0.7
0.6
Bayesian
Network
Classifiers
0.5
Selective KDB
Experiments
Conclusions &
Future
Research
RMSE
Introduction
Overfitting on small data
0.4
0.3
Best on large data
0.2
0.1
0
0
100,000
200,000
300,000
400,000
500,000
600,000
Training set size
700,000
800,000
900,000
1,000,000
• Algorithms that closely fit complex multivariate
distributions will tend to overfit small data, but can better
fit large data
Scalable
Bayesian
Network
Classifiers
Learning curves
0.8
Geoff Webb
Ana Martinez
Nayyar Zaidi
0.7
0.6
Bayesian
Network
Classifiers
0.5
Selective KDB
Experiments
Conclusions &
Future
Research
RMSE
Introduction
Overfitting on small data
0.4
0.3
Best on large data
0.2
0.1
0
0
100,000
200,000
300,000
400,000
500,000
600,000
Training set size
700,000
800,000
900,000
1,000,000
• Algorithms that closely fit complex multivariate
distributions will tend to overfit small data, but can better
fit large data: Bias-Variance Tradeoff
Scalable
Bayesian
Network
Classifiers
Capacity to fit = model space + optimization
Geoff Webb
Ana Martinez
Nayyar Zaidi
1
Naïve Bayes
Logistic Regression
0.9
Introduction
0.8
Bayesian
Network
Classifiers
0.7
Selective KDB
0.6
Conclusions &
Future
Research
RMSE
Experiments
0.5
0.4
0.3
0.2
0.1
0
Training set size
Scalable
Bayesian
Network
Classifiers
Much research of questionable relevance
0.8
Geoff Webb
Ana Martinez
Nayyar Zaidi
0.7
0.6
Bayesian
Network
Classifiers
0.5
Selective KDB
Experiments
Conclusions &
Future
Research
RMSE
Introduction
0.4
0.3
0.2
0.1
0
0
100,000
200,000
300,000
400,000
500,000
600,000
Training set size
700,000
800,000
• Most prior research has used relatively small data.
900,000
1,000,000
Scalable
Bayesian
Network
Classifiers
Requirements
0.8
Geoff Webb
Ana Martinez
Nayyar Zaidi
0.7
0.6
Bayesian
Network
Classifiers
0.5
Selective KDB
Experiments
Conclusions &
Future
Research
RMSE
Introduction
0.4
0.3
0.2
0.1
0
0
100,000
200,000
300,000
400,000
500,000
600,000
Training set size
700,000
800,000
900,000
1,000,000
• Need algorithms that can closely fit complex multivariate
distributions
Scalable
Bayesian
Network
Classifiers
Requirements
0.8
Geoff Webb
Ana Martinez
Nayyar Zaidi
0.7
0.6
Bayesian
Network
Classifiers
0.5
Selective KDB
Experiments
Conclusions &
Future
Research
RMSE
Introduction
0.4
0.3
0.2
0.1
0
0
100,000
200,000
300,000
400,000
500,000
600,000
Training set size
700,000
800,000
900,000
1,000,000
• Need algorithms that can closely fit complex multivariate
distributions, while being very computationally efficient.
Scalable
Bayesian
Network
Classifiers
Requirements
0.8
Geoff Webb
Ana Martinez
Nayyar Zaidi
0.7
0.6
Bayesian
Network
Classifiers
0.5
Selective KDB
Experiments
Conclusions &
Future
Research
RMSE
Introduction
0.4
0.3
0.2
0.1
0
0
100,000
200,000
300,000
400,000
500,000
600,000
Training set size
700,000
800,000
900,000
1,000,000
• Need algorithms that can closely fit complex multivariate
distributions, while being very computationally efficient.
• low bias
Scalable
Bayesian
Network
Classifiers
Requirements
0.8
Geoff Webb
Ana Martinez
Nayyar Zaidi
0.7
0.6
Bayesian
Network
Classifiers
0.5
Selective KDB
Experiments
Conclusions &
Future
Research
RMSE
Introduction
0.4
0.3
0.2
0.1
0
0
100,000
200,000
300,000
400,000
500,000
600,000
Training set size
700,000
800,000
900,000
1,000,000
• Need algorithms that can closely fit complex multivariate
distributions, while being very computationally efficient.
• low bias
• few passes through data
Scalable
Bayesian
Network
Classifiers
Requirements
0.8
Geoff Webb
Ana Martinez
Nayyar Zaidi
0.7
0.6
Bayesian
Network
Classifiers
0.5
Selective KDB
Experiments
Conclusions &
Future
Research
RMSE
Introduction
0.4
0.3
0.2
0.1
0
0
100,000
200,000
300,000
400,000
500,000
600,000
Training set size
700,000
800,000
900,000
1,000,000
• Need algorithms that can closely fit complex multivariate
distributions, while being very computationally efficient.
• low bias
• few passes through data
• out of core
Scalable
Bayesian
Network
Classifiers
Geoff Webb
Ana Martinez
Nayyar Zaidi
Introduction
Bayesian
Network
Classifiers
Selective KDB
Experiments
Conclusions &
Future
Research
State-of-the-art
• Most low bias algorithms do not scale
• Random Forest
• SVM
• Deep Learning
Scalable
Bayesian
Network
Classifiers
Geoff Webb
Ana Martinez
Nayyar Zaidi
Introduction
Bayesian
Network
Classifiers
Selective KDB
Experiments
Conclusions &
Future
Research
State-of-the-art
• Most low bias algorithms do not scale
• Random Forest
• SVM
• Deep Learning
and inherently in-core
Scalable
Bayesian
Network
Classifiers
Geoff Webb
Ana Martinez
Nayyar Zaidi
Introduction
Bayesian
Network
Classifiers
Selective KDB
Experiments
Conclusions &
Future
Research
State-of-the-art
• Most low bias algorithms do not scale
• Random Forest
• SVM
• Deep Learning
and inherently in-core
• Selective KDB is a scalable low-bias Bayesian Network
Classifier
Scalable
Bayesian
Network
Classifiers
Geoff Webb
Ana Martinez
Nayyar Zaidi
Outline
1 Introduction
Introduction
Bayesian
Network
Classifiers
2 Bayesian Network Classifiers
Selective KDB
3 Selective KDB
Experiments
Conclusions &
Future
Research
4 Experiments
5 Conclusions & Future Research
Scalable
Bayesian
Network
Classifiers
Geoff Webb
Ana Martinez
Nayyar Zaidi
Introduction
Bayesian
Network
Classifiers
Selective KDB
Experiments
Bayesian Network Classifiers
• Defined by parent relation π and Conditional Probability
Tables (CPTs)
• π encodes conditional independence / structure
• CPTs encode conditional probabilities
• Classifies using P(y | x) ∝ P(y | πY )
• Usually makes Y a parent of all Xi
Conclusions &
Future
Research
Q
P(xi | πi )
Y
X1
X2
X3
X4
X5
• Given π, CPTs can be learned by counting joint
frequencies
• single pass
• incremental
Scalable
Bayesian
Network
Classifiers
Geoff Webb
Ana Martinez
Nayyar Zaidi
Introduction
Bayesian
Network
Classifiers
Selective KDB
Experiments
k-Dependence Bayes (KDB)
• Two pass learning
• 1st pass, learn structure:
• Collect counts for pairs of attributes with the class.
• Order attributes based on MI with the class.
• Select parents based on CMI.
• no more than k parents
• parents must be earlier in the order
Conclusions &
Future
Research
Y
X1
X2
X3
X4
X5
Scalable
Bayesian
Network
Classifiers
Geoff Webb
Ana Martinez
Nayyar Zaidi
Introduction
Bayesian
Network
Classifiers
Selective KDB
Experiments
k-Dependence Bayes (KDB)
• Two pass learning
• 1st pass, learn structure:
• Collect counts for pairs of attributes with the class.
• Order attributes based on MI with the class.
• Select parents based on CMI.
• no more than k parents
• parents must be earlier in the order
Conclusions &
Future
Research
Y
X1
X2
X3
X4
X5
• 2nd pass, learn CPTs:
• Collect statistics according to the structure learned.
Scalable
Bayesian
Network
Classifiers
Geoff Webb
Ana Martinez
Nayyar Zaidi
Introduction
Bayesian
Network
Classifiers
k-Dependence Bayes (KDB)
• Training Space: O ya2 v 2 + yav k+1
• Classification Space: O yav k+1
Conclusions &
Future
Research
• Training Time: O ta2 + ya2 v 2 + tak
• Classification Time: O(yak)
Selective KDB
Experiments
t = no. of training examples
a = number of attributes;
v = average number of values;
y = number of classes
Scalable
Bayesian
Network
Classifiers
Geoff Webb
Ana Martinez
Nayyar Zaidi
k-Dependence Bayes (KDB)
• Parameter k controls bias-variance tradeoff
Naïve Bayes
0.8
KDB k=1
Introduction
KDB k=2
Bayesian
Network
Classifiers
0.7
KDB k=3
KDB k=4
KDB k=5
0.6
Selective KDB
0.5
Conclusions &
Future
Research
RMSE
Experiments
0.4
0.3
0.2
0.1
0
0
100,000
200,000
300,000
400,000 500,000 600,000
Training set size
700,000
800,000
900,000
1,000,000
Scalable
Bayesian
Network
Classifiers
Geoff Webb
Ana Martinez
Nayyar Zaidi
k-Dependence Bayes (KDB)
• Parameter k controls bias-variance tradeoff
Naïve Bayes
0.8
KDB k=1
Introduction
KDB k=2
Bayesian
Network
Classifiers
0.7
KDB k=3
KDB k=4
KDB k=5
0.6
Selective KDB
0.5
Conclusions &
Future
Research
RMSE
Experiments
0.4
0.3
0.2
0.1
0
0
100,000
200,000
300,000
400,000 500,000 600,000
Training set size
700,000
• No a priori means to anticipate best k
800,000
900,000
1,000,000
Scalable
Bayesian
Network
Classifiers
Geoff Webb
Ana Martinez
Nayyar Zaidi
k-Dependence Bayes (KDB)
• Parameter k controls bias-variance tradeoff
Naïve Bayes
0.8
KDB k=1
Introduction
KDB k=2
Bayesian
Network
Classifiers
0.7
KDB k=3
KDB k=4
KDB k=5
0.6
Selective KDB
0.5
Conclusions &
Future
Research
RMSE
Experiments
0.4
0.3
0.2
0.1
0
0
100,000
200,000
300,000
400,000 500,000 600,000
Training set size
700,000
• No a priori means to anticipate best k
• Spurious attributes may increase error
800,000
900,000
1,000,000
Scalable
Bayesian
Network
Classifiers
Geoff Webb
Ana Martinez
Nayyar Zaidi
KDB models are nested
• Mi,j = KDB with k = i using attributes 1 to j.
Introduction
Bayesian
Network
Classifiers
Selective KDB
Experiments
Conclusions &
Future
Research
k
1
2
3
4
Attributes (ordered
1
2
3
M1,1 M1,2 M1,3
M2,1 M2,2 M2,3
M3,1 M3,2 M3,3
M4,1 M4,2 M4,3
by MI)
4
M1,4
M2,4
M3,4
M4,4
Scalable
Bayesian
Network
Classifiers
Geoff Webb
Ana Martinez
Nayyar Zaidi
KDB models are nested
• Mi,j = KDB with k = i using attributes 1 to j.
Introduction
Bayesian
Network
Classifiers
Selective KDB
Experiments
Conclusions &
Future
Research
k
1
2
3
4
Attributes (ordered
1
2
3
M1,1 M1,2 M1,3
M2,1 M2,2 M2,3
M3,1 M3,2 M3,3
M4,1 M4,2 M4,3
by MI)
4
M1,4
M2,4
M3,4
M4,4
• Mi, j is a minor extension of Mi, j−1 and Mi−1, j
Scalable
Bayesian
Network
Classifiers
Geoff Webb
Ana Martinez
Nayyar Zaidi
KDB models are nested
• Mi,j = KDB with k = i using attributes 1 to j.
Introduction
k
1
2
3
4
Bayesian
Network
Classifiers
Selective KDB
Experiments
Conclusions &
Future
Research
Attributes (ordered
1
2
3
M1,1 M1,2 M1,3
M2,1 M2,2 M2,3
M3,1 M3,2 M3,3
M4,1 M4,2 M4,3
by MI)
4
M1,4
M2,4
M3,4
M4,4
• Mi, j is a minor extension of Mi, j−1 and Mi−1, j
Y
X1
X2
X3
X4
X5
Scalable
Bayesian
Network
Classifiers
Geoff Webb
Ana Martinez
Nayyar Zaidi
KDB models are nested
• Mi,j = KDB with k = i using attributes 1 to j.
Introduction
k
1
2
3
4
Bayesian
Network
Classifiers
Selective KDB
Experiments
Conclusions &
Future
Research
Attributes (ordered
1
2
3
M1,1 M1,2 M1,3
M2,1 M2,2 M2,3
M3,1 M3,2 M3,3
M4,1 M4,2 M4,3
by MI)
4
M1,4
M2,4
M3,4
M4,4
• Mi, j is a minor extension of Mi, j−1 and Mi−1, j
Y
X1
X2
X3
X4
X5
Scalable
Bayesian
Network
Classifiers
Geoff Webb
Ana Martinez
Nayyar Zaidi
KDB models are nested
• Mi,j = KDB with k = i using attributes 1 to j.
Introduction
k
1
2
3
4
Bayesian
Network
Classifiers
Selective KDB
Experiments
Conclusions &
Future
Research
Attributes (ordered
1
2
3
M1,1 M1,2 M1,3
M2,1 M2,2 M2,3
M3,1 M3,2 M3,3
M4,1 M4,2 M4,3
by MI)
4
M1,4
M2,4
M3,4
M4,4
• Mi, j is a minor extension of Mi, j−1 and Mi−1, j
Y
X1
X2
X3
X4
X5
Scalable
Bayesian
Network
Classifiers
Geoff Webb
Ana Martinez
Nayyar Zaidi
Selective KDB
• In one extra pass select an attribute subset and best k
using leave-one-out CV.
Introduction
Bayesian
Network
Classifiers
Selective KDB
Experiments
Conclusions &
Future
Research
Scalable
Bayesian
Network
Classifiers
Geoff Webb
Ana Martinez
Nayyar Zaidi
Selective KDB
• In one extra pass select an attribute subset and best k
using leave-one-out CV.
Introduction
Bayesian
Network
Classifiers
Selective KDB
Experiments
Conclusions &
Future
Research
• The full model subsumes all k × a submodels.
Scalable
Bayesian
Network
Classifiers
Geoff Webb
Ana Martinez
Nayyar Zaidi
Selective KDB
• In one extra pass select an attribute subset and best k
using leave-one-out CV.
Introduction
Bayesian
Network
Classifiers
Selective KDB
Experiments
Conclusions &
Future
Research
• The full model subsumes all k × a submodels.
• Very efficient selection between a large class of strong
models.
Scalable
Bayesian
Network
Classifiers
Geoff Webb
Ana Martinez
Nayyar Zaidi
Introduction
Bayesian
Network
Classifiers
Selective KDB
Experiments
Conclusions &
Future
Research
Leave-one-out CV
• Very low bias estimator of out-of-sample performance
• Incremental cross-validation makes it VERY efficient for
BNC
• Collect counts from all data once
• When classifying a hold-out object subtract it from the
counts
• All k × a nested models can be evaluated with little more
computation than the full KDB model
Scalable
Bayesian
Network
Classifiers
Geoff Webb
Ana Martinez
Nayyar Zaidi
Introduction
Bayesian
Network
Classifiers
Selective KDB
• Training Space: O ya2 v 2 + yav k+1
∗
• Classification Space: O y a∗ v k +1
• Training Time: O ta2 + ya2 v 2 + tay k
• Classification Time: O(y a∗ k ∗ )
Selective KDB
Experiments
Conclusions &
Future
Research
a = number of attributes;
v = average number of values;
y = number of classes
a∗ = number of attributes selected (a∗ ≤ a).
k ∗ = best value of k found (k ∗ ≤ kmax ).
Scalable
Bayesian
Network
Classifiers
Selective KDB
Geoff Webb
Ana Martinez
Nayyar Zaidi
0.8
Introduction
0.7
Bayesian
Network
Classifiers
0.6
Selective KDB
0.5
Naïve Bayes
KDB k=1
KDB k=2
KDB k=3
KDB k=4
KDB k=5
Conclusions &
Future
Research
RMSE
Experiments
0.4
0.3
0.2
0.1
0
0
100,000
200,000
300,000
400,000 500,000 600,000
Training set size
700,000
800,000
900,000
1,000,000
Scalable
Bayesian
Network
Classifiers
Selective KDB
Geoff Webb
Ana Martinez
Nayyar Zaidi
0.8
Introduction
0.7
Bayesian
Network
Classifiers
0.6
Selective KDB
0.5
Naïve Bayes
KDB k=1
KDB k=2
KDB k=3
KDB k=4
KDB k=5
SKDB Only K
Conclusions &
Future
Research
RMSE
Experiments
0.4
0.3
0.2
0.1
0
0
100,000
200,000
300,000
400,000 500,000 600,000
Training set size
700,000
800,000
900,000
1,000,000
Scalable
Bayesian
Network
Classifiers
Selective KDB
Geoff Webb
Ana Martinez
Nayyar Zaidi
0.8
Introduction
0.7
Bayesian
Network
Classifiers
0.6
Selective KDB
0.5
Naïve Bayes
KDB k=1
KDB k=2
KDB k=3
KDB k=4
KDB k=5
SKDB Only K
SKDB k=5 Only Atts
Conclusions &
Future
Research
RMSE
Experiments
0.4
0.3
0.2
0.1
0
0
100,000
200,000
300,000
400,000 500,000 600,000
Training set size
700,000
800,000
900,000
1,000,000
Scalable
Bayesian
Network
Classifiers
Selective KDB
Geoff Webb
Ana Martinez
Nayyar Zaidi
0.8
Introduction
0.7
Bayesian
Network
Classifiers
0.6
Selective KDB
0.5
Naïve Bayes
KDB k=1
KDB k=2
KDB k=3
KDB k=4
KDB k=5
SKDB Only K
SKDB k=5 Only Atts
SKDB
Conclusions &
Future
Research
RMSE
Experiments
0.4
0.3
0.2
0.1
0
0
100,000
200,000
300,000
400,000 500,000 600,000
Training set size
700,000
800,000
900,000
1,000,000
Scalable
Bayesian
Network
Classifiers
16 datasets (165K-54M examples)
Geoff Webb
Ana Martinez
Nayyar Zaidi
Introduction
Bayesian
Network
Classifiers
Selective KDB
Experiments
Conclusions &
Future
Research
s
Name
localization
census-income
USPSExtended
MITFaceSetA
MITFaceSetB
MSDYear-Prediction
covtype
MITFaceSetC
poker-hand
uscensus1990
PAMAP2
kddcup
linkage
mnist8ms
satellite
splice
sparse format.
No. of cases
(million)
0.165
0.299
0.342
0.474
0.489
0.515
0.581
0.839
1.025
2.458
3.851
5.210
5.749
8.100
8.705
54.628
No. of
Atts.
5
41
676
361
361
90
54
361
10
67
54
41
11
784
138
141
No. of
Classes
11
2
2
2
2
90
7
2
10
4
19
40
2
10
24
2
Size
11MB
136MB
603MBs
584MB
603MB
601MBs
72MB
1.1GB
24MB
325MB
1.7GB
754MB
251MB
19GBs
3.6GB
7.3GB
Scalable
Bayesian
Network
Classifiers
Geoff Webb
Ana Martinez
Nayyar Zaidi
Introduction
Bayesian
Network
Classifiers
Selective KDB
Experiments
Conclusions &
Future
Research
Numeric attributes
• 5 bin equal frequency discretisation
Scalable
Bayesian
Network
Classifiers
Out-of-core BNCs
Geoff Webb
Ana Martinez
Nayyar Zaidi
RMSE
Introduction
Bayesian
Network
Classifiers
Selective KDB
Experiments
Conclusions &
Future
Research
KDB5 vs SKDB
best KDB vs SKDB
Scalable
Bayesian
Network
Classifiers
Out-of-core BNCs
Geoff Webb
Ana Martinez
Nayyar Zaidi
• Training Time Comparisons
1000
Introduction
Selective KDB
Experiments
100
Time (seconds)
Bayesian
Network
Classifiers
NB
10
TAN
AODE
1
KDB k=5
SKDB
Datasets
kd
ddcup
MITFaceeSetC
UPSSexteended
MITFaceeSetB
MITFaceeSetA
usccensus
co
ovtype
don
nation
Pokerr-hand
Census-in
ncome
0,1
localizzation
Conclusions &
Future
Research
Scalable
Bayesian
Network
Classifiers
Out-of-core BNCs
Geoff Webb
Ana Martinez
Nayyar Zaidi
• Classification Time Comparisons
1000
Introduction
NB
TAN
k=5
k=4.8
1
AODE
KDB k=5
k=4
k
4
k=4
k
4
SKDB
k=2
k
2
Datasets
kddcup
MIT
TFaceSetC
MIT
TFaceSetB
UPSS
Sextended
MIT
TFaceSetA
donation
us-income
Censu
Poker-hand
0,1
lo
ocalization
Conclusions &
Future
Research
k=4.5
k=5
k=3
k=4
k=4
uscensus
Experiments
k=3
10
covtype
Selective KDB
100
Tim
me (seconds)
Bayesian
Network
Classifiers
Scalable
Bayesian
Network
Classifiers
Geoff Webb
Ana Martinez
Nayyar Zaidi
Introduction
Bayesian
Network
Classifiers
Selective KDB
Experiments
Conclusions &
Future
Research
Out-of-core SGD
• SGD in Vowpal Wabbit (VW):
• Squared and logistic function.
• Quadratic features (best results).
• Different number of passes (3 or 10).
• Discrete attributes into binary features.
• One-against-all for multiclass classification.
Scalable
Bayesian
Network
Classifiers
Out-of-core SGD
Bayesian
Network
Classifiers
Selective KDB
Experiments
1
0.1
0.1
SKDB (max k=5) - RMSE (log. scale)
Introduction
SKDB (max k=5) - RMSE (log. scale)
Geoff Webb
Ana Martinez
Nayyar Zaidi
1
0.01
0.001
0.01
0.001
Conclusions &
Future
Research
0.0001
0.0001
0.001
0.01
0.1
VWLF - RMSE (log. scale)
0.0001
0.0001
0.001
0.01
0.1
VWSF - RMSE (log. scale)
VW (logistic) - RMSE
Selective KDB
1
VW (squared) - 0-1 Loss
VW
RMSE (logistic) 0-1 Loss (squared)
(7+2)-0-7
8-1-7
1
Scalable
Bayesian
Network
Classifiers
Out-of-core SGD
Geoff Webb
Ana Martinez
Nayyar Zaidi
• Training Time Comparisons
10000
Introduction
VWSF
10
VWLF
SKDB
Datasets
kddccup
MITFaceSetC
UPSSextendded
MITFaceSetB
MITFaceSeetA
uscennsus
covtyype
donattion
1
Poker-haand
Conclusions &
Future
Research
100
Census-incoome
Experiments
localizattion
Selective KDB
1000
Tíme (seconds)
Bayesian
Network
Classifiers
Scalable
Bayesian
Network
Classifiers
Out-of-core SGD
Geoff Webb
Ana Martinez
Nayyar Zaidi
• Classification Time Comparisons
100
1
VWSF
VWLF
SKDB
Datasets
kddccup
MITFaceSetC
MITFaceSetB
UPSSextend
ded
MITFaceSeetA
uscen
nsus
covty
ype
0.1
donattion
Conclusions &
Future
Research
Census-inco
ome
Experiments
10
Poker-haand
Selective KDB
localizattion
Bayesian
Network
Classifiers
Time (seconds)
Introduction
Scalable
Bayesian
Network
Classifiers
In-core BayesNet and RF
Geoff Webb
Ana Martinez
Nayyar Zaidi
Introduction
Bayesian
Network
Classifiers
Selective KDB
Experiments
Conclusions &
Future
Research
BayesNet
RF
x = sampled dataset
k-selective KDB
BayesNet
6-3-3
RF (Num)
5-0-6
Scalable
Bayesian
Network
Classifiers
In-core BayesNet and RF
Geoff Webb
Ana Martinez
Nayyar Zaidi
• Training Time Comparisons
100000
Introduction
RF
SKDB
10
1
Datasets
USPSExtennded
SetB
MITFaceS
MITFaceSetA
covttype
0.1
donattion
Conclusions &
Future
Research
BayesNet
100
Census-incoome
Experiments
1000
localizattion
Selective KDB
10000
Time (sseconds)
Bayesian
Network
Classifiers
Scalable
Bayesian
Network
Classifiers
In-core BayesNet and RF
Geoff Webb
Ana Martinez
Nayyar Zaidi
• Classification Time Comparisons
100
RF
SKDB
1
Datasets
USPSExten
nded
MITFaceS
SetB
MITFaceSetA
0.1
covttype
Conclusions &
Future
Research
BayesNet
donattion
Experiments
10
Census-inco
ome
Selective KDB
localizattion
Bayesian
Network
Classifiers
Time (sseconds)
Introduction
Scalable
Bayesian
Network
Classifiers
Geoff Webb
Ana Martinez
Nayyar Zaidi
Global comparisons
• Cumulative Ranking
Introduction
Bayesian
Network
Classifiers
Selective KDB
Experiments
Conclusions &
Future
Research
• Selective KDB copes well with high-dimensional datasets
and datasets with more than 1 million points.
• RF performs better on datasets with small # of attributes.
• VW has advantage for sparse numeric data.
Scalable
Bayesian
Network
Classifiers
Global comparisons
Geoff Webb
Ana Martinez
Nayyar Zaidi
• Ranking
8
7.63
Introduction
7
Bayesian
Network
Classifiers
Experiments
Conclusions &
Future
Research
5.88
6
Rankking
Selective KDB
6.81
5
4
3.47
3.44
3.34
2.91
3
2.53
2
1
NB
AODE
TAN
BayesNet
y
Learners
VWLF
KDB ((best k))
RF ((Num))
SKDB
Scalable
Bayesian
Network
Classifiers
Global comparisons (Time)
1000
Geoff Webb
Ana Martinez
Nayyar Zaidi
500
200
Time (seconds) − log. scale
100
10000
50
Conclusions &
Future
Research
Time (seconds)
Experiments
5000
Selective KDB
0
Bayesian
Network
Classifiers
15000
Introduction
NB
TAN
AODE
KDB
ksKDB
Learner
Training
VWsq
VWlog
NB
TAN
AODE
KDB
Learner
Test
ksKDB
VWsq
VWlog
Scalable
Bayesian
Network
Classifiers
Geoff Webb
Ana Martinez
Nayyar Zaidi
Stepping back
• The key trick is nested evaluation of a large class of
count-based models
Introduction
Bayesian
Network
Classifiers
Selective KDB
Experiments
Conclusions &
Future
Research
• Works also with 2-pass selective evaluation of AnDE
models
• Combines the efficiency of simple generative techniques
with the power of discriminative learning
• Can use any loss function that is a function of each
P̂(y | x)
Scalable
Bayesian
Network
Classifiers
Geoff Webb
Ana Martinez
Nayyar Zaidi
Introduction
Bayesian
Network
Classifiers
Selective KDB
Experiments
Conclusions &
Future
Research
Future Research
• Numeric attributes
• it seems that there should be something better than
discretisation.
• this remains elusive ...
• Overfitting avoidance
• Increase number of alternative models considered
• Explore other forms of nested BNCs
• Two pass Selective KDB
• sample small test set in second pass
• Single pass generative/discriminative learning
• initially learn a simple generative model
• collect discriminative statistics and refine the model when
there is sufficient evidence to make a choice
• refinements might be attribute selection, structural
refinement or weighting
• repeat
Scalable
Bayesian
Network
Classifiers
Satellite learning curves
1
Geoff Webb
Ana Martinez
Nayyar Zaidi
0.9
Introduction
0.8
Bayesian
Network
Classifiers
0.7
0.6
Conclusions &
Future
Research
RMSE
Selective KDB
Experiments
NB
KDB K=1
KDB K=2
KDB K=3
KDB K=4
KDB K=5
KDB K=6
KDB K=7
KDB K=8
0.5
0.4
0.3
0.2
0.1
0
Training set size
Scalable
Bayesian
Network
Classifiers
Satellite learning curves
Geoff Webb
Ana Martinez
Nayyar Zaidi
0.9
Introduction
0.8
Bayesian
Network
Classifiers
0.7
0.6
Conclusions &
Future
Research
RMSE
Selective KDB
Experiments
NB
KDB K=1
KDB K=2
KDB K=3
KDB K=4
KDB K=5
KDB K=6
KDB K=7
KDB K=8
SKDB K=10
1
0.5
0.4
0.3
0.2
0.1
0
Training set size
Scalable
Bayesian
Network
Classifiers
Satellite learning curves
Geoff Webb
Ana Martinez
Nayyar Zaidi
0.9
Introduction
0.8
Bayesian
Network
Classifiers
0.7
0.6
Conclusions &
Future
Research
RMSE
Selective KDB
Experiments
NB
KDB K=1
KDB K=2
KDB K=3
KDB K=4
KDB K=5
KDB K=6
KDB K=7
KDB K=8
SKDB K=10
Random Forest
1
0.5
0.4
0.3
0.2
0.1
0
Training set size
Scalable
Bayesian
Network
Classifiers
Geoff Webb
Ana Martinez
Nayyar Zaidi
Introduction
Bayesian
Network
Classifiers
Selective KDB
Experiments
Conclusions &
Future
Research
Conclusions
• Large data calls for fundamentally new learning algorithms
• We are pioneering a new generation of theoretically
well-founded algorithms that are both scalable to very
large data and capable of exploiting the fine detail
inherent therein