Scalable Bayesian Network Classifiers Geoff Webb Ana Martinez Nayyar Zaidi Introduction Scalable Bayesian Network Classifiers Bayesian Network Classifiers Selective KDB Experiments Conclusions & Future Research Geoff Webb Ana Martinez Nayyar Zaidi Scalable Bayesian Network Classifiers Geoff Webb Ana Martinez Nayyar Zaidi Introduction Bayesian Network Classifiers Selective KDB Experiments Conclusions & Future Research Introduction • Learning from large data Scalable Bayesian Network Classifiers Geoff Webb Ana Martinez Nayyar Zaidi Introduction • Learning from large data is not just about scaling-up existing algorithms. Introduction Bayesian Network Classifiers Selective KDB Experiments Conclusions & Future Research Scalable Bayesian Network Classifiers Geoff Webb Ana Martinez Nayyar Zaidi Introduction • Learning from large data is not just about scaling-up existing algorithms. Introduction Bayesian Network Classifiers Selective KDB Experiments Conclusions & Future Research • Large data is best tackled by fundamentally new learning algorithms. Scalable Bayesian Network Classifiers Learning curves 0.8 Geoff Webb Ana Martinez Nayyar Zaidi 0.7 0.6 Bayesian Network Classifiers 0.5 Selective KDB Experiments Conclusions & Future Research RMSE Introduction 0.4 0.3 0.2 0.1 0 0 100,000 200,000 300,000 400,000 500,000 600,000 Training set size 700,000 800,000 • Error typically reduces as data quantity increases. 900,000 1,000,000 Scalable Bayesian Network Classifiers Learning curves 0.8 Geoff Webb Ana Martinez Nayyar Zaidi 0.7 0.6 Bayesian Network Classifiers 0.5 Selective KDB Experiments Conclusions & Future Research RMSE Introduction 0.4 0.3 0.2 0.1 0 0 100,000 200,000 300,000 400,000 500,000 600,000 Training set size 700,000 800,000 • Error typically reduces as data quantity increases. • Different algorithms have different curves. 900,000 1,000,000 Scalable Bayesian Network Classifiers Learning curves 0.8 Geoff Webb Ana Martinez Nayyar Zaidi 0.7 0.6 Bayesian Network Classifiers 0.5 Selective KDB Experiments Conclusions & Future Research RMSE Introduction Overfitting on small data 0.4 0.3 0.2 0.1 0 0 100,000 200,000 300,000 400,000 500,000 600,000 Training set size 700,000 800,000 • Algorithms that closely fit complex multivariate distributions will tend to overfit small data 900,000 1,000,000 Scalable Bayesian Network Classifiers Learning curves 0.8 Geoff Webb Ana Martinez Nayyar Zaidi 0.7 0.6 Bayesian Network Classifiers 0.5 Selective KDB Experiments Conclusions & Future Research RMSE Introduction Overfitting on small data 0.4 0.3 Best on large data 0.2 0.1 0 0 100,000 200,000 300,000 400,000 500,000 600,000 Training set size 700,000 800,000 900,000 1,000,000 • Algorithms that closely fit complex multivariate distributions will tend to overfit small data, but can better fit large data Scalable Bayesian Network Classifiers Learning curves 0.8 Geoff Webb Ana Martinez Nayyar Zaidi 0.7 0.6 Bayesian Network Classifiers 0.5 Selective KDB Experiments Conclusions & Future Research RMSE Introduction Overfitting on small data 0.4 0.3 Best on large data 0.2 0.1 0 0 100,000 200,000 300,000 400,000 500,000 600,000 Training set size 700,000 800,000 900,000 1,000,000 • Algorithms that closely fit complex multivariate distributions will tend to overfit small data, but can better fit large data: Bias-Variance Tradeoff Scalable Bayesian Network Classifiers Capacity to fit = model space + optimization Geoff Webb Ana Martinez Nayyar Zaidi 1 Naïve Bayes Logistic Regression 0.9 Introduction 0.8 Bayesian Network Classifiers 0.7 Selective KDB 0.6 Conclusions & Future Research RMSE Experiments 0.5 0.4 0.3 0.2 0.1 0 Training set size Scalable Bayesian Network Classifiers Much research of questionable relevance 0.8 Geoff Webb Ana Martinez Nayyar Zaidi 0.7 0.6 Bayesian Network Classifiers 0.5 Selective KDB Experiments Conclusions & Future Research RMSE Introduction 0.4 0.3 0.2 0.1 0 0 100,000 200,000 300,000 400,000 500,000 600,000 Training set size 700,000 800,000 • Most prior research has used relatively small data. 900,000 1,000,000 Scalable Bayesian Network Classifiers Requirements 0.8 Geoff Webb Ana Martinez Nayyar Zaidi 0.7 0.6 Bayesian Network Classifiers 0.5 Selective KDB Experiments Conclusions & Future Research RMSE Introduction 0.4 0.3 0.2 0.1 0 0 100,000 200,000 300,000 400,000 500,000 600,000 Training set size 700,000 800,000 900,000 1,000,000 • Need algorithms that can closely fit complex multivariate distributions Scalable Bayesian Network Classifiers Requirements 0.8 Geoff Webb Ana Martinez Nayyar Zaidi 0.7 0.6 Bayesian Network Classifiers 0.5 Selective KDB Experiments Conclusions & Future Research RMSE Introduction 0.4 0.3 0.2 0.1 0 0 100,000 200,000 300,000 400,000 500,000 600,000 Training set size 700,000 800,000 900,000 1,000,000 • Need algorithms that can closely fit complex multivariate distributions, while being very computationally efficient. Scalable Bayesian Network Classifiers Requirements 0.8 Geoff Webb Ana Martinez Nayyar Zaidi 0.7 0.6 Bayesian Network Classifiers 0.5 Selective KDB Experiments Conclusions & Future Research RMSE Introduction 0.4 0.3 0.2 0.1 0 0 100,000 200,000 300,000 400,000 500,000 600,000 Training set size 700,000 800,000 900,000 1,000,000 • Need algorithms that can closely fit complex multivariate distributions, while being very computationally efficient. • low bias Scalable Bayesian Network Classifiers Requirements 0.8 Geoff Webb Ana Martinez Nayyar Zaidi 0.7 0.6 Bayesian Network Classifiers 0.5 Selective KDB Experiments Conclusions & Future Research RMSE Introduction 0.4 0.3 0.2 0.1 0 0 100,000 200,000 300,000 400,000 500,000 600,000 Training set size 700,000 800,000 900,000 1,000,000 • Need algorithms that can closely fit complex multivariate distributions, while being very computationally efficient. • low bias • few passes through data Scalable Bayesian Network Classifiers Requirements 0.8 Geoff Webb Ana Martinez Nayyar Zaidi 0.7 0.6 Bayesian Network Classifiers 0.5 Selective KDB Experiments Conclusions & Future Research RMSE Introduction 0.4 0.3 0.2 0.1 0 0 100,000 200,000 300,000 400,000 500,000 600,000 Training set size 700,000 800,000 900,000 1,000,000 • Need algorithms that can closely fit complex multivariate distributions, while being very computationally efficient. • low bias • few passes through data • out of core Scalable Bayesian Network Classifiers Geoff Webb Ana Martinez Nayyar Zaidi Introduction Bayesian Network Classifiers Selective KDB Experiments Conclusions & Future Research State-of-the-art • Most low bias algorithms do not scale • Random Forest • SVM • Deep Learning Scalable Bayesian Network Classifiers Geoff Webb Ana Martinez Nayyar Zaidi Introduction Bayesian Network Classifiers Selective KDB Experiments Conclusions & Future Research State-of-the-art • Most low bias algorithms do not scale • Random Forest • SVM • Deep Learning and inherently in-core Scalable Bayesian Network Classifiers Geoff Webb Ana Martinez Nayyar Zaidi Introduction Bayesian Network Classifiers Selective KDB Experiments Conclusions & Future Research State-of-the-art • Most low bias algorithms do not scale • Random Forest • SVM • Deep Learning and inherently in-core • Selective KDB is a scalable low-bias Bayesian Network Classifier Scalable Bayesian Network Classifiers Geoff Webb Ana Martinez Nayyar Zaidi Outline 1 Introduction Introduction Bayesian Network Classifiers 2 Bayesian Network Classifiers Selective KDB 3 Selective KDB Experiments Conclusions & Future Research 4 Experiments 5 Conclusions & Future Research Scalable Bayesian Network Classifiers Geoff Webb Ana Martinez Nayyar Zaidi Introduction Bayesian Network Classifiers Selective KDB Experiments Bayesian Network Classifiers • Defined by parent relation π and Conditional Probability Tables (CPTs) • π encodes conditional independence / structure • CPTs encode conditional probabilities • Classifies using P(y | x) ∝ P(y | πY ) • Usually makes Y a parent of all Xi Conclusions & Future Research Q P(xi | πi ) Y X1 X2 X3 X4 X5 • Given π, CPTs can be learned by counting joint frequencies • single pass • incremental Scalable Bayesian Network Classifiers Geoff Webb Ana Martinez Nayyar Zaidi Introduction Bayesian Network Classifiers Selective KDB Experiments k-Dependence Bayes (KDB) • Two pass learning • 1st pass, learn structure: • Collect counts for pairs of attributes with the class. • Order attributes based on MI with the class. • Select parents based on CMI. • no more than k parents • parents must be earlier in the order Conclusions & Future Research Y X1 X2 X3 X4 X5 Scalable Bayesian Network Classifiers Geoff Webb Ana Martinez Nayyar Zaidi Introduction Bayesian Network Classifiers Selective KDB Experiments k-Dependence Bayes (KDB) • Two pass learning • 1st pass, learn structure: • Collect counts for pairs of attributes with the class. • Order attributes based on MI with the class. • Select parents based on CMI. • no more than k parents • parents must be earlier in the order Conclusions & Future Research Y X1 X2 X3 X4 X5 • 2nd pass, learn CPTs: • Collect statistics according to the structure learned. Scalable Bayesian Network Classifiers Geoff Webb Ana Martinez Nayyar Zaidi Introduction Bayesian Network Classifiers k-Dependence Bayes (KDB) • Training Space: O ya2 v 2 + yav k+1 • Classification Space: O yav k+1 Conclusions & Future Research • Training Time: O ta2 + ya2 v 2 + tak • Classification Time: O(yak) Selective KDB Experiments t = no. of training examples a = number of attributes; v = average number of values; y = number of classes Scalable Bayesian Network Classifiers Geoff Webb Ana Martinez Nayyar Zaidi k-Dependence Bayes (KDB) • Parameter k controls bias-variance tradeoff Naïve Bayes 0.8 KDB k=1 Introduction KDB k=2 Bayesian Network Classifiers 0.7 KDB k=3 KDB k=4 KDB k=5 0.6 Selective KDB 0.5 Conclusions & Future Research RMSE Experiments 0.4 0.3 0.2 0.1 0 0 100,000 200,000 300,000 400,000 500,000 600,000 Training set size 700,000 800,000 900,000 1,000,000 Scalable Bayesian Network Classifiers Geoff Webb Ana Martinez Nayyar Zaidi k-Dependence Bayes (KDB) • Parameter k controls bias-variance tradeoff Naïve Bayes 0.8 KDB k=1 Introduction KDB k=2 Bayesian Network Classifiers 0.7 KDB k=3 KDB k=4 KDB k=5 0.6 Selective KDB 0.5 Conclusions & Future Research RMSE Experiments 0.4 0.3 0.2 0.1 0 0 100,000 200,000 300,000 400,000 500,000 600,000 Training set size 700,000 • No a priori means to anticipate best k 800,000 900,000 1,000,000 Scalable Bayesian Network Classifiers Geoff Webb Ana Martinez Nayyar Zaidi k-Dependence Bayes (KDB) • Parameter k controls bias-variance tradeoff Naïve Bayes 0.8 KDB k=1 Introduction KDB k=2 Bayesian Network Classifiers 0.7 KDB k=3 KDB k=4 KDB k=5 0.6 Selective KDB 0.5 Conclusions & Future Research RMSE Experiments 0.4 0.3 0.2 0.1 0 0 100,000 200,000 300,000 400,000 500,000 600,000 Training set size 700,000 • No a priori means to anticipate best k • Spurious attributes may increase error 800,000 900,000 1,000,000 Scalable Bayesian Network Classifiers Geoff Webb Ana Martinez Nayyar Zaidi KDB models are nested • Mi,j = KDB with k = i using attributes 1 to j. Introduction Bayesian Network Classifiers Selective KDB Experiments Conclusions & Future Research k 1 2 3 4 Attributes (ordered 1 2 3 M1,1 M1,2 M1,3 M2,1 M2,2 M2,3 M3,1 M3,2 M3,3 M4,1 M4,2 M4,3 by MI) 4 M1,4 M2,4 M3,4 M4,4 Scalable Bayesian Network Classifiers Geoff Webb Ana Martinez Nayyar Zaidi KDB models are nested • Mi,j = KDB with k = i using attributes 1 to j. Introduction Bayesian Network Classifiers Selective KDB Experiments Conclusions & Future Research k 1 2 3 4 Attributes (ordered 1 2 3 M1,1 M1,2 M1,3 M2,1 M2,2 M2,3 M3,1 M3,2 M3,3 M4,1 M4,2 M4,3 by MI) 4 M1,4 M2,4 M3,4 M4,4 • Mi, j is a minor extension of Mi, j−1 and Mi−1, j Scalable Bayesian Network Classifiers Geoff Webb Ana Martinez Nayyar Zaidi KDB models are nested • Mi,j = KDB with k = i using attributes 1 to j. Introduction k 1 2 3 4 Bayesian Network Classifiers Selective KDB Experiments Conclusions & Future Research Attributes (ordered 1 2 3 M1,1 M1,2 M1,3 M2,1 M2,2 M2,3 M3,1 M3,2 M3,3 M4,1 M4,2 M4,3 by MI) 4 M1,4 M2,4 M3,4 M4,4 • Mi, j is a minor extension of Mi, j−1 and Mi−1, j Y X1 X2 X3 X4 X5 Scalable Bayesian Network Classifiers Geoff Webb Ana Martinez Nayyar Zaidi KDB models are nested • Mi,j = KDB with k = i using attributes 1 to j. Introduction k 1 2 3 4 Bayesian Network Classifiers Selective KDB Experiments Conclusions & Future Research Attributes (ordered 1 2 3 M1,1 M1,2 M1,3 M2,1 M2,2 M2,3 M3,1 M3,2 M3,3 M4,1 M4,2 M4,3 by MI) 4 M1,4 M2,4 M3,4 M4,4 • Mi, j is a minor extension of Mi, j−1 and Mi−1, j Y X1 X2 X3 X4 X5 Scalable Bayesian Network Classifiers Geoff Webb Ana Martinez Nayyar Zaidi KDB models are nested • Mi,j = KDB with k = i using attributes 1 to j. Introduction k 1 2 3 4 Bayesian Network Classifiers Selective KDB Experiments Conclusions & Future Research Attributes (ordered 1 2 3 M1,1 M1,2 M1,3 M2,1 M2,2 M2,3 M3,1 M3,2 M3,3 M4,1 M4,2 M4,3 by MI) 4 M1,4 M2,4 M3,4 M4,4 • Mi, j is a minor extension of Mi, j−1 and Mi−1, j Y X1 X2 X3 X4 X5 Scalable Bayesian Network Classifiers Geoff Webb Ana Martinez Nayyar Zaidi Selective KDB • In one extra pass select an attribute subset and best k using leave-one-out CV. Introduction Bayesian Network Classifiers Selective KDB Experiments Conclusions & Future Research Scalable Bayesian Network Classifiers Geoff Webb Ana Martinez Nayyar Zaidi Selective KDB • In one extra pass select an attribute subset and best k using leave-one-out CV. Introduction Bayesian Network Classifiers Selective KDB Experiments Conclusions & Future Research • The full model subsumes all k × a submodels. Scalable Bayesian Network Classifiers Geoff Webb Ana Martinez Nayyar Zaidi Selective KDB • In one extra pass select an attribute subset and best k using leave-one-out CV. Introduction Bayesian Network Classifiers Selective KDB Experiments Conclusions & Future Research • The full model subsumes all k × a submodels. • Very efficient selection between a large class of strong models. Scalable Bayesian Network Classifiers Geoff Webb Ana Martinez Nayyar Zaidi Introduction Bayesian Network Classifiers Selective KDB Experiments Conclusions & Future Research Leave-one-out CV • Very low bias estimator of out-of-sample performance • Incremental cross-validation makes it VERY efficient for BNC • Collect counts from all data once • When classifying a hold-out object subtract it from the counts • All k × a nested models can be evaluated with little more computation than the full KDB model Scalable Bayesian Network Classifiers Geoff Webb Ana Martinez Nayyar Zaidi Introduction Bayesian Network Classifiers Selective KDB • Training Space: O ya2 v 2 + yav k+1 ∗ • Classification Space: O y a∗ v k +1 • Training Time: O ta2 + ya2 v 2 + tay k • Classification Time: O(y a∗ k ∗ ) Selective KDB Experiments Conclusions & Future Research a = number of attributes; v = average number of values; y = number of classes a∗ = number of attributes selected (a∗ ≤ a). k ∗ = best value of k found (k ∗ ≤ kmax ). Scalable Bayesian Network Classifiers Selective KDB Geoff Webb Ana Martinez Nayyar Zaidi 0.8 Introduction 0.7 Bayesian Network Classifiers 0.6 Selective KDB 0.5 Naïve Bayes KDB k=1 KDB k=2 KDB k=3 KDB k=4 KDB k=5 Conclusions & Future Research RMSE Experiments 0.4 0.3 0.2 0.1 0 0 100,000 200,000 300,000 400,000 500,000 600,000 Training set size 700,000 800,000 900,000 1,000,000 Scalable Bayesian Network Classifiers Selective KDB Geoff Webb Ana Martinez Nayyar Zaidi 0.8 Introduction 0.7 Bayesian Network Classifiers 0.6 Selective KDB 0.5 Naïve Bayes KDB k=1 KDB k=2 KDB k=3 KDB k=4 KDB k=5 SKDB Only K Conclusions & Future Research RMSE Experiments 0.4 0.3 0.2 0.1 0 0 100,000 200,000 300,000 400,000 500,000 600,000 Training set size 700,000 800,000 900,000 1,000,000 Scalable Bayesian Network Classifiers Selective KDB Geoff Webb Ana Martinez Nayyar Zaidi 0.8 Introduction 0.7 Bayesian Network Classifiers 0.6 Selective KDB 0.5 Naïve Bayes KDB k=1 KDB k=2 KDB k=3 KDB k=4 KDB k=5 SKDB Only K SKDB k=5 Only Atts Conclusions & Future Research RMSE Experiments 0.4 0.3 0.2 0.1 0 0 100,000 200,000 300,000 400,000 500,000 600,000 Training set size 700,000 800,000 900,000 1,000,000 Scalable Bayesian Network Classifiers Selective KDB Geoff Webb Ana Martinez Nayyar Zaidi 0.8 Introduction 0.7 Bayesian Network Classifiers 0.6 Selective KDB 0.5 Naïve Bayes KDB k=1 KDB k=2 KDB k=3 KDB k=4 KDB k=5 SKDB Only K SKDB k=5 Only Atts SKDB Conclusions & Future Research RMSE Experiments 0.4 0.3 0.2 0.1 0 0 100,000 200,000 300,000 400,000 500,000 600,000 Training set size 700,000 800,000 900,000 1,000,000 Scalable Bayesian Network Classifiers 16 datasets (165K-54M examples) Geoff Webb Ana Martinez Nayyar Zaidi Introduction Bayesian Network Classifiers Selective KDB Experiments Conclusions & Future Research s Name localization census-income USPSExtended MITFaceSetA MITFaceSetB MSDYear-Prediction covtype MITFaceSetC poker-hand uscensus1990 PAMAP2 kddcup linkage mnist8ms satellite splice sparse format. No. of cases (million) 0.165 0.299 0.342 0.474 0.489 0.515 0.581 0.839 1.025 2.458 3.851 5.210 5.749 8.100 8.705 54.628 No. of Atts. 5 41 676 361 361 90 54 361 10 67 54 41 11 784 138 141 No. of Classes 11 2 2 2 2 90 7 2 10 4 19 40 2 10 24 2 Size 11MB 136MB 603MBs 584MB 603MB 601MBs 72MB 1.1GB 24MB 325MB 1.7GB 754MB 251MB 19GBs 3.6GB 7.3GB Scalable Bayesian Network Classifiers Geoff Webb Ana Martinez Nayyar Zaidi Introduction Bayesian Network Classifiers Selective KDB Experiments Conclusions & Future Research Numeric attributes • 5 bin equal frequency discretisation Scalable Bayesian Network Classifiers Out-of-core BNCs Geoff Webb Ana Martinez Nayyar Zaidi RMSE Introduction Bayesian Network Classifiers Selective KDB Experiments Conclusions & Future Research KDB5 vs SKDB best KDB vs SKDB Scalable Bayesian Network Classifiers Out-of-core BNCs Geoff Webb Ana Martinez Nayyar Zaidi • Training Time Comparisons 1000 Introduction Selective KDB Experiments 100 Time (seconds) Bayesian Network Classifiers NB 10 TAN AODE 1 KDB k=5 SKDB Datasets kd ddcup MITFaceeSetC UPSSexteended MITFaceeSetB MITFaceeSetA usccensus co ovtype don nation Pokerr-hand Census-in ncome 0,1 localizzation Conclusions & Future Research Scalable Bayesian Network Classifiers Out-of-core BNCs Geoff Webb Ana Martinez Nayyar Zaidi • Classification Time Comparisons 1000 Introduction NB TAN k=5 k=4.8 1 AODE KDB k=5 k=4 k 4 k=4 k 4 SKDB k=2 k 2 Datasets kddcup MIT TFaceSetC MIT TFaceSetB UPSS Sextended MIT TFaceSetA donation us-income Censu Poker-hand 0,1 lo ocalization Conclusions & Future Research k=4.5 k=5 k=3 k=4 k=4 uscensus Experiments k=3 10 covtype Selective KDB 100 Tim me (seconds) Bayesian Network Classifiers Scalable Bayesian Network Classifiers Geoff Webb Ana Martinez Nayyar Zaidi Introduction Bayesian Network Classifiers Selective KDB Experiments Conclusions & Future Research Out-of-core SGD • SGD in Vowpal Wabbit (VW): • Squared and logistic function. • Quadratic features (best results). • Different number of passes (3 or 10). • Discrete attributes into binary features. • One-against-all for multiclass classification. Scalable Bayesian Network Classifiers Out-of-core SGD Bayesian Network Classifiers Selective KDB Experiments 1 0.1 0.1 SKDB (max k=5) - RMSE (log. scale) Introduction SKDB (max k=5) - RMSE (log. scale) Geoff Webb Ana Martinez Nayyar Zaidi 1 0.01 0.001 0.01 0.001 Conclusions & Future Research 0.0001 0.0001 0.001 0.01 0.1 VWLF - RMSE (log. scale) 0.0001 0.0001 0.001 0.01 0.1 VWSF - RMSE (log. scale) VW (logistic) - RMSE Selective KDB 1 VW (squared) - 0-1 Loss VW RMSE (logistic) 0-1 Loss (squared) (7+2)-0-7 8-1-7 1 Scalable Bayesian Network Classifiers Out-of-core SGD Geoff Webb Ana Martinez Nayyar Zaidi • Training Time Comparisons 10000 Introduction VWSF 10 VWLF SKDB Datasets kddccup MITFaceSetC UPSSextendded MITFaceSetB MITFaceSeetA uscennsus covtyype donattion 1 Poker-haand Conclusions & Future Research 100 Census-incoome Experiments localizattion Selective KDB 1000 Tíme (seconds) Bayesian Network Classifiers Scalable Bayesian Network Classifiers Out-of-core SGD Geoff Webb Ana Martinez Nayyar Zaidi • Classification Time Comparisons 100 1 VWSF VWLF SKDB Datasets kddccup MITFaceSetC MITFaceSetB UPSSextend ded MITFaceSeetA uscen nsus covty ype 0.1 donattion Conclusions & Future Research Census-inco ome Experiments 10 Poker-haand Selective KDB localizattion Bayesian Network Classifiers Time (seconds) Introduction Scalable Bayesian Network Classifiers In-core BayesNet and RF Geoff Webb Ana Martinez Nayyar Zaidi Introduction Bayesian Network Classifiers Selective KDB Experiments Conclusions & Future Research BayesNet RF x = sampled dataset k-selective KDB BayesNet 6-3-3 RF (Num) 5-0-6 Scalable Bayesian Network Classifiers In-core BayesNet and RF Geoff Webb Ana Martinez Nayyar Zaidi • Training Time Comparisons 100000 Introduction RF SKDB 10 1 Datasets USPSExtennded SetB MITFaceS MITFaceSetA covttype 0.1 donattion Conclusions & Future Research BayesNet 100 Census-incoome Experiments 1000 localizattion Selective KDB 10000 Time (sseconds) Bayesian Network Classifiers Scalable Bayesian Network Classifiers In-core BayesNet and RF Geoff Webb Ana Martinez Nayyar Zaidi • Classification Time Comparisons 100 RF SKDB 1 Datasets USPSExten nded MITFaceS SetB MITFaceSetA 0.1 covttype Conclusions & Future Research BayesNet donattion Experiments 10 Census-inco ome Selective KDB localizattion Bayesian Network Classifiers Time (sseconds) Introduction Scalable Bayesian Network Classifiers Geoff Webb Ana Martinez Nayyar Zaidi Global comparisons • Cumulative Ranking Introduction Bayesian Network Classifiers Selective KDB Experiments Conclusions & Future Research • Selective KDB copes well with high-dimensional datasets and datasets with more than 1 million points. • RF performs better on datasets with small # of attributes. • VW has advantage for sparse numeric data. Scalable Bayesian Network Classifiers Global comparisons Geoff Webb Ana Martinez Nayyar Zaidi • Ranking 8 7.63 Introduction 7 Bayesian Network Classifiers Experiments Conclusions & Future Research 5.88 6 Rankking Selective KDB 6.81 5 4 3.47 3.44 3.34 2.91 3 2.53 2 1 NB AODE TAN BayesNet y Learners VWLF KDB ((best k)) RF ((Num)) SKDB Scalable Bayesian Network Classifiers Global comparisons (Time) 1000 Geoff Webb Ana Martinez Nayyar Zaidi 500 200 Time (seconds) − log. scale 100 10000 50 Conclusions & Future Research Time (seconds) Experiments 5000 Selective KDB 0 Bayesian Network Classifiers 15000 Introduction NB TAN AODE KDB ksKDB Learner Training VWsq VWlog NB TAN AODE KDB Learner Test ksKDB VWsq VWlog Scalable Bayesian Network Classifiers Geoff Webb Ana Martinez Nayyar Zaidi Stepping back • The key trick is nested evaluation of a large class of count-based models Introduction Bayesian Network Classifiers Selective KDB Experiments Conclusions & Future Research • Works also with 2-pass selective evaluation of AnDE models • Combines the efficiency of simple generative techniques with the power of discriminative learning • Can use any loss function that is a function of each P̂(y | x) Scalable Bayesian Network Classifiers Geoff Webb Ana Martinez Nayyar Zaidi Introduction Bayesian Network Classifiers Selective KDB Experiments Conclusions & Future Research Future Research • Numeric attributes • it seems that there should be something better than discretisation. • this remains elusive ... • Overfitting avoidance • Increase number of alternative models considered • Explore other forms of nested BNCs • Two pass Selective KDB • sample small test set in second pass • Single pass generative/discriminative learning • initially learn a simple generative model • collect discriminative statistics and refine the model when there is sufficient evidence to make a choice • refinements might be attribute selection, structural refinement or weighting • repeat Scalable Bayesian Network Classifiers Satellite learning curves 1 Geoff Webb Ana Martinez Nayyar Zaidi 0.9 Introduction 0.8 Bayesian Network Classifiers 0.7 0.6 Conclusions & Future Research RMSE Selective KDB Experiments NB KDB K=1 KDB K=2 KDB K=3 KDB K=4 KDB K=5 KDB K=6 KDB K=7 KDB K=8 0.5 0.4 0.3 0.2 0.1 0 Training set size Scalable Bayesian Network Classifiers Satellite learning curves Geoff Webb Ana Martinez Nayyar Zaidi 0.9 Introduction 0.8 Bayesian Network Classifiers 0.7 0.6 Conclusions & Future Research RMSE Selective KDB Experiments NB KDB K=1 KDB K=2 KDB K=3 KDB K=4 KDB K=5 KDB K=6 KDB K=7 KDB K=8 SKDB K=10 1 0.5 0.4 0.3 0.2 0.1 0 Training set size Scalable Bayesian Network Classifiers Satellite learning curves Geoff Webb Ana Martinez Nayyar Zaidi 0.9 Introduction 0.8 Bayesian Network Classifiers 0.7 0.6 Conclusions & Future Research RMSE Selective KDB Experiments NB KDB K=1 KDB K=2 KDB K=3 KDB K=4 KDB K=5 KDB K=6 KDB K=7 KDB K=8 SKDB K=10 Random Forest 1 0.5 0.4 0.3 0.2 0.1 0 Training set size Scalable Bayesian Network Classifiers Geoff Webb Ana Martinez Nayyar Zaidi Introduction Bayesian Network Classifiers Selective KDB Experiments Conclusions & Future Research Conclusions • Large data calls for fundamentally new learning algorithms • We are pioneering a new generation of theoretically well-founded algorithms that are both scalable to very large data and capable of exploiting the fine detail inherent therein