Learning Machines Seminars, 2020-11-05
Uncertainty in deep learning
Olof Mogren, PhD
RISE Research Institutes of Sweden

Our world is full of uncertainties: measurement errors, modeling errors, or uncertainty due to test data being out-of-distribution are some examples. Machine learning systems are increasingly being used in crucial applications such as medical decision making and autonomous vehicle control: in these applications, mistakes due to uncertainties can be life threatening.

Deep learning has demonstrated astonishing results for many different tasks. But in general, predictions are deterministic and give only a point estimate as output. A trained model may seem confident in predictions where the uncertainty is high. To cope with uncertainties, and make decisions that are reasonable and safe under realistic circumstances, AI systems need to be developed with uncertainty strategies in mind. Machine learning approaches with uncertainty estimates can enable active learning: an acquisition function can be based on model uncertainty to guide data collection and tagging. Uncertainty estimates can also be used to improve sample efficiency for reinforcement learning approaches.

In this talk, we will connect deep learning with Bayesian machine learning, and go through some example approaches to coping with, and leveraging, the uncertainty in data and in modelling, to produce better AI systems in real-world scenarios.

Automated driving
Kendall, A., & Gal, Y. (2017). What uncertainties do we need in Bayesian deep learning for computer vision? In Advances in Neural Information Processing Systems (pp. 5574-5584).

Deep learning
Nested transformations: $h(x) = a(xW + b)$
End-to-end training: backpropagation, optimization
a: activation function (logistic, tanh, ReLU)
Classification: softmax output
Softmax outputs: cross-entropy loss, probabilistic interpretation
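A minimal NumPy sketch of the pieces named on this slide: one layer of the form h(x) = a(xW + b), a softmax output, and the cross-entropy loss. The layer sizes, random weights, and labels are illustrative assumptions, not taken from the talk.

```python
import numpy as np

def dense(x, W, b, activation=np.tanh):
    """One layer h(x) = a(xW + b)."""
    return activation(x @ W + b)

def softmax(z):
    """Turn logits into probabilities that sum to one per row."""
    z = z - z.max(axis=-1, keepdims=True)  # stabilise the exponentials
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(probs, labels):
    """Mean negative log-probability assigned to the true class."""
    return -np.mean(np.log(probs[np.arange(len(labels)), labels]))

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))                    # 4 inputs, 3 features (assumed)
W1, b1 = rng.normal(size=(3, 5)), np.zeros(5)  # hidden layer parameters
W2, b2 = rng.normal(size=(5, 2)), np.zeros(2)  # output layer parameters

hidden = dense(x, W1, b1)                      # nested transformation
probs = softmax(hidden @ W2 + b2)              # class probabilities
loss = cross_entropy(probs, np.array([0, 1, 1, 0]))
```

Training would minimise `loss` with respect to the weights by backpropagation; that step is left out here to keep the sketch short.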

Out of distribution data
Training data: cats vs dogs
Testing data: at test time, a bird image appears
What to do?
What will the softmax do?

Out of domain data (ctd)
Mauna Loa CO2 concentrations dataset
Image by Yarin Gal.

Uncertainty
Aleatoric
  Noise inherent in data observations
  Uncertainty in data or sensor errors
  Will not decrease with more data
  Irreducible error/Bayes error
Epistemic
  Caused by the model: parameters, structure
  Lack of knowledge of the generating distribution
  Reduced with increasing data
Image by Michael Kana.

Figure panels: Input, Prediction, Ground truth, Aleatoric uncertainty, Epistemic uncertainty
Kendall, A., & Gal, Y. (2017). What uncertainties do we need in Bayesian deep learning for computer vision? In Advances in Neural Information Processing Systems (pp. 5574-5584).

Softmax outputs
A cat-dog classifier knows nothing about warblers
Outputs from a trained softmax layer do not show model confidence
Image by Yarin Gal.
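A small sketch of why a softmax output is not a confidence estimate: a two-class softmax must split all probability mass between "cat" and "dog", whatever the input is. The logits below are hypothetical values that a cat/dog network might emit for a bird image; they are not from the talk.

```python
import numpy as np

def softmax(z):
    """Softmax over a 1-D vector of logits."""
    e = np.exp(z - np.max(z))
    return e / e.sum()

# Hypothetical logits for an out-of-distribution (bird) input.
bird_logits = np.array([4.0, 0.5])
probs = softmax(bird_logits)

# The probabilities still sum to one, and the larger logit dominates,
# so the network looks confident about "cat" for a bird it has never seen.
predicted_class = int(np.argmax(probs))
```

There is no output that lets the model say "neither", which is why the methods on the following slides estimate uncertainty by other means.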

Calibrating the softmax
Expected Calibration Error: measures how well "confidence" matches accuracy
  E.g., of 100 datapoints where confidence is 0.8, 80 of them should be correct.
Model calibration declines due to
  Increased model capacity
  Batch norm (allows for larger models)
  Decreased weight decay
  Overfitting to NLL loss (but not accuracy)
Solutions
  Histogram binning
  Isotonic regression: piecewise constant function
  Bayesian binning into quantiles: distribution over binning schemes
Guo, C., et al. On calibration of modern neural networks. arXiv:1706.04599. ICML 2017.
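The Expected Calibration Error can be sketched as follows: group predictions into confidence bins, then take the weighted average of the gap between accuracy and mean confidence in each bin. The equal-width binning and the bin count of 10 are assumptions of this sketch; the example data mirrors the slide's "100 datapoints at confidence 0.8" case.

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted average of |accuracy - confidence| over equal-width bins."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            gap = abs(correct[in_bin].mean() - confidences[in_bin].mean())
            ece += in_bin.mean() * gap  # weight by the fraction of points in the bin
    return ece

conf = np.full(100, 0.8)                      # 100 predictions, all at confidence 0.8
calibrated = np.array([1] * 80 + [0] * 20)    # 80 of them correct
overconfident = np.array([1] * 60 + [0] * 40) # only 60 of them correct

ece_good = expected_calibration_error(conf, calibrated)
ece_bad = expected_calibration_error(conf, overconfident)
```

With 80 correct predictions the gap is essentially zero; with 60 correct, confidence exceeds accuracy by about 0.2.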

Deep ensembles
Figure panels: MSE (5 ensemble), NLL (single), NLL (single) adversarial, NLL (5 ensemble) adversarial
Lakshminarayanan, B., Pritzel, A., Blundell, C. Simple and scalable predictive uncertainty estimation using deep ensembles. NIPS. 2017.
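A sketch of how an ensemble of regression networks can be combined, assuming each member predicts a Gaussian mean and variance and the ensemble is treated as a uniform mixture. The five member outputs below are made-up numbers for a single test input.

```python
import numpy as np

def combine_gaussian_ensemble(means, variances):
    """Mean and variance of a uniform mixture of Gaussian predictions."""
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    mean = means.mean(axis=0)
    # Mixture variance: average second moment minus squared mixture mean.
    var = (variances + means**2).mean(axis=0) - mean**2
    return mean, var

# Hypothetical outputs of 5 ensemble members for one input.
member_means = [1.0, 1.2, 0.8, 1.1, 0.9]
member_vars = [0.10, 0.12, 0.09, 0.11, 0.10]

mean, var = combine_gaussian_ensemble(member_means, member_vars)
```

The combined variance is the average predicted variance plus the spread of the member means, so disagreement between members shows up as extra uncertainty.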

Monte-Carlo Dropout
Dropout: independently, with prob p, set each input to zero
  Exponential ensemble
Monte-Carlo dropout: run the network several times with different random seeds
Equivalent to prior (L2 weight decay equivalent to Gaussian prior).
Gal, Y., Ghahramani, Z., Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning, ICML 2016.
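A minimal sketch of Monte-Carlo dropout at prediction time: dropout stays switched on, the same input is passed through the network many times, and the spread of the outputs is read as model uncertainty. The tiny two-layer network, its random weights, and the choice of 200 passes are assumptions for illustration only.

```python
import numpy as np

def forward(x, W1, W2, rng, p_drop=0.5):
    """One stochastic forward pass with dropout on the hidden units."""
    h = np.maximum(0.0, x @ W1)              # ReLU hidden layer
    keep = rng.random(h.shape) >= p_drop     # zero each unit with probability p_drop
    h = h * keep / (1.0 - p_drop)            # rescale so the expected activation is unchanged
    return h @ W2

rng = np.random.default_rng(0)
W1 = rng.normal(size=(3, 64))                # untrained weights, for illustration
W2 = rng.normal(size=(64, 1))
x = rng.normal(size=(1, 3))

# Run the same input through the network many times with fresh dropout masks.
samples = np.stack([forward(x, W1, W2, rng) for _ in range(200)])
pred_mean = samples.mean(axis=0)             # predictive mean
pred_std = samples.std(axis=0)               # spread across passes
```

In a deep learning framework the same effect is usually obtained by keeping the dropout layers in training mode while predicting.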

MC-Dropout for
Active learning
  High uncertainty means high information
  Data efficiency
Deep RL
  Thompson sampling
  Data efficiency
Gal, Y., Ghahramani, Z., Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning, ICML 2016.
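One possible acquisition function for active learning, sketched under the assumption that class probabilities from several stochastic forward passes are already available: score each unlabeled point by the entropy of its averaged prediction and query the highest-scoring one. Predictive entropy is one common choice, not necessarily the one used in the talk, and the probabilities below are made up.

```python
import numpy as np

def predictive_entropy(mc_probs):
    """Entropy of the mean prediction.

    mc_probs has shape (n_passes, n_points, n_classes).
    """
    mean_probs = mc_probs.mean(axis=0)
    return -np.sum(mean_probs * np.log(mean_probs + 1e-12), axis=-1)

# Hypothetical class probabilities: 2 stochastic passes, 3 unlabeled points.
mc_probs = np.array([
    [[0.90, 0.10], [0.6, 0.4], [0.2, 0.8]],
    [[0.95, 0.05], [0.4, 0.6], [0.1, 0.9]],
])

scores = predictive_entropy(mc_probs)
query_index = int(np.argmax(scores))  # the point to send for labeling next
```

Here the second point, whose averaged prediction is an even split, gets the highest score.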

Mixture density networks
Distributional parameter estimation
Regression model with Gaussian output
  Train using NLL loss
With enough mixture components, an arbitrary distribution can be approximated
Bishop, C.M., Mixture density networks, 1994.
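A sketch of the NLL loss for the single-Gaussian case, where the network outputs a mean and a log-variance per target. Predicting the log-variance rather than the variance is an assumption of this sketch, made so the variance stays positive; a full mixture would add mixing weights on top.

```python
import numpy as np

def gaussian_nll(y, mu, log_var):
    """Mean negative log-likelihood of y under N(mu, exp(log_var))."""
    return 0.5 * np.mean(np.log(2 * np.pi) + log_var + (y - mu) ** 2 / np.exp(log_var))

y = np.array([1.0, 2.0])

# Accurate means: a small predicted variance is rewarded.
nll_exact_narrow = gaussian_nll(y, y, np.log(0.1))
nll_exact_wide = gaussian_nll(y, y, np.log(1.0))

# Means off by one: claiming a small variance is heavily penalised.
nll_wrong_narrow = gaussian_nll(y, y + 1.0, np.log(0.01))
nll_wrong_wide = gaussian_nll(y, y + 1.0, np.log(1.0))
```

The loss therefore pushes the predicted variance to track the size of the actual errors, which is what makes it usable as an uncertainty estimate.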

Recurrent density networks: blood glucose predictions
Blood glucose test data (Ohio T1DM dataset)
Synthetic square wave data
  Stochastic amplitude
  Stochastic period length
Martinsson, J., Schliep, A., Eliasson, B., Mogren, O., Blood glucose prediction with variance estimation using recurrent neural networks. Journal of Healthcare Informatics Research. 2020.

Bayesian machine learning
Encoding and incorporating prior belief
  Distribution over model parameters
Posterior over model parameters
Inference: marginalizing over latent parameters
Computationally demanding
  Evidence term requires expensive integral
Simple models: conjugate priors
Approximate Bayesian methods
  Variational inference
  Markov chain Monte Carlo
Bayes' rule:
$$p(\text{model} \mid \text{new data}) = \frac{p(\text{new data} \mid \text{model}) \cdot p(\text{model})}{p(\text{new data})}$$
Posterior: p(model | new data). Likelihood: p(new data | model). Prior: p(model). Evidence (or marginal likelihood): p(new data).
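As an illustration of the "simple models: conjugate priors" case, here is a sketch of a Beta prior on a coin's success probability updated with Bernoulli observations. The example is not from the talk; the uniform Beta(1, 1) prior and the four observations are assumptions. Because the prior is conjugate, the posterior is available in closed form and no evidence integral has to be computed numerically.

```python
def beta_bernoulli_update(alpha, beta, observations):
    """Posterior Beta parameters after observing 0/1 outcomes."""
    successes = sum(observations)
    failures = len(observations) - successes
    return alpha + successes, beta + failures

# Uniform Beta(1, 1) prior, then three successes and one failure.
alpha, beta = beta_bernoulli_update(1, 1, [1, 1, 0, 1])
posterior_mean = alpha / (alpha + beta)
```

The posterior is Beta(4, 2), with mean 4 / 6.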

Bayesian modelling
Expectation under the posterior distribution on weights is equivalent to using an ensemble of an uncountably infinite number of models.

Variational inference
True posterior p(w | X, Y) is intractable in general
Define an approximating variational distribution q_θ
Minimize KL between q and p with respect to θ
  Equivalent to maximizing the evidence lower bound (ELBO)
Predictive distribution: integrate over q_θ in place of the true posterior
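The quantities on this slide can be written out in their standard textbook form. The notation below (weights w, training data X, Y, a test input x* with output y*, and a prior p(w)) is an assumption of this sketch and may differ from the notation on the original slide.

```latex
% KL to the true posterior and the evidence lower bound differ by a constant in \theta:
\mathrm{KL}\big(q_\theta(w)\,\|\,p(w \mid X, Y)\big)
  = \log p(Y \mid X) - \mathcal{L}(\theta)

% Evidence lower bound: expected log-likelihood minus KL to the prior
\mathcal{L}(\theta)
  = \mathbb{E}_{q_\theta(w)}\big[\log p(Y \mid X, w)\big]
  - \mathrm{KL}\big(q_\theta(w)\,\|\,p(w)\big)

% Approximate predictive distribution
p(y^* \mid x^*, X, Y) \approx \int p(y^* \mid x^*, w)\, q_\theta(w)\, dw
```

Since log p(Y | X) does not depend on θ, minimizing the KL divergence and maximizing the lower bound select the same θ.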

Bayesian neural networks
A prior on each weight
  Random variable
  Distribution over possible values
Variational approximations
  Numerical integration over variational posterior
Bayes by Backprop: minimize variational free energy (ELBO on marginal likelihood)
  Improve generalization
Figure panels: Bayes by Backprop, standard neural network
Figure caption: Regression of noisy data with interquartile ranges. Black crosses are training samples. Red lines are median predictions. Blue/purple region is interquartile range.
MacKay, D.J.C., A Practical Bayesian Framework for Backpropagation Networks, Neural Computation, 1992
Graves, A., Practical Variational Inference for Neural Networks, NIPS 2011
Blundell et al., Weight uncertainty in neural networks, ICML 2015
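A sketch of what "a distribution over each weight" can look like in code, assuming a diagonal Gaussian variational posterior with a mean `mu` and a standard deviation `softplus(rho)` per weight. This is the parameterisation described by Blundell et al.; the shapes and values here are made up, and training of `mu` and `rho` is omitted.

```python
import numpy as np

def sample_weights(mu, rho, rng):
    """Draw one weight matrix from the Gaussian variational posterior."""
    sigma = np.log1p(np.exp(rho))            # softplus keeps sigma positive
    eps = rng.standard_normal(mu.shape)      # noise independent of mu and rho
    return mu + sigma * eps                  # reparameterised sample

rng = np.random.default_rng(0)
mu = np.zeros((3, 2))                        # hypothetical posterior means
rho = np.full((3, 2), -3.0)                  # hypothetical pre-softplus scales

# Each draw is a different network; predictions are averaged over draws.
draws = np.stack([sample_weights(mu, rho, rng) for _ in range(1000)])
```

Writing the sample as `mu + sigma * eps` lets gradients of the loss flow back to `mu` and `rho`, which is what allows ordinary backpropagation to be used.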

Note on Bayesian methods
Advantages
  Coherent
  Conceptually straightforward
  Modular
  Useful predictions
Limitations
  Subjective. Assumptions.
  Computationally demanding
  Use of approximations weakens the coherence argument
Zoubin Ghahramani

Monte-Carlo Dropout
Approximate posterior.
MC Dropout is equivalent to an approximation of a deep Gaussian process.
Gal, Y., Ghahramani, Z., Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning, ICML 2016.

Stationary Activations for Uncertainty Calibration in Deep Learning
Figure panels: Matérn activation function, MC-Dropout
Legend: White: confident. Grey: uncertain. Black: decision boundary. Points: training data.
Meronen, L., Irwanto, C., & Solin, A. Stationary Activations for Uncertainty Calibration in Deep Learning. arXiv preprint arXiv:2010.09494. NeurIPS 2020.

Causal-Effect Inference Failure Detection
Counterfactual deep learning models
Epistemic uncertainty - covariate shift
MC Dropout
Jesson, A., Mindermann, S., Shalit, U., Gal, Y., Identifying Causal-Effect Inference Failure with Uncertainty-Aware Models, NeurIPS 2020

NeurIPS 2020
Antorán et al., Depth Uncertainty in Neural Networks
Wenzel et al., Hyperparameter Ensembles for Robustness and Uncertainty Quantification
Valdenegro-Toro et al., Deep Sub-Ensembles for Fast Uncertainty Estimation in Image Classification
Lindinger et al., Beyond the Mean-Field: Structured Deep Gaussian Processes Improve the Predictive Uncertainties
Liu et al., Simple and Principled Uncertainty Estimation with Deterministic Deep Learning via Distance Awareness

Getting started
Bayesian Layers: A module for neural network uncertainty (Tran et al., 2019)
  Implements variational approximation
Edwardlib: A library for probabilistic modeling, inference, and criticism (edwardlib.org)

References
MacKay, D.J.C., A Practical Bayesian Framework for Backpropagation Networks, Neural Computation, 1992
Bishop, C.M., Mixture density networks, 1994
Graves, A., Practical Variational Inference for Neural Networks, NIPS 2011
Blundell et al., Weight uncertainty in neural networks, ICML 2015
Gal, Y., Ghahramani, Z., Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning, ICML 2016
Gal, Y., Uncertainty in Deep Learning, PhD thesis, 2016
Kendall, A., Gal, Y., What Uncertainties Do We Need in Bayesian Deep Learning for Computer Vision?, arXiv:1703.04977, NIPS 2017
Lakshminarayanan, B., Pritzel, A., Blundell, C., Simple and scalable predictive uncertainty estimation using deep ensembles, NIPS 2017
Guo, C., et al., On calibration of modern neural networks, arXiv:1706.04599, ICML 2017
Tran, D., Dusenberry, M.W., Hafner, D., van der Wilk, M., Bayesian Layers: A module for neural network uncertainty, NeurIPS 2019
Martinsson, J., Schliep, A., Eliasson, B., Mogren, O., Blood glucose prediction with variance estimation using recurrent neural networks, Journal of Healthcare Informatics Research (JHIR), 2020
Wilson, A.G., The case for Bayesian deep learning, arXiv:2001.10995, 2020
Meronen, L., Irwanto, C., Solin, A., Stationary Activations for Uncertainty Calibration in Deep Learning, arXiv:2010.09494, NeurIPS 2020
Jesson, A., Mindermann, S., Shalit, U., Gal, Y., Identifying Causal-Effect Inference Failure with Uncertainty-Aware Models, NeurIPS 2020
Hinton, G.E., van Camp, D., Keeping the neural networks simple by minimizing the description length of the weights
Denker, J.S., LeCun, Y., Transforming Neural-Net Output Levels to Probability Distributions
Neal, R.M., Bayesian Learning for Neural Networks