
FIMI2018
19 February - 21 February 2018
The Institute of Statistical Mathematics

Floor Map

3rd Floor
  S5: Seminar Room 5 (Session)
  S4: Seminar Room 4 (Coffee Room)

2nd Floor
  Hall: Auditorium (Tuesday only)
  R: Resting room D220 (Tuesday only)

Lunch Map

[Map landmarks: City Hall, Tokyo District Court, Lalaport, Takamatsu Sta., Local Autonomy College, IKEA, National Disaster Medical Center, Tachikawa Police Station]

Pack lunches (at the lobby lounge), 11:00-13:00; miso soup included; a light dessert offered on Wednesdays:
- Rin rin: 450-500 yen each
- Haiji (curry): 500 yen each
- Zuikyo (Chinese): 500 yen each
- Hello Lunch: 300-400 yen each

Restaurants
(1) City Hall dining room, 3rd floor (立川市役所食堂 3F): 300-500 yen
(2) Café Harmony at the City Hall, 1st floor (カフェハーモニー, 立川市役所 1F): 600 yen
(3) Pole Light, Tokyo District Court dining room, 1st basement floor (食堂ポールライト, 東京地裁 B1F)
(4) Chinese Restaurant Zuikyo, near MINISTOP (中華料理店 瑞京)
(5) Local Autonomy College dining room (自治大学校食堂)
(6) IKEA restaurant and café, 2nd floor (IKEA レストラン 2F)
(7) Restaurants in Lalaport (shopping mall) (ららぽーと レストラン街)

Convenience Stores
Popula (City Hall 1F) / MINISTOP / 7-Eleven / Family Mart

Program

19 February (Mon)

10:00-10:10  Opening
10:10-11:00  Arthur Gretton (University College London)
             Conditional Densities and Efficient Models in Infinite Exponential Families
11:15-12:05  Bharath K. Sriperumbudur (Pennsylvania State University)
             On Approximate Kernel PCA Using Random Features
12:05-13:45  Lunch Break
13:45-14:35  Motonobu Kanagawa (Max Planck Institute for Intelligent Systems)
             Convergence Analysis of Deterministic Kernel-Based Quadrature Rules in Misspecified Settings
14:50-15:40  Krikamol Muandet (Mahidol University)
             Eigendecompositions of Transfer Operators in Reproducing Kernel Hilbert Spaces

20 February (Tue)

09:35-10:25  Song Liu (Bristol University)
             Density Ratio Estimation using Stein Method and Its Applications
10:40-11:30  Kenji Fukumizu (The Institute of Statistical Mathematics)
             Machine Learning Approach to Topological Data Analysis
11:30-13:10  Lunch Break
13:10-14:00  Alexandre Tsybakov (CREST, ENSAE)
             Optimal Variable Selection and Noisy Adaptive Compressed Sensing
14:15-15:05  Mladen Kolar (University of Chicago)
             Estimation and Inference for Differential Networks
15:20-16:10  Wittawat Jitkrittum (Max Planck Institute for Intelligent Systems)
             A Linear-Time Kernel Goodness-of-Fit Test
18:00-       Workshop Banquet

21 February (Wed)

09:35-10:25  Taiji Suzuki (The University of Tokyo)
             Connecting Model Compression and Generalization Analysis for Deep Neural Network
10:40-11:30  Yarin Gal (University of Oxford)
             Bayesian Deep Learning
11:30-13:10  Lunch Break
13:10-14:00  Johannes Schmidt-Hieber (Leiden University)
             Statistical Theory for Deep Neural Networks with ReLU Activation Function
14:05-14:55  Masaaki Imaizumi (The Institute of Statistical Mathematics)
             Statistical Estimation for Non-Smooth Functions by Deep Neural Networks
14:55-15:00  Closing

Abstract (Monday)

Arthur Gretton (University College London)
Conditional Densities and Efficient Models in Infinite Exponential Families

The exponential family is one of the most powerful and widely used classes of models in statistics. A method was recently developed to fit this model when the natural parameter and sufficient statistic are infinite-dimensional, using a score matching approach. The infinite exponential family is a natural generalisation of the finite case, much like the Gaussian and Dirichlet processes generalise their respective finite models. In this talk, I'll describe two recent results which make this model more applicable in practice, by reducing the computational burden and improving performance for high-dimensional data. The first is a Nyström-like approximation to the full solution. We prove that this approximate solution has the same consistency and convergence rates as the full-rank solution (exactly in Fisher distance, and nearly in other distances), with guarantees on the degree of cost and storage reduction. The second result is a generalisation of the model family to the conditional case, again with consistency guarantees. In experiments, the conditional model generally outperforms a competing approach with consistency guarantees, and is competitive with a deep conditional density model on datasets that exhibit abrupt transitions and heteroscedasticity.

Bharath K. Sriperumbudur (Pennsylvania State University)
On Approximate Kernel PCA Using Random Features

Kernel methods are powerful learning methodologies that provide a simple way to construct nonlinear algorithms from linear ones. Despite their popularity, they suffer from poor scalability in big data scenarios. Various approximation methods, including random feature approximation, have been proposed to alleviate the problem. However, the statistical consistency of most of these approximate kernel methods is not well understood, except for kernel ridge regression, wherein it has been shown that the random feature approximation is not only computationally efficient but also statistically consistent with a minimax optimal rate of convergence. In this work, we investigate the efficacy of random feature approximation in the context of kernel principal component analysis (KPCA) by studying the statistical behavior of approximate KPCA in terms of the convergence of eigenspaces and the reconstruction error.
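As a rough illustration of the random feature approximation discussed above, the following sketch (an illustrative toy, not the speaker's code: the function names, the Gaussian-kernel bandwidth `gamma`, and the synthetic data are all assumptions made here) performs ordinary PCA on random Fourier features as a linear-algebra proxy for kernel PCA:

```python
import numpy as np

def random_fourier_features(X, n_features=200, gamma=1.0, seed=0):
    """Features z(x) with z(x) . z(y) ~= exp(-gamma * ||x - y||^2)."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    # Spectral measure of the Gaussian kernel exp(-gamma ||x-y||^2): N(0, 2*gamma*I).
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(d, n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

def approximate_kpca(X, n_components=2, n_features=200, gamma=1.0):
    """PCA on random features: a cheap stand-in for kernel PCA."""
    Z = random_fourier_features(X, n_features, gamma)
    Zc = Z - Z.mean(axis=0)              # centre in feature space
    C = Zc.T @ Zc / len(X)               # m x m feature covariance
    vals, vecs = np.linalg.eigh(C)
    order = np.argsort(vals)[::-1][:n_components]
    return Zc @ vecs[:, order]           # projected data

X = np.random.default_rng(1).normal(size=(100, 3))
Y = approximate_kpca(X)
print(Y.shape)  # (100, 2)
```

Because the map is m-dimensional, the eigendecomposition involves an m x m matrix rather than the n x n kernel matrix, which is the source of the computational savings when n is large.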

Motonobu Kanagawa (Max Planck Institute for Intelligent Systems)
Convergence Analysis of Deterministic Kernel-Based Quadrature Rules in Misspecified Settings

In this talk, we present convergence analysis of kernel-based quadrature rules in misspecified settings, focusing on deterministic quadrature in Sobolev spaces. In particular, we deal with misspecified settings where a test integrand is less smooth than the Sobolev RKHS based on which a quadrature rule is constructed. We provide convergence guarantees based on two different assumptions on a quadrature rule: one on quadrature weights, and the other on design points. More precisely, we show that convergence rates can be derived (i) if the sum of absolute weights remains constant (or does not increase quickly), or (ii) if the minimum distance between design points does not decrease very quickly. As a consequence of the latter result, we derive a rate of convergence for Bayesian quadrature in misspecified settings. We reveal a condition on design points that makes Bayesian quadrature robust to misspecification, and show that, under this condition, it may adaptively achieve the optimal rate of convergence in the Sobolev space of a lesser order (i.e., of the unknown smoothness of a test integrand), under a slightly stronger regularity condition on the integrand. (Joint work with Bharath K. Sriperumbudur and Kenji Fukumizu)

Krikamol Muandet (Mahidol University)
Eigendecompositions of Transfer Operators in Reproducing Kernel Hilbert Spaces

Transfer operators such as the Perron-Frobenius or Koopman operator play an important role in the global analysis of complex dynamical systems. The eigenfunctions of these operators can be used to detect metastable sets, to project the dynamics onto the dominant slow processes, or to separate superimposed signals. We extend transfer operator theory to reproducing kernel Hilbert spaces and show that these operators are related to Hilbert space representations of conditional distributions, known as conditional mean embeddings in the machine learning community. Moreover, numerical methods to compute empirical estimates of these embeddings are akin to data-driven methods for the approximation of transfer operators, such as extended dynamic mode decomposition and its variants. In fact, most of the existing methods can be derived from our framework, providing a unifying view on the approximation of transfer operators. One main benefit of the presented kernel-based approaches is that these methods can be applied to any domain where a similarity measure given by a kernel is available. We illustrate the results with the aid of guiding examples and highlight potential applications in molecular dynamics as well as video and text data analysis.
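As a toy companion to the abstract's mention of extended dynamic mode decomposition, the following sketch (plain EDMD with a hand-picked monomial dictionary and a made-up linear system, not the kernelised framework of the talk) estimates Koopman eigenvalues from snapshot pairs:

```python
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[0.9, 0.0], [0.0, 0.5]])   # illustrative linear dynamics x' = A x

# Snapshot pairs (x_t, x_{t+1}).
X = rng.normal(size=(500, 2))
Y = X @ A.T

def psi(X):
    """Dictionary of observables: [x1, x2, x1^2, x1*x2, x2^2]."""
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack([x1, x2, x1**2, x1 * x2, x2**2])

# EDMD: least-squares Koopman matrix K with psi(Y) ~= psi(X) K.
PX, PY = psi(X), psi(Y)
K, *_ = np.linalg.lstsq(PX, PY, rcond=None)
eigvals = np.sort(np.abs(np.linalg.eigvals(K)))[::-1]
print(np.round(eigvals, 3))  # eigenvalues ~ 0.9, 0.81, 0.5, 0.45, 0.25
```

For this diagonal system the monomial dictionary is Koopman-invariant, so EDMD recovers the eigenvalues 0.9, 0.5 and their products exactly; on nonlinear data the same recipe gives only a finite-dimensional approximation.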

Abstract (Tuesday)

Song Liu (Bristol University)
Density Ratio Estimation using Stein Method and Its Applications

In this research, we estimate the ratio between the data-generating probability density function and a model density function with the help of the Stein operator. The estimated density ratio allows us to approximate the Kullback-Leibler divergence from a model to the data efficiently. We explore applications of such a goodness-of-fit measure, including parameter fitting, Bayesian learning, and change point detection. This is joint work with Wittawat Jitkrittum and Carl Henrik Ek.

Lizhen Lin (The University of Notre Dame)
Geometry and Statistics: Nonparametric Statistical Inference of Non-Euclidean Data

This talk presents some recent advances in nonparametric inference on manifolds and other non-Euclidean spaces. The focus is on nonparametric inference based on Fréchet means. In particular, we present omnibus central limit theorems for Fréchet means for inference, which can be applied to general metric spaces including stratified spaces, greatly expanding the current scope of inference. Applications are also provided to the space of symmetric positive definite matrices arising in diffusion tensor imaging. A robust framework based on the classical idea of median-of-means is also proposed, which yields estimates with provable robustness and improved concentration. In addition to inference for i.i.d. data, we also consider nonparametric regression problems where predictors or responses lie on manifolds. Various simulated and real data examples are considered.
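The Fréchet mean central to the abstract above can be computed concretely on a simple manifold; the sketch below (a toy under assumed conventions, with synthetic data on the unit sphere, not the talk's inference machinery) minimises the sum of squared geodesic distances by tangent-space averaging:

```python
import numpy as np

def log_map(p, q):
    """Log map on the unit sphere: tangent vector at p pointing toward q."""
    v = q - np.dot(p, q) * p
    nv = np.linalg.norm(v)
    theta = np.arccos(np.clip(np.dot(p, q), -1.0, 1.0))
    return np.zeros_like(p) if nv < 1e-12 else theta * v / nv

def exp_map(p, v):
    """Exp map on the unit sphere."""
    nv = np.linalg.norm(v)
    return p if nv < 1e-12 else np.cos(nv) * p + np.sin(nv) * v / nv

def frechet_mean(points, iters=100):
    """Riemannian gradient descent on the sum of squared geodesic distances."""
    mu = points[0]
    for _ in range(iters):
        g = np.mean([log_map(mu, q) for q in points], axis=0)
        mu = exp_map(mu, g)
    return mu

rng = np.random.default_rng(0)
raw = rng.normal([0, 0, 1], 0.1, size=(50, 3))      # points near the north pole
pts = raw / np.linalg.norm(raw, axis=1, keepdims=True)
mu = frechet_mean(pts)
print(np.round(mu, 2))
```

On Euclidean space this iteration reduces to the ordinary sample mean; on curved spaces it is the standard way to make "the mean" well defined.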

Alexandre Tsybakov (CREST, ENSAE)
Optimal Variable Selection and Noisy Adaptive Compressed Sensing

We consider variable selection based on n observations from a high-dimensional linear regression model. The unknown parameter of the model is assumed to belong to the class S of all s-sparse vectors in R^p whose non-zero components are greater than a > 0. Variable selection in this context is an extensively studied problem, and various methods of recovering the sparsity pattern have been suggested. However, in the theory not much is known beyond the consistency of selection. For Gaussian design, which is of major importance in the context of compressed sensing, necessary and sufficient conditions of consistency for some configurations of n, p, s, a are available. They are known to be achieved by the exhaustive search decoder, which is not realizable in polynomial time and requires the knowledge of s. This talk will focus on the issue of optimality in variable selection based on the Hamming risk criterion. The benchmark behavior is characterized by the minimax risk on the class S. We propose an adaptive algorithm, independent of s, a, and of the noise level, that nearly attains the value of the minimax risk. This algorithm is the first method that is both realizable in polynomial time and consistent under the same (minimal) sufficient conditions as the exhaustive search decoder.

Mladen Kolar (University of Chicago)
Estimation and Inference for Differential Networks

We present a recent line of work on estimating differential networks and conducting statistical inference about parameters in a high-dimensional setting. First, we consider a Gaussian setting and show how to directly learn the difference between the graph structures. A debiasing procedure will be presented for construction of an asymptotically normal estimator of the difference. Next, building on the first part, we show how to learn the difference between two graphical models with latent variables. A linear convergence rate is established for an alternating gradient descent procedure with correct initialization. Simulation studies illustrate the performance of the procedure. We also illustrate the procedure on an application in neuroscience. Finally, we will discuss how to do statistical inference on differential networks when the data are not Gaussian.
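A naive baseline for the differential network problem above (a plug-in sketch with made-up dimensions and threshold, not the direct-difference estimator of the talk, which avoids estimating each precision matrix separately) is to invert the two sample covariances and threshold the difference:

```python
import numpy as np

rng = np.random.default_rng(0)
p, n = 5, 4000

# Two precision matrices differing in a single edge (1, 2).
Theta1 = np.eye(p)
Theta2 = np.eye(p)
Theta2[1, 2] = Theta2[2, 1] = 0.4

X1 = rng.multivariate_normal(np.zeros(p), np.linalg.inv(Theta1), size=n)
X2 = rng.multivariate_normal(np.zeros(p), np.linalg.inv(Theta2), size=n)

# Naive plug-in estimate of the differential network Delta = Theta1 - Theta2,
# followed by hard thresholding to read off the differing edges.
Delta = np.linalg.inv(np.cov(X1.T)) - np.linalg.inv(np.cov(X2.T))
edges = {(i, j) for i in range(p) for j in range(i + 1, p)
         if abs(Delta[i, j]) > 0.2}
print(edges)  # expect {(1, 2)}
```

In high dimensions (p comparable to or larger than n) the plug-in inverse breaks down, which is precisely what motivates direct estimation of the difference.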

Wittawat Jitkrittum (Max Planck Institute for Intelligent Systems)
A Linear-Time Kernel Goodness-of-Fit Test

We propose a novel adaptive test of goodness of fit, with computational cost linear in the number of samples. We learn the test features that best indicate the differences between observed samples and a reference model, by minimizing the false negative rate. These features are interpretable, indicating where the model does not fit the samples well. The features are constructed via Stein's method, meaning that it is not necessary to compute the normalising constant of the model. We analyse the asymptotic Bahadur efficiency of the new test, and prove that under a mean-shift alternative, our test always has greater relative efficiency than a previous linear-time kernel test, regardless of the choice of parameters for that test. In experiments, the performance of our method exceeds that of the earlier linear-time test, and matches or exceeds the power of a quadratic-time kernel test. In high dimensions and where model structure may be exploited, our goodness-of-fit test performs far better than a quadratic-time two-sample test based on the Maximum Mean Discrepancy, with samples drawn from the model.

Kenji Fukumizu (The Institute of Statistical Mathematics)
Machine Learning Approach to Topological Data Analysis

Topological data analysis (TDA) is a recent methodology for extracting topological and geometrical features from complex geometric data structures. Persistent homology, a new mathematical notion proposed by Edelsbrunner (2002), provides a multiscale descriptor for the topology of data, and has recently been applied to a variety of data analyses. In this talk I will introduce a machine learning framework for TDA by combining persistent homology and kernel methods. As an expression of persistent homology, persistence diagrams are widely used to represent the lifetimes of generators of homology groups. While they serve as a compact representation of data, it is not strai