When Langevin meets Markov
In the machine learning literature, a large number of problems amount to simulating from a density that is log-concave (at least in the tails) and possibly non-smooth. Most of the research so far has been devoted to the MAP problem, which reduces to solving a high-dimensional convex (and possibly non-smooth) program. The purpose of this series of lectures is to understand how ideas from the optimization community can be used to design efficient sampling algorithms with convergence guarantees (and, ideally, "usable" convergence bounds). In high dimension, first-order methods, which exploit only gradient information, are essentially the only viable option. Most of the efficient algorithms known so far can be seen as variants of gradient descent, most often coupled with "partial updates" (coordinate descent). This naturally suggests studying methods derived from the Euler discretization of the Langevin diffusion, which can be viewed as a noisy version of gradient descent; partial updates then play the role of "Gibbs steps" in which some components are frozen. The algorithm can be generalized to the non-smooth case by "regularizing" the objective function. The Moreau-Yosida envelope (an inf-convolution) is an appropriate candidate here, because it turns a non-smooth optimization problem into a smooth one without changing the minimum value of the criterion.
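As a concrete illustration, here is a minimal sketch of this scheme in Python, in the spirit of the Moreau-Yosida regularized unadjusted Langevin algorithm: the smooth part of the potential is handled through its gradient, and the non-smooth part through the gradient of its Moreau envelope, computed from a proximal operator. The target density, step size, and smoothing parameter below are illustrative assumptions, not specifics from the lectures.

import numpy as np

def soft_threshold(x, t):
    # Proximal operator of t * ||.||_1 (closed form for the l1 norm).
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def myula(grad_f, prox_g, x0, step, lam, n_steps, seed=0):
    # Euler discretization of the Langevin diffusion for pi(x) ∝ exp(-f(x) - g(x)),
    # with the non-smooth g replaced by its smooth Moreau-Yosida envelope g_lam.
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    out = np.empty((n_steps, x.size))
    for k in range(n_steps):
        # Gradient of the Moreau envelope: (x - prox_{lam*g}(x)) / lam.
        grad_env = (x - prox_g(x, lam)) / lam
        # Noisy gradient step: a gradient descent update plus a Gaussian innovation.
        x = x - step * (grad_f(x) + grad_env) \
            + np.sqrt(2.0 * step) * rng.standard_normal(x.size)
        out[k] = x
    return out

# Illustrative target: pi(x) ∝ exp(-||x||^2 / 2 - ||x||_1).
samples = myula(grad_f=lambda x: x, prox_g=soft_threshold,
                x0=np.zeros(2), step=0.01, lam=0.1, n_steps=5000)

Dropping grad_env recovers the plain unadjusted Langevin algorithm for a smooth potential; freezing a subset of coordinates at each update gives the Gibbs-style partial updates mentioned above.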
Scalable Methods for Bayesian Statistics and Machine Learning
Methods that allow statistical and machine learning models to be applied to ever larger datasets are becoming increasingly important in the age of Big Data. I will provide an overview of recent algorithmic techniques for scalable posterior computation in Bayesian models, of both Monte Carlo and variational flavours, which make use of modern parallel/distributed computing architectures. Specific topics will include:
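To give a flavour of the distributed Monte Carlo side, below is a minimal sketch of consensus Monte Carlo: each machine samples a sub-posterior built from its own data shard (with the prior fractionated across shards), and the draws are combined by precision-weighted averaging. The conjugate Gaussian model and all parameter values are illustrative assumptions, not necessarily topics of this talk.

import numpy as np

def shard_posterior_samples(y_shard, sigma2, prior_var, n_samp, rng):
    # Conjugate model: y_i ~ N(theta, sigma2) with a N(0, prior_var) prior,
    # so the sub-posterior for theta is Gaussian in closed form.
    n = y_shard.size
    post_var = 1.0 / (n / sigma2 + 1.0 / prior_var)
    post_mean = post_var * y_shard.sum() / sigma2
    return rng.normal(post_mean, np.sqrt(post_var), size=n_samp)

def consensus_mc(y, n_shards=4, sigma2=1.0, prior_var=100.0, n_samp=2000, seed=0):
    # Sample each shard's sub-posterior (in parallel, in a real deployment),
    # then combine the draws by precision-weighted averaging.
    rng = np.random.default_rng(seed)
    shards = np.array_split(y, n_shards)
    # Fractionate the prior so the product of sub-posteriors matches the full posterior.
    draws = np.stack([shard_posterior_samples(s, sigma2, prior_var * n_shards, n_samp, rng)
                      for s in shards])            # shape: (n_shards, n_samp)
    weights = 1.0 / draws.var(axis=1, ddof=1)      # per-shard precision estimates
    return (weights[:, None] * draws).sum(axis=0) / weights.sum()

theta_draws = consensus_mc(np.random.default_rng(1).normal(0.7, 1.0, size=10_000))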
Modeling, computation, inference and applications of graphical models
Graphical models are a canonical statistical tool for capturing conditional dependencies among a set of variables of interest. Recent applications to high-dimensional data include the reconstruction of gene regulatory networks and the analysis of financial data and legislative networks. In this talk, we review algorithms for estimating undirected graphical models (e.g., neighborhood selection and variants of the graphical lasso) and present the theoretical properties of the resulting estimates under sparsity assumptions. We further discuss extensions that account for the effects of latent variables, as well as approaches that jointly estimate structurally related graphical models. Time permitting, we will also give a brief overview of algorithmic and theoretical developments for estimating directed graphical models from high-dimensional data.
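As a small illustration of the graphical lasso, the sketch below fits scikit-learn's GraphicalLasso to data simulated from a Gaussian with a sparse (tridiagonal) precision matrix; zeros in the estimated precision matrix correspond to absent edges, i.e. conditional independencies. The penalty level and the simulated model are illustrative assumptions.

import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(0)

# Simulate from a Gaussian whose precision matrix is sparse (tridiagonal),
# so the true conditional-independence graph is a chain.
p = 10
prec = np.eye(p) + 0.4 * (np.eye(p, k=1) + np.eye(p, k=-1))
X = rng.multivariate_normal(np.zeros(p), np.linalg.inv(prec), size=500)

# l1-penalized maximum likelihood estimation of the precision matrix.
model = GraphicalLasso(alpha=0.05).fit(X)

# Recover the estimated edge set from the non-zero off-diagonal entries.
edges = np.abs(model.precision_) > 1e-4
np.fill_diagonal(edges, False)
print("estimated edges:", np.argwhere(np.triu(edges)))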