Scalable Mutual Information Estimation using Dependence Graphs
- Submitted by: Morteza Noshad
- Last updated: 8 May 2019 - 12:23pm
- Document Type: Presentation Slides
- Document Year: 2019
- Presenters: Alfred Hero
- Paper Code: 3844
Mutual information (MI) is a widely used measure of dependency between two random variables in information theory, statistics, and machine learning. Recently, several MI estimators have been proposed that achieve the parametric MSE convergence rate; however, most previously proposed estimators have computational complexity of at least O(N²). We propose a unified method for empirical non-parametric estimation of general MI functions between random vectors in R^d based on N i.i.d. samples. The reduced-complexity MI estimator, called the ensemble dependency graph estimator (EDGE), combines randomized locality-sensitive hashing (LSH), dependency graphs, and ensemble bias-reduction methods. We prove that EDGE achieves optimal computational complexity O(N) and can achieve the optimal parametric MSE rate of O(1/N) if the density is d times differentiable. To the best of our knowledge, EDGE is the first non-parametric MI estimator that achieves the parametric MSE rate with linear time complexity. We illustrate the utility of EDGE for analysis of the information plane (IP) in deep learning. Using EDGE, we shed light on the controversy over whether the compression property of the information bottleneck (IB) in fact holds for ReLU and other rectifying activation functions in deep neural networks (DNNs).
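The slides themselves contain no code, but the hashing idea behind EDGE can be illustrated with a minimal Python sketch. The sketch below makes two simplifying assumptions that are not from the paper: a fixed-width floor quantization stands in for the paper's randomized LSH, and a plain average over bin widths stands in for the paper's optimized bias-cancelling ensemble weights. The names hashed_mi, edge_mi, and eps_list are illustrative, not the authors'.

```python
# Minimal sketch of a hashing-based MI estimator in the spirit of EDGE.
# Assumptions (not from the slides): fixed-width floor quantization stands
# in for randomized LSH, and a uniform average over bin widths stands in
# for the paper's optimized bias-cancelling ensemble weights.
import numpy as np
from collections import Counter

def hashed_mi(X, Y, eps):
    """Plug-in Shannon MI (nats) from bucket co-occurrence counts at bin width eps."""
    N = X.shape[0]
    # Hash each sample to a bucket key; co-occupied (i, j) bucket pairs
    # play the role of edges in the dependency graph.
    hx = [tuple(k) for k in np.floor(X / eps).astype(int)]
    hy = [tuple(k) for k in np.floor(Y / eps).astype(int)]
    cx, cy, cxy = Counter(hx), Counter(hy), Counter(zip(hx, hy))
    # Sum the plug-in edge contributions: (N_ij/N) * log(N * N_ij / (N_i * M_j)).
    return sum((nij / N) * np.log(N * nij / (cx[i] * cy[j]))
               for (i, j), nij in cxy.items())

def edge_mi(X, Y, eps_list=(0.2, 0.4, 0.8)):
    """Ensemble estimate: average the base estimator over several bin widths."""
    return float(np.mean([hashed_mi(X, Y, e) for e in eps_list]))

# Quick check on correlated Gaussians, where the true MI has a closed form.
rng = np.random.default_rng(0)
rho = 0.8
Z = rng.multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]], size=5000)
X, Y = Z[:, :1], Z[:, 1:]
print("EDGE-style estimate:", edge_mi(X, Y))
print("True MI:", -0.5 * np.log(1 - rho**2))  # about 0.51 nats
```

The ensemble step over multiple bin widths is what allows EDGE to cancel leading bias terms; with the paper's optimized weights this yields the O(1/N) parametric MSE rate, whereas the uniform average used in this sketch does not attain that rate.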