Research Manuscript
An Empirical Bayes Approach to Partially Labeled and Shuffled Data Sets
- Submitted by: Alex Dytso
- Last updated: 13 February 2020 - 3:22pm
- Document Type: Research Manuscript
- Document Year: 2020
- Presenters: Alex Dytso
- Paper Code: 2692
This work outlines a method for applying empirical Bayes to semi-supervised learning. That is, we consider a scenario in which the training set is partially or entirely unlabeled. In addition to missing labels, we also consider a scenario in which the available training data might be shuffled (i.e., the features and labels are not matched).
Specifically, we propose to train a model-based empirical Bayes procedure separately on the set of features and on the set of labels, and to combine (mix) the two models based on the proportion of unlabeled pairs. The method can then be used to recover the missing labels (i.e., create pseudo-labels) of the data set and, if the data is shuffled, to recover the correct permutation of the data. The technique is evaluated for a multivariate Gaussian model and is shown to consistently outperform a maximum likelihood approach. Moreover, the procedure is shown to be a consistent estimator for a multivariate Gaussian model with an arbitrary (non-degenerate) covariance matrix.
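To make the pseudo-labeling idea concrete, here is a minimal sketch under a jointly Gaussian assumption. It is not the manuscript's algorithm, whose mixing rule is not given in this abstract. It only illustrates one plausible way to fit the feature and label marginals separately on all available samples, estimate the feature-label coupling from the matched pairs alone, and predict missing labels with the plug-in Gaussian conditional mean. The function name `fit_pseudo_labeler` and all argument names are hypothetical.

```python
import numpy as np


def fit_pseudo_labeler(x_all, y_all, x_paired, y_paired):
    """Plug-in Gaussian conditional-mean predictor (illustrative sketch only).

    x_all:    (n_x, d_x) every observed feature vector, matched or not
    y_all:    (n_y, d_y) every observed label vector, matched or not
    x_paired: (m, d_x)   features whose labels are known
    y_paired: (m, d_y)   the labels matched to x_paired
    """
    # Marginal statistics are fitted separately on features and on labels.
    mu_x = x_all.mean(axis=0)
    mu_y = y_all.mean(axis=0)
    cov_xx = np.atleast_2d(np.cov(x_all, rowvar=False))

    # The cross-covariance can only be estimated from matched pairs.
    cov_xy = (x_paired - mu_x).T @ (y_paired - mu_y) / len(x_paired)

    # Solve cov_xx @ weights = cov_xy, i.e. weights = cov_xx^{-1} cov_xy.
    weights = np.linalg.solve(cov_xx, cov_xy)

    def predict(x_new):
        # Gaussian conditional mean E[Y | X = x] with plug-in estimates.
        return mu_y + (x_new - mu_x) @ weights

    return predict
```

Calling `predict` on the unlabeled feature rows yields pseudo-labels. Recovering a permutation for shuffled data would need an additional matching step that this sketch does not attempt.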