TRAINING SAMPLE SELECTION FOR DEEP LEARNING OF DISTRIBUTED DATA
- Submitted by:
- Xiaoqing Zhu
- Last updated:
- 15 September 2017 - 3:49pm
- Document Type:
- Poster
- Document Year:
- 2017
- Presenters:
- Xiaoqing Zhu
- Paper Code:
- MA-PC.6
The success of deep learning, in the form of multi-layer neural networks, depends critically on the volume and variety of training data. Its potential is greatly compromised when training data originate in a geographically distributed manner and are subject to bandwidth constraints. This paper presents a data sampling approach to deep learning that carefully discriminates among locally available training samples based on their relative importance. Towards this end, we propose two metrics for prioritizing candidate training samples as functions of their test trial outcome: correctness and confidence. Bandwidth-constrained simulations show significant performance gains of the proposed training sample selection schemes over conventional uniform sampling: up to 15% bandwidth reduction for the MNIST dataset and a 25% reduction in learning time for the CIFAR-10 dataset.
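The selection idea described above can be sketched in a few lines. This is a minimal illustration, not the paper's actual scheme: it assumes each locally held sample has been run through the current model in a test trial, yielding a predicted label and a confidence score, and that misclassified or low-confidence samples are the most informative ones to transmit under a bandwidth budget. All function and variable names below are hypothetical.

```python
# Illustrative sketch of priority-based training sample selection.
# Assumption: each sample's test trial gives (predicted_label, true_label,
# confidence); wrong or uncertain predictions are ranked as more important.

def priority(predicted_label, true_label, confidence):
    """Higher score = more important to send to the central learner."""
    correctness = 1.0 if predicted_label == true_label else 0.0
    # Incorrect predictions get the largest boost; among samples with the
    # same correctness, lower confidence ranks higher.
    return (1.0 - correctness) + (1.0 - confidence)

def select_samples(samples, budget):
    """Keep the `budget` highest-priority samples from local test trials.

    `samples` is a list of (predicted_label, true_label, confidence) tuples.
    """
    ranked = sorted(samples, key=lambda s: priority(*s), reverse=True)
    return ranked[:budget]

# Example: four local samples, bandwidth budget of two.
local = [
    (3, 3, 0.99),  # correct and confident  -> lowest priority
    (7, 1, 0.60),  # wrong                  -> highest priority
    (5, 5, 0.51),  # correct but uncertain  -> medium priority
    (2, 8, 0.95),  # wrong but confident    -> high priority
]
chosen = select_samples(local, budget=2)
```

Under this toy scoring, the two misclassified samples are selected first; conventional uniform sampling would instead pick any two samples with equal probability, regardless of how informative they are.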