TRAINING SAMPLE SELECTION FOR DEEP LEARNING OF DISTRIBUTED DATA
- Submitted by:
- Xiaoqing Zhu
- Last updated:
- 15 September 2017 - 3:49pm
- Document Type:
- Poster
- Document Year:
- 2017
- Presenters:
- Xiaoqing Zhu
- Paper Code:
- MA-PC.6
The success of deep learning, in the form of multi-layer neural networks, depends critically on the volume and variety of training data. Its potential is greatly compromised when training data originate in a geographically distributed manner and are subject to bandwidth constraints. This paper presents a data sampling approach to deep learning that carefully discriminates among locally available training samples based on their relative importance. Towards this end, we propose two metrics for prioritizing candidate training samples as functions of their test trial outcome: correctness and confidence. Bandwidth-constrained simulations show significant performance gains of the proposed training sample selection schemes over conventional uniform sampling: up to 15% bandwidth reduction for the MNIST dataset and a 25% reduction in learning time for the CIFAR-10 dataset.
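The selection idea described above can be sketched in a few lines. This is a minimal illustration, not the paper's actual scheme: it assumes each locally held sample has been run through the current model in a test trial, yielding a predicted label and a confidence score, and that misclassified or low-confidence samples are the most informative ones to transmit under a bandwidth budget. All function and variable names below are hypothetical.

```python
# Illustrative sketch of priority-based training sample selection.
# Assumption: each sample's test trial gives (predicted_label, true_label,
# confidence); wrong or uncertain predictions are ranked as more important.

def priority(predicted_label, true_label, confidence):
    """Higher score = more important to send to the central learner."""
    correctness = 1.0 if predicted_label == true_label else 0.0
    # Incorrect predictions get the largest boost; among samples with the
    # same correctness, lower confidence ranks higher.
    return (1.0 - correctness) + (1.0 - confidence)

def select_samples(samples, budget):
    """Keep the `budget` highest-priority samples from local test trials.

    `samples` is a list of (predicted_label, true_label, confidence) tuples.
    """
    ranked = sorted(samples, key=lambda s: priority(*s), reverse=True)
    return ranked[:budget]

# Example: four local samples, bandwidth budget of two.
local = [
    (3, 3, 0.99),  # correct and confident  -> lowest priority
    (7, 1, 0.60),  # wrong                  -> highest priority
    (5, 5, 0.51),  # correct but uncertain  -> medium priority
    (2, 8, 0.95),  # wrong but confident    -> high priority
]
chosen = select_samples(local, budget=2)
```

Under this toy scoring, the two misclassified samples are selected first; conventional uniform sampling would instead pick any two samples with equal probability, regardless of how informative they are.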