wav2letter++ : A Fast Open-Source Speech Recognition Framework

Citation Author(s): Vineel Pratap, Awni Hannun, Qiantong Xu, Jeff Cai, Jacob Kahn, Gabriel Synnaeve, Vitaliy Liptchinsky, Ronan Collobert
Submitted by: Vineel Pratap
Last updated: 13 May 2019 - 8:40am
Document Type: Poster
Document Year: 2019
Event: ICASSP 2019
Presenters: Vineel Pratap
Paper Code: 4733

Categories: Audio and Acoustic Signal Processing
Keywords: ASR Training Strategies and Toolkits

This paper introduces wav2letter++, a fast open-source deep learning speech recognition framework. wav2letter++ is written entirely in C++, and uses the ArrayFire tensor library for maximum efficiency. Here we explain the architecture and design of the wav2letter++ system and compare it to other major open-source speech recognition systems. In some cases wav2letter++ is more than 2x faster than other optimized frameworks for training end-to-end neural networks for speech recognition. We also show that wav2letter++'s training times scale linearly to 64 GPUs, the highest we tested, for models with 100 million parameters. High-performance frameworks enable fast iteration, which is often a crucial factor in successful research and model tuning on new datasets and tasks.

wav2letter++-poster.pdf