Subband Temporal Envelope Features and Data Augmentation for End-to-end Recognition of Distant Conversational Speech

Submitted by: Cong-Thanh Do
Last updated: 9 May 2019 - 9:29am
Document Type: Poster
Document Year: 2019
Presenters: Cong-Thanh Do
Paper Code: 3463
 

This paper investigates the use of subband temporal envelope (STE) features and speed-perturbation-based data augmentation in end-to-end recognition of distant conversational speech in everyday home environments. STE features track energy peaks in perceptual frequency bands, which reflect the resonant properties of the vocal tract. Data augmentation is performed by adding training data obtained by modifying the speed of the original training data. Experiments show that STE features and speed-perturbation-based data augmentation help improve the performance of end-to-end speech recognition on a challenging corpus that was used for the CHiME 2018 speech separation and recognition challenge. STE features provide up to 2.0% relative word error rate (WER) reduction compared to conventional log-Mel filter-bank (FBANK) features. Data augmentation is used with both feature types and provides up to 5.2% relative WER reduction. We propose a simple hypothesis selection method to combine the hypotheses produced by the end-to-end systems using FBANK and STE features. This method provides an additional relative WER reduction of up to 4.7%.
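As a rough illustration of the two quantities the abstract relies on, the sketch below shows one way to change the speed of a waveform by resampling it with linear interpolation, and how a relative WER reduction is computed from two absolute WERs. It is a minimal sketch under stated assumptions, not the authors' pipeline: the function names `speed_perturb` and `relative_wer_reduction` are hypothetical, the linear-interpolation resampler stands in for whatever tool was actually used, and the abstract does not state which speed factors were applied.

```python
from typing import List, Sequence


def speed_perturb(signal: Sequence[float], factor: float) -> List[float]:
    """Return `signal` resampled so that it plays `factor` times faster.

    factor > 1 shortens the signal, factor < 1 lengthens it. Uses plain
    linear interpolation (an assumption made for this sketch).
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    if len(signal) == 0:
        return []
    n_out = max(1, round(len(signal) / factor))
    last = len(signal) - 1
    out = []
    for i in range(n_out):
        # Position in the original signal that output sample i reads from,
        # clamped so it never runs past the final input sample.
        pos = min(i * factor, last)
        lo = int(pos)
        hi = min(lo + 1, last)
        frac = pos - lo
        out.append((1.0 - frac) * signal[lo] + frac * signal[hi])
    return out


def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Relative WER reduction in percent, given two absolute WERs."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


if __name__ == "__main__":
    # Toy waveform; the factors 0.9 and 1.1 are illustrative assumptions.
    original = [float(i) for i in range(100)]
    augmented = {f: speed_perturb(original, f) for f in (0.9, 1.0, 1.1)}
    for f, wav in augmented.items():
        print(f, len(wav))
```

Augmented training data would then consist of the original utterances plus their speed-modified copies. Note that a relative reduction is measured against the baseline WER, so the percentages quoted in the abstract are not absolute differences in WER.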

https://doi.org/10.1109/ICASSP.2019.8682247
