
Subband Temporal Envelope Features and Data Augmentation for End-to-end Recognition of Distant Conversational Speech

Abstract: 

This paper investigates the use of subband temporal envelope (STE) features and speed-perturbation-based data augmentation for end-to-end recognition of distant conversational speech in everyday home environments. STE features track energy peaks in perceptual frequency bands, which reflect the resonant properties of the vocal tract. Data augmentation is performed by adding to the training set copies of the original training data whose speed has been modified. Experiments show that STE features and speed-perturbation-based data augmentation both help improve the performance of end-to-end speech recognition on a challenging corpus that was used for the CHiME 2018 speech separation and recognition challenge. STE features provide up to 2.0% relative word error rate (WER) reduction compared to conventional log-Mel filter-bank (FBANK) features. Data augmentation is applied with both feature types and provides up to 5.2% relative WER reduction. We propose a simple hypothesis selection method to combine the hypotheses produced by the end-to-end systems using FBANK and STE features. This method provides up to a further 4.7% relative WER reduction.
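The speed-perturbation step described in the abstract can be illustrated with a minimal sketch. The perturbation factors (0.9 and 1.1), the function names, and the use of plain linear interpolation are assumptions made for illustration only; the abstract does not state which factors or which resampling tool were used, and a real pipeline would typically rely on a dedicated audio resampler.

```python
def speed_perturb(samples, factor):
    """Return a copy of `samples` played `factor` times faster.

    Assumption: resampling is done by simple linear interpolation,
    which stands in for a proper band-limited resampler.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    n_in = len(samples)
    n_out = int(n_in / factor)  # faster speech -> fewer samples
    out = []
    for i in range(n_out):
        pos = i * factor                   # fractional read position in the input
        lo = min(int(pos), n_in - 1)       # left neighbour, clamped to the last sample
        hi = min(lo + 1, n_in - 1)         # right neighbour, clamped to the last sample
        frac = pos - lo
        out.append((1.0 - frac) * samples[lo] + frac * samples[hi])
    return out


def augment(utterances, factors=(0.9, 1.1)):
    """Keep the original utterances and add one perturbed copy per factor.

    The default factors are hypothetical, not taken from the paper.
    """
    augmented = list(utterances)
    for factor in factors:
        augmented.extend(speed_perturb(u, factor) for u in utterances)
    return augmented


if __name__ == "__main__":
    train = [[0.0] * 100]            # one dummy 100-sample utterance
    bigger = augment(train)
    print([len(u) for u in bigger])
```

With two factors, the augmented set is three times the size of the original one, and each perturbed copy keeps the transcript of its source utterance while its duration changes.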

https://doi.org/10.1109/ICASSP.2019.8682247


Paper Details

Submitted On: 9 May 2019 - 9:29am
Type: Poster
Presenter's Name: Cong-Thanh Do
Paper Code: 3463
Document Year: 2019

Document Files

Poster_icassp2019_CTDO.pdf


"Subband Temporal Envelope Features and Data Augmentation for End-to-end Recognition of Distant Conversational Speech", IEEE SigPort, 2019. [Online]. Available: http://sigport.org/4187. Accessed: Jun. 25, 2019.