SPECTRAL DISTORTION MODEL FOR TRAINING PHASE-SENSITIVE DEEP-NEURAL NETWORKS FOR FAR-FIELD SPEECH RECOGNITION

Abstract: 

In this paper, we present an algorithm that introduces phase perturbation into the training database when training phase-sensitive deep neural network models. Traditional features such as log-mel or cepstral features do not contain any phase-relevant information. However, features such as raw waveforms or complex spectra do contain phase-relevant information. Phase-sensitive features have the advantage of being able to detect differences in time of arrival across different microphone channels or frequency bands. However, compared to magnitude-based features, phase information is more sensitive to various kinds of distortion, such as variations in microphone characteristics and reverberation. For traditional magnitude-based features, it is widely known that adding noise or reverberation to the training data, often called Multistyle-TRaining (MTR), improves robustness. In a similar spirit, we propose an algorithm that introduces spectral distortion to make deep-learning models more robust to phase distortion. We call this approach Spectral-Distortion TRaining (SDTR). In our experiments using a training set of 22 million utterances, with and without MTR, this approach reduces Word Error Rates (WERs) by 3.2% and 8.48% relative, respectively, on test sets recorded on Google Home.
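The abstract does not spell out the distortion model, so the following is only an illustrative sketch of what a spectral-distortion augmentation could look like, not the authors' SDTR algorithm. It assumes a random gain and phase offset drawn independently per channel and per frequency bin from zero-mean Gaussians, held fixed over the utterance, and applied through a whole-utterance FFT. The function name `spectral_distortion` and the parameters `sigma_mag_db` and `sigma_phase` are hypothetical.

```python
import numpy as np


def spectral_distortion(waveform, sigma_mag_db=1.0, sigma_phase=0.4, rng=None):
    """Randomly perturb magnitude and phase per frequency bin (illustrative only).

    waveform: real array of shape (samples,) or (channels, samples).
    sigma_mag_db: std. dev. of the random gain in dB (assumed Gaussian).
    sigma_phase: std. dev. of the random phase offset in radians (assumed Gaussian).
    """
    rng = np.random.default_rng() if rng is None else rng
    waveform = np.asarray(waveform, dtype=float)
    n_samples = waveform.shape[-1]

    # One-sided spectrum of the whole utterance (a simplification; a real
    # system might instead filter with a short impulse response).
    spectrum = np.fft.rfft(waveform, axis=-1)

    # Independent draws per channel and per bin, so inter-channel phase
    # relationships are perturbed as well.
    gain_db = rng.normal(0.0, sigma_mag_db, size=spectrum.shape)
    phase = rng.normal(0.0, sigma_phase, size=spectrum.shape)

    # The DC bin (and the Nyquist bin for even lengths) must stay real for a
    # real-valued output, so no phase offset is applied there.
    phase[..., 0] = 0.0
    if n_samples % 2 == 0:
        phase[..., -1] = 0.0

    distortion = 10.0 ** (gain_db / 20.0) * np.exp(1j * phase)
    return np.fft.irfft(spectrum * distortion, n=n_samples, axis=-1)
```

With `sigma_mag_db=0.0`, the magnitude spectrum of the utterance is unchanged and only the phase differs. This is the case where utterance-level magnitude statistics would be unaffected while a raw-waveform or complex-spectrum front end would see a different input.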


Paper Details

Authors:
Chanwoo Kim, Tara Sainath, Arun Narayanan, Ananya Misra, Rajeev Nongpiur, Michiel Bacchiani
Submitted On:
7 May 2018 - 12:19am
Short Link:
Type:
Poster
Event:
Presenter's Name:
Chanwoo Kim
Paper Code:
ICASSP18001 4
Document Year:
2018

Document Files

icassp_4404_poster.pdf


[1] Chanwoo Kim, Tara Sainath, Arun Narayanan, Ananya Misra, Rajeev Nongpiur, Michiel Bacchiani, "SPECTRAL DISTORTION MODEL FOR TRAINING PHASE-SENSITIVE DEEP-NEURAL NETWORKS FOR FAR-FIELD SPEECH RECOGNITION", IEEE SigPort, 2018. [Online]. Available: http://sigport.org/3202. Accessed: Sep. 18, 2018.