ICASSP2017 Poster (Paper #4319)

Citation Author(s):
Ho-Yong Lee, Ji-Won Cho, Minook Kim, and Hyung-Min Park
Submitted by:
Hyung-Min Park
Last updated:
3 March 2017 - 8:36pm
Document Type:
Poster
Document Year:
2017
Event:
Presenters:
Hyung-Min Park
Paper Code:
SP-P4.10
 

The performance of automatic speech recognition (ASR) systems is often degraded in adverse real-world environments. Deep learning has recently emerged as a breakthrough for acoustic modeling in ASR; accordingly, deep-neural-network (DNN)-based speech feature enhancement (FE) approaches have attracted much attention owing to their powerful modeling capabilities. However, DNN-based approaches do not achieve remarkable performance improvements for severely distorted speech when the test environments differ from the training environments. In this letter, we propose a DNN-based FE method in which the DNN inputs include pre-enhanced spectral features computed from multi-channel input signals to reconstruct noise-robust features. The pre-enhanced spectral features are obtained by direction-of-arrival (DOA)-constrained independent component analysis (DCICA) followed by Bayesian FE using a hidden-Markov-model (HMM) prior, which exploits efficient online target speech extraction and efficient FE with prior information for robust ASR. In addition, noise spectral features computed from DCICA are included for further improvement. The DNN is thus trained to reconstruct a clean spectral feature vector from a sequence of corrupted input feature vectors together with the corresponding pre-enhanced and noise feature vectors. Experimental results demonstrate that the proposed method significantly improves recognition performance, even in mismatched noise conditions.
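
The input layout described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the context width of 5 frames on each side, the 24-dimensional features, and the choice to append the pre-enhanced and noise vectors for the current frame only are all assumptions made for illustration. The DCICA and Bayesian FE stages are not modeled; random arrays stand in for their outputs.

```python
import numpy as np

def splice_context(feats, left=5, right=5):
    """Stack each frame with its neighbors along the feature axis.

    Edge frames are padded by repeating the first/last frame (an assumption).
    """
    num_frames, _ = feats.shape
    padded = np.pad(feats, ((left, right), (0, 0)), mode="edge")
    # Slice k holds, for every frame t, the frame at offset (k - left) from t.
    windows = [padded[k:k + num_frames] for k in range(left + right + 1)]
    return np.concatenate(windows, axis=1)

def build_dnn_input(noisy, pre_enhanced, noise, left=5, right=5):
    """Concatenate a context window of corrupted features with the
    pre-enhanced and noise feature vectors of the current frame."""
    if not (noisy.shape == pre_enhanced.shape == noise.shape):
        raise ValueError("all feature streams must have shape (frames, dim)")
    return np.concatenate(
        [splice_context(noisy, left, right), pre_enhanced, noise], axis=1
    )

# Toy data: 100 frames of 24-dimensional features (hypothetical sizes).
rng = np.random.default_rng(0)
num_frames, dim = 100, 24
noisy = rng.standard_normal((num_frames, dim))         # corrupted features
pre_enhanced = rng.standard_normal((num_frames, dim))  # stand-in for DCICA + Bayesian FE output
noise = rng.standard_normal((num_frames, dim))         # stand-in for DCICA noise estimate

dnn_input = build_dnn_input(noisy, pre_enhanced, noise)
print(dnn_input.shape)  # (100, 312): 11 spliced frames * 24 + 24 + 24
```

Each row of `dnn_input` would be paired with the clean feature vector of the same frame as the regression target during training.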
