Sorry, you need to enable JavaScript to visit this website.

facebooktwittermailshare

MULTI-GEOMETRY SPATIAL ACOUSTIC MODELING FOR DISTANT SPEECH RECOGNITION

Abstract: 

The use of spatial information with multiple microphones can improve far-field automatic speech recognition (ASR) accuracy. However, conventional microphone array techniques degrade speech enhancement performance when there is an array geometry mismatch between design and test conditions. Moreover, such speech enhancement techniques do not always yield ASR accuracy improvement due to the difference between speech enhancement and ASR optimization objectives. In this work, we propose to unify an acoustic model framework by optimizing spatial filtering and long short-term memory (LSTM) layers from multi-channel (MC) input. Our acoustic model subsumes beamformers with multiple types of array geometry. In contrast to deep clustering methods that treat a neural network as a black box tool, the network encoding the spatial filters can process streaming audio data in real time without the accumulation of target signal statistics. We demonstrate the effectiveness of such MC neural networks through ASR experiments on the real-world far-field data. We show that our two-channel acoustic model can on average reduce word error rates (WERs) by 13.4 and 12.7% compared to a single channel ASR system with the log-mel filter bank energy (LFBE) feature under the matched and mismatched microphone placement conditions, respectively. Our result also shows that our two-channel network achieves a relative WER reduction of over 7.0% compared to conventional beamforming with seven microphones overall.

up
0 users have voted:

Paper Details

Authors:
Shiva Sundaram, Nikko Strom, Bjorn Hoffmeister
Submitted On:
10 May 2019 - 6:38pm
Short Link:
Type:
Poster
Event:
Presenter's Name:
Kenichi Kumatani
Paper Code:
SLP-P15.3
Document Year:
2019
Cite

Document Files

poster file

(14)

manuscript file

(10)

Subscribe

[1] Shiva Sundaram, Nikko Strom, Bjorn Hoffmeister, "MULTI-GEOMETRY SPATIAL ACOUSTIC MODELING FOR DISTANT SPEECH RECOGNITION", IEEE SigPort, 2019. [Online]. Available: http://sigport.org/4420. Accessed: Jun. 18, 2019.
@article{4420-19,
url = {http://sigport.org/4420},
author = {Shiva Sundaram; Nikko Strom; Bjorn Hoffmeister },
publisher = {IEEE SigPort},
title = {MULTI-GEOMETRY SPATIAL ACOUSTIC MODELING FOR DISTANT SPEECH RECOGNITION},
year = {2019} }
TY - EJOUR
T1 - MULTI-GEOMETRY SPATIAL ACOUSTIC MODELING FOR DISTANT SPEECH RECOGNITION
AU - Shiva Sundaram; Nikko Strom; Bjorn Hoffmeister
PY - 2019
PB - IEEE SigPort
UR - http://sigport.org/4420
ER -
Shiva Sundaram, Nikko Strom, Bjorn Hoffmeister. (2019). MULTI-GEOMETRY SPATIAL ACOUSTIC MODELING FOR DISTANT SPEECH RECOGNITION. IEEE SigPort. http://sigport.org/4420
Shiva Sundaram, Nikko Strom, Bjorn Hoffmeister, 2019. MULTI-GEOMETRY SPATIAL ACOUSTIC MODELING FOR DISTANT SPEECH RECOGNITION. Available at: http://sigport.org/4420.
Shiva Sundaram, Nikko Strom, Bjorn Hoffmeister. (2019). "MULTI-GEOMETRY SPATIAL ACOUSTIC MODELING FOR DISTANT SPEECH RECOGNITION." Web.
1. Shiva Sundaram, Nikko Strom, Bjorn Hoffmeister. MULTI-GEOMETRY SPATIAL ACOUSTIC MODELING FOR DISTANT SPEECH RECOGNITION [Internet]. IEEE SigPort; 2019. Available from : http://sigport.org/4420