Neural Network based Spectral Mask Estimation for Acoustic Beamforming

Citation Author(s):
Reinhold Haeb-Umbach
Submitted by:
Jahn Heymann
Last updated:
20 March 2016 - 5:37am
Document Type:
Presentation Slides
Document Year:
Jahn Heymann


We present a neural network based approach to acoustic beamforming. The network is used to estimate spectral masks from which the Cross-Power Spectral Density matrices of speech and noise are estimated, which in turn are used to compute the beamformer coefficients. The network training is independent of the number and the geometric configuration of the microphones. We further show that it is possible to train the network on clean speech only, avoiding the need for stereo data with separated speech and noise. Two types of networks are evaluated: a small feed-forward network with only one hidden layer and a more elaborate bi-directional Long Short-Term Memory network. We compare our system with different parametric approaches to mask estimation, using different beamforming algorithms. We show that our system yields superior results, both in terms of perceptual speech quality and with respect to speech recognition error rate. The results for the simple feed-forward network are especially encouraging considering its low computational requirements.
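The chain from masks to beamformer coefficients can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes the speech and noise masks are already available as arrays (the network that produces them is not shown), and it uses one common MVDR formulation as a stand-in, since the abstract does not say which beamforming algorithms were evaluated. The function names `masked_psd`, `mvdr_weights`, and `apply_beamformer`, the array layout, and the diagonal loading constant are all hypothetical choices made for illustration.

```python
import numpy as np


def masked_psd(stft, mask):
    """Mask-weighted cross-power spectral density matrix per frequency bin.

    stft: complex array, shape (F, T, M) = (frequency bins, frames, microphones)
    mask: real array in [0, 1], shape (F, T), assumed to come from the network
    returns: complex array, shape (F, M, M)
    """
    weighted = stft * mask[..., None]
    # Sum of mask-weighted outer products y y^H over all frames.
    psd = np.einsum("ftm,ftn->fmn", weighted, stft.conj())
    norm = np.maximum(mask.sum(axis=1), 1e-10)
    return psd / norm[:, None, None]


def mvdr_weights(psd_speech, psd_noise, ref_mic=0):
    """MVDR coefficients w = (Phi_NN^-1 Phi_XX) u / trace(Phi_NN^-1 Phi_XX).

    This formulation is an assumption of this sketch, not taken from the source.
    """
    num_mics = psd_noise.shape[-1]
    # Small diagonal loading keeps the noise matrix invertible (assumed value).
    load = 1e-6 * np.trace(psd_noise, axis1=-2, axis2=-1).real / num_mics
    psd_noise = psd_noise + load[:, None, None] * np.eye(num_mics)
    numerator = np.linalg.solve(psd_noise, psd_speech)  # shape (F, M, M)
    trace = np.trace(numerator, axis1=-2, axis2=-1)
    # Selecting a column is the same as multiplying by the unit vector u.
    return numerator[..., ref_mic] / trace[:, None]


def apply_beamformer(weights, stft):
    """Compute w^H y for every time-frequency bin; returns shape (F, T)."""
    return np.einsum("fm,ftm->ft", weights.conj(), stft)
```

Nothing in these functions depends on a particular microphone count or array geometry: the masks are defined per time-frequency bin, and the spatial information enters only through the estimated matrices. This mirrors the abstract's point that network training is independent of the microphone configuration.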
