Neural Network based Spectral Mask Estimation for Acoustic Beamforming
- Submitted by:
- Jahn Heymann
- Last updated:
- 20 March 2016 - 5:37am
- Document Type:
- Presentation Slides
- Document Year:
- 2016
- Presenters:
- Jahn Heymann
We present a neural network based approach to acoustic beamforming. The network is used to estimate spectral masks from which the Cross-Power Spectral Density matrices of speech and noise are estimated, which in turn are used to compute the beamformer coefficients. The network training is independent of the number and the geometric configuration of the microphones. We further show that it is possible to train the network on clean speech only, avoiding the need for stereo data with separated speech and noise. Two types of networks are evaluated: a small feed-forward network with only one hidden layer, and a more elaborate bi-directional Long Short-Term Memory network. We compare our system with different parametric approaches to mask estimation and across different beamforming algorithms. We show that our system yields superior results, both in terms of perceptual speech quality and speech recognition error rate. The results for the simple feed-forward network are especially encouraging considering its low computational requirements.
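To illustrate the processing chain described in the abstract (masks → cross-power spectral density matrices → beamformer coefficients), here is a minimal NumPy/SciPy sketch. It assumes the network has already produced a speech mask and a noise mask per time-frequency bin, and uses a generalized eigenvalue (max-SNR) beamformer as one possible choice of beamforming algorithm; the function names, shapes, and regularization constants are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
from scipy.linalg import eigh

def estimate_psd_matrix(stft, mask):
    """Mask-weighted cross-power spectral density matrix per frequency bin.

    stft: complex array, shape (frames, bins, channels)
    mask: real array, shape (frames, bins), values in [0, 1]
    returns: array of shape (bins, channels, channels)
    """
    weighted = mask[..., None] * stft                        # apply mask to every channel
    psd = np.einsum('tfc,tfd->fcd', weighted, stft.conj())   # outer products, summed over frames
    norm = np.maximum(mask.sum(axis=0), 1e-10)               # mask mass per bin (avoid divide-by-zero)
    return psd / norm[:, None, None]

def gev_beamformer(psd_speech, psd_noise):
    """Principal generalized eigenvector of (Phi_xx, Phi_nn) for each frequency bin."""
    bins, channels, _ = psd_speech.shape
    w = np.zeros((bins, channels), dtype=complex)
    for f in range(bins):
        # small diagonal loading keeps the noise PSD matrix positive definite
        _, eigvecs = eigh(psd_speech[f], psd_noise[f] + 1e-10 * np.eye(channels))
        w[f] = eigvecs[:, -1]                                # eigenvector of the largest eigenvalue
    return w

def apply_beamformer(stft, w):
    """y(t, f) = w(f)^H x(t, f): project the multichannel STFT onto the beamformer weights."""
    return np.einsum('fc,tfc->tf', w.conj(), stft)
```

Because the PSD matrices are accumulated from whatever channels happen to be present, nothing in this pipeline depends on the number or geometry of the microphones, which matches the training-independence claim in the abstract.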