MULTICHANNEL SPEECH SEPARATION WITH RECURRENT NEURAL NETWORKS FROM HIGH-ORDER AMBISONICS RECORDINGS

Citation Author(s):: Laureline Perotin, Romain Serizel, Emmanuel Vincent, Alexandre Guérin
Submitted by:: Laureline Perotin
Last updated:: 19 April 2018 - 5:18pm
Document Type:: Presentation Slides
Document Year:: 2018
Event:: ICASSP 2018
Presenters:: Laureline Perotin
Paper Code:: AASP-L2.2

Categories:: Spatial and Multichannel Audio

We present a source separation system for high-order ambisonics (HOA) content. We derive a multichannel spatial filter from a mask estimated by a long short-term memory (LSTM) recurrent neural network. Assuming that the directions of arrival of the directional sources are known, we feed the LSTM with one channel of the mixture together with the outputs of basic HOA beamformers steered toward those directions. In our experiments, the target speech is corrupted either by diffuse noise or by an equally loud competing speaker. We show that using as an additional input the output of the beamformer steered toward the competing speaker, besides that of the beamformer steered toward the target speaker, significantly reduces the word error rate.
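The pipeline in the abstract (steered beamformers as extra network inputs, then a mask turned into a multichannel filter) can be sketched in NumPy under several assumptions that are not taken from the paper. The sketch uses first-order ambisonics only (4 channels, ACN order W, Y, Z, X, SN3D normalization), a first-order cardioid as the "basic beamformer", synthetic sparse STFT-domain sources, and an oracle ratio mask standing in for the LSTM output. The filter is a mask-weighted multichannel Wiener filter (MWF), one common way to derive a spatial filter from a mask; it is not necessarily the authors' filter. All function names (`sh_foa`, `steered_beamformer`, `lstm_input_features`, `mask_based_mwf`) are hypothetical.

```python
import numpy as np

def sh_foa(azimuth, elevation):
    """Assumed convention: first-order real spherical harmonics, ACN order (W, Y, Z, X), SN3D."""
    return np.array([
        1.0,
        np.sin(azimuth) * np.cos(elevation),
        np.sin(elevation),
        np.cos(azimuth) * np.cos(elevation),
    ])

def steered_beamformer(azimuth, elevation):
    """Hypothetical basic beamformer y(u) / ||y(u)||^2: a cardioid with unit gain toward u."""
    y = sh_foa(azimuth, elevation)
    return y / (y @ y)

def lstm_input_features(X, doa_target, doa_interferer, eps=1e-8):
    """Log-magnitudes of the W channel and of both beamformer outputs, shape (3, F, T)."""
    bf_t = np.tensordot(steered_beamformer(*doa_target), X, axes=1)
    bf_i = np.tensordot(steered_beamformer(*doa_interferer), X, axes=1)
    return np.log(np.abs(np.stack([X[0], bf_t, bf_i])) + eps)

def mask_based_mwf(X, mask, ref=0, eps=1e-10):
    """Per-frequency MWF w = (Phi_s + Phi_n)^-1 Phi_s e_ref from a (F, T) mask; X is (C, F, T)."""
    C, F, T = X.shape
    out = np.empty((F, T), dtype=complex)
    for f in range(F):
        Xf, m = X[:, f, :], mask[f]
        # Mask-weighted spatial covariance estimates of target and noise.
        phi_s = (m * Xf) @ Xf.conj().T / max(m.sum(), eps)
        phi_n = ((1 - m) * Xf) @ Xf.conj().T / max((1 - m).sum(), eps)
        w = np.linalg.solve(phi_s + phi_n + eps * np.eye(C), phi_s[:, ref])
        out[f] = w.conj() @ Xf  # s_hat = w^H x
    return out

rng = np.random.default_rng(0)
F, T = 64, 200

def sparse_source():
    # Hypothetical speech-like source: complex Gaussian bins, active in about 40% of bins.
    g = (rng.standard_normal((F, T)) + 1j * rng.standard_normal((F, T))) / np.sqrt(2)
    return g * (rng.random((F, T)) < 0.4)

doa_t, doa_i = (np.deg2rad(30), 0.0), (np.deg2rad(-90), np.deg2rad(20))
s_t, s_i = sparse_source(), sparse_source()
# Ideal diffuse field in SN3D: uncorrelated channels, first-order channels at 1/3 power.
noise_std = 0.05 * np.sqrt(np.array([1, 1 / 3, 1 / 3, 1 / 3]))
noise = noise_std[:, None, None] * (
    rng.standard_normal((4, F, T)) + 1j * rng.standard_normal((4, F, T))) / np.sqrt(2)
X = sh_foa(*doa_t)[:, None, None] * s_t + sh_foa(*doa_i)[:, None, None] * s_i + noise

features = lstm_input_features(X, doa_t, doa_i)  # what the network would receive
oracle_mask = np.abs(s_t) ** 2 / (np.abs(s_t) ** 2 + np.abs(X[0] - s_t) ** 2 + 1e-12)
s_hat = mask_based_mwf(X, oracle_mask)

def sdr_db(ref, est):
    return 10 * np.log10(np.sum(np.abs(ref) ** 2) / np.sum(np.abs(ref - est) ** 2))

print(features.shape)  # (3, 64, 200)
print(f"W channel: {sdr_db(s_t, X[0]):.1f} dB, MWF: {sdr_db(s_t, s_hat):.1f} dB")
```

With SN3D at order 1, y(u)·y(v) = 1 + u·v for unit direction vectors, so dividing by y(u)·y(u) = 2 yields a cardioid: unit gain toward the steering direction and a null directly behind it. Reaching the sharper beams the title's higher orders allow would require higher-order spherical harmonics in `sh_foa`.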

perotin.pdf