ICASSP 2019 presentation slides

We propose a complex-valued deep neural network (cDNN) for speech enhancement and source separation. Existing end-to-end systems use complex-valued gradients only to pass the training error to a real-valued DNN that estimates a gain mask. In contrast, we use the full potential of complex-valued LSTMs, MLPs, and activation functions to estimate complex-valued beamforming weights directly from complex-valued microphone array data. By exploiting the phase information in the data, our cDNN is able to locate and track different moving sources. Our experiments use a typical living room environment, mixtures from the Wall Street Journal corpus, and YouTube noise. We compare our cDNN against the BeamformIt toolkit as a baseline and against a mask-based beamformer as a state-of-the-art reference system. We observe a significant improvement in terms of PESQ, STOI, and WER.
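To make the idea of complex-valued beamforming weights concrete, the following is a minimal NumPy sketch of filter-and-sum beamforming in the STFT domain, where the output is y[f, t] = sum over m of conj(w[f, t, m]) * x[f, t, m]. It is not the network from the talk: the function names apply_beamformer and steering_weights, the (frequency, frame, microphone) array layout, and the fixed delay-and-sum weights are all assumptions made for illustration. In the proposed system the weights would be estimated by the cDNN rather than computed from known delays.

```python
import numpy as np


def apply_beamformer(weights, stft):
    """Filter-and-sum: y[f, t] = sum_m conj(w[f, t, m]) * x[f, t, m].

    weights, stft: complex arrays of shape (F, T, M), an assumed layout.
    Returns a complex array of shape (F, T).
    """
    return np.einsum("ftm,ftm->ft", np.conj(weights), stft)


def steering_weights(delays, freqs, n_frames):
    """Delay-and-sum weights, a hypothetical stand-in for network outputs.

    delays: (M,) per-microphone delays in seconds.
    freqs:  (F,) frequency bin centers in Hz.
    """
    # Phase shift that a delay of d seconds causes at frequency f.
    phase = np.exp(-2j * np.pi * freqs[:, None] * delays[None, :])  # (F, M)
    w = phase / len(delays)
    # Repeat the same weights for every frame: (F, T, M).
    return np.broadcast_to(w[:, None, :], (len(freqs), n_frames, len(delays)))
```

Because the weights carry phase as well as magnitude, conjugating them undoes the inter-microphone phase shifts of the target direction, so the channels add coherently. A real-valued gain mask cannot express this alignment, which is the motivation the abstract gives for estimating the weights in the complex domain.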

icassp2019talk.pdf

Links:

Deep Complex-valued Neural Beamformers
