Incorporating Intra-Spectral Dependencies With A Recurrent Output Layer For Improved Speech Enhancement
- Submitted by:
- Khandokar Md Nayem
- Last updated:
- 13 October 2019 - 1:29pm
- Document Type:
- Poster
- Document Year:
- 2019
- Presenters:
- Khandokar Md. Nayem
- Paper Code:
- 143
Deep-learning based speech enhancement systems have offered tremendous gains, where the best performing approaches use long short-term memory (LSTM) recurrent neural networks (RNNs) to model temporal speech correlations. These models, however, ignore spectral dependencies along the frequency axis, so frequency-level correlations within a single time frame go unmodeled. This results in inaccurate frequency responses that negatively affect perceptual quality and intelligibility. We propose a deep-learning approach that considers both temporal and frequency-level dependencies. More specifically, we enforce spectral-level dependencies within each spectral time frame through the introduction of a recurrent output layer that models a Markovian assumption along the frequency axis. We evaluate our approach in a variety of speech and noise environments, and objectively show that this recurrent spectral layer offers performance gains over traditional approaches. We also show that our approach outperforms recent approaches that consider frequency-level dependencies.
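To make the core idea concrete, here is a minimal sketch of a recurrent output layer that runs along the frequency axis of a single spectral frame, so each bin's output depends on the output of the previous (lower) bin, a first-order Markovian assumption along frequency. This is an illustrative toy, not the paper's implementation: the scalar weights `w_in`, `w_rec`, and `bias` stand in for learned parameters, and the input `frame` stands in for per-bin features produced by an upstream temporal LSTM.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def recurrent_output_layer(frame, w_in=1.0, w_rec=0.5, bias=0.0):
    """Toy recurrent output layer over the frequency axis.

    For each frequency bin k, the output y[k] depends on the current
    input feature frame[k] and on y[k-1], the output at the adjacent
    lower-frequency bin. In the paper's setting the weights would be
    learned matrices operating on vector-valued features; here they
    are scalars for illustration.
    """
    outputs = []
    prev = 0.0  # initial recurrent state for the lowest frequency bin
    for x in frame:
        y = sigmoid(w_in * x + w_rec * prev + bias)
        outputs.append(y)
        prev = y  # carry this bin's output to the next bin
    return outputs

# One (hypothetical) spectral frame of per-bin features:
mask = recurrent_output_layer([0.0, 1.0, -1.0, 0.5])
```

A conventional feed-forward output layer would compute each bin's value independently; the recurrence above is what couples neighboring frequency bins within a frame.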