Using recurrences in time and frequency within U-net architecture for speech enhancement
- Citation Author(s):
- Submitted by: Tomasz Grzywalski
- Last updated: 8 May 2019 - 9:13am
- Document Type: Poster
- Document Year: 2019
- Event:
- Presenters: Tomasz Grzywalski
- Paper Code: 3235
- Categories:
When designing a fully convolutional neural network, there is a trade-off between receptive field size, number of parameters, and spatial resolution of features in the deeper layers of the network. In this work we present a novel network design, based on a combination of many convolutional and recurrent layers, that resolves this trade-off. We compare our solution with U-net-based models known from the literature and with other baseline models on a speech enhancement task. We test our solution on TIMIT speech utterances combined with noise segments extracted from the NOISEX-92 database and show a clear advantage of the proposed solution over the current state of the art in terms of the SDR (signal-to-distortion ratio), SIR (signal-to-interference ratio), and STOI (short-time objective intelligibility) metrics.
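The trade-off named at the start of the abstract can be illustrated with a little arithmetic. The following is a minimal sketch, not the architecture from the poster: the helper `receptive_field` and the layer configurations are hypothetical, chosen only to show how receptive field, layer count (a rough proxy for parameter count), and feature resolution pull against each other along one axis of a spectrogram.

```python
def receptive_field(layers):
    """Return (receptive_field, downsampling_factor) along one axis.

    `layers` is a list of (kernel_size, stride) pairs, ordered from input
    to output. This helper is illustrative and not taken from the poster.
    """
    rf = 1    # receptive field of one output unit, in input samples
    jump = 1  # distance between adjacent output units, in input samples
    for kernel, stride in layers:
        rf += (kernel - 1) * jump
        jump *= stride
    return rf, jump


# Four 3-wide layers without striding: full resolution, small receptive field.
print(receptive_field([(3, 1)] * 4))   # (9, 1)

# Four 3-wide layers with stride 2: large receptive field, 16x lower resolution.
print(receptive_field([(3, 2)] * 4))   # (31, 16)

# Same receptive field at full resolution needs 15 layers, so more parameters.
print(receptive_field([(3, 1)] * 15))  # (31, 1)
```

Under these assumptions, purely convolutional stacks must give up either resolution or parameter economy to widen the receptive field. A recurrent layer that scans a whole axis covers it in a single layer, which is one way to read the motivation for combining convolutional and recurrent layers.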