
LOW-LATENCY SOUND SOURCE SEPARATION USING DEEP NEURAL NETWORKS

Citation Author(s):
Tom Barker, Niels Henrik Pontoppidan, Tuomas Virtanen
Submitted by:
Gaurav Naithani
Last updated:
8 December 2016 - 3:27pm
Document Type:
Poster
Document Year:
2016
Presenters:
Gaurav Naithani
 

Sound source separation at low latency requires that each incoming frame of audio data be processed with very low delay and output as soon as possible. For practical purposes involving human listeners, a 20 ms algorithmic delay is the upper limit that remains comfortable to the listener. In this paper, we propose a low-latency (algorithmic delay ≤ 20 ms) deep neural network (DNN) based source separation method. The proposed method takes advantage of an extended past context, outputting soft time-frequency masking filters which are then applied to incoming audio frames, giving better separation performance than an NMF baseline. Acoustic mixtures from five pairs of speakers from the CMU Arctic database were used for the experiments. An average improvement of at least 1 dB in source-to-distortion ratio (SDR) was observed for our DNN-based system over a low-latency NMF baseline across different processing and analysis frame lengths. Incorporating previous temporal context into the DNN inputs yielded significant improvements in SDR for short processing frame lengths.
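As a rough illustration of the masking step described above, the sketch below shows how a soft time-frequency mask predicted from stacked past-context features might be applied to an incoming mixture frame. This is not the authors' implementation: the model interface, context length, and frame size (mask_dnn, CONTEXT_FRAMES, N_FFT) are hypothetical placeholders chosen only for demonstration.

    import numpy as np

    # Hypothetical parameters, not taken from the paper:
    CONTEXT_FRAMES = 5        # number of past frames stacked into the DNN input
    N_FFT = 320               # 20 ms frame at 16 kHz sampling rate
    N_BINS = N_FFT // 2 + 1   # number of non-redundant frequency bins

    def separate_frame(mask_dnn, past_spectra, mixture_frame):
        """Apply a DNN-predicted soft time-frequency mask to one incoming frame.

        mask_dnn      -- any callable mapping stacked magnitude spectra to a
                         soft mask in [0, 1], one value per frequency bin
        past_spectra  -- the CONTEXT_FRAMES most recent magnitude spectra
        mixture_frame -- complex STFT of the current mixture frame, shape (N_BINS,)
        """
        # Stack the extended past context with the current frame's magnitudes,
        # so the network sees temporal context without waiting for future frames.
        features = np.concatenate(list(past_spectra) + [np.abs(mixture_frame)])
        # Multiplying the complex mixture spectrum by a real-valued soft mask
        # reuses the mixture phase, keeping the processing causal and cheap.
        soft_mask = mask_dnn(features)
        return soft_mask * mixture_frame

Because only past and current frames are used, the algorithmic delay is bounded by the analysis/synthesis frame length, consistent with the ≤ 20 ms constraint stated in the abstract.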
