

Low-latency deep clustering for speech separation

Abstract: 

This paper proposes a low-algorithmic-latency adaptation of the deep clustering approach to speaker-independent speech separation. The adaptation consists of three parts: a) using long short-term memory (LSTM) networks instead of the bidirectional variant used in the original work, b) using a short synthesis window (here 8 ms), as required for low-latency operation, and c) using a buffer at the beginning of the audio mixture to estimate cluster centres corresponding to the constituent speakers, which are then used to separate the speakers in the rest of the signal. The buffer serves as an initialization phase, after which the system can operate with 8 ms algorithmic latency. We evaluate the proposed approach on two-speaker mixtures from the Wall Street Journal (WSJ0) corpus. We observe that using an LSTM yields a source-to-distortion ratio (SDR) around 1 dB lower than that of the baseline bidirectional LSTM. Moreover, using an 8 ms synthesis window instead of 32 ms degrades separation performance by around 2.1 dB compared to the baseline. Finally, we report separation performance for different buffer durations and note that separation can be achieved even with a buffer as short as 300 ms.
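The buffer-based scheme in part c) can be illustrated with a short NumPy sketch. It assumes, as in the original deep clustering inference, that the cluster centres are found with k-means over the embeddings of every time-frequency bin in the buffer. The abstract does not specify the clustering algorithm, the distance measure, or how silent bins are handled, and all names here (`estimate_centres`, `frame_masks`, `nearest_centre`) are hypothetical. The embeddings are synthetic stand-ins for trained LSTM outputs. Once the centres are fixed, each later frame needs only a nearest-centre lookup to produce binary masks, so in this sketch the masking step itself adds no look-ahead beyond the current frame.

```python
import numpy as np


def nearest_centre(x, centres):
    """Index of the closest centre (squared Euclidean) for each row of x."""
    d = ((x[:, None, :] - centres[None, :, :]) ** 2).sum(axis=-1)
    return d.argmin(axis=1)


def kmeans(x, k, n_iter=50, seed=0):
    """Lloyd's k-means on the rows of x (N, D); returns (k, D) centroids.

    Farthest-point initialisation keeps well-separated clusters from
    being seeded from the same group.
    """
    rng = np.random.default_rng(seed)
    centres = [x[rng.integers(len(x))]]
    for _ in range(1, k):
        # distance of every point to its nearest already-chosen centre
        d = np.min([((x - c) ** 2).sum(axis=1) for c in centres], axis=0)
        centres.append(x[d.argmax()])
    centres = np.array(centres, dtype=float)
    for _ in range(n_iter):
        labels = nearest_centre(x, centres)
        for j in range(k):
            members = x[labels == j]
            if len(members):  # keep the old centre if a cluster empties
                centres[j] = members.mean(axis=0)
    return centres


def estimate_centres(buffer_emb, n_speakers):
    """Initialisation phase (hypothetical): cluster all T-F embeddings
    of the buffer. buffer_emb has shape (T_buf, F, D)."""
    return kmeans(buffer_emb.reshape(-1, buffer_emb.shape[-1]), n_speakers)


def frame_masks(frame_emb, centres):
    """Streaming phase (hypothetical): binary masks of shape
    (n_speakers, F) for one new frame with embeddings (F, D)."""
    labels = nearest_centre(frame_emb, centres)
    return np.eye(len(centres))[labels].T


# Synthetic demo: two hypothetical speakers whose embeddings lie near
# orthogonal unit vectors a and b (stand-ins for trained LSTM outputs).
rng = np.random.default_rng(1)
T_buf, F, D = 20, 129, 8
a, b = np.eye(D)[0], np.eye(D)[1]
owner = rng.integers(0, 2, size=(T_buf, F))
buffer_emb = (np.where(owner[..., None] == 0, a, b)
              + 0.05 * rng.standard_normal((T_buf, F, D)))
centres = estimate_centres(buffer_emb, n_speakers=2)

# After the buffer, each new frame is masked with the fixed centres.
owner_new = rng.integers(0, 2, size=F)
new_frame = (np.where(owner_new[:, None] == 0, a, b)
             + 0.05 * rng.standard_normal((F, D)))
masks = frame_masks(new_frame, centres)
print(masks.shape)  # (2, 129)
```

Because the centres stay fixed after the buffer, a given mask row would correspond to the same speaker in every later frame, which avoids re-solving the speaker permutation frame by frame. With real data, each mask would be applied to the matching mixture STFT frame and resynthesized with the short synthesis window; that step is omitted here.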


Paper Details

Authors:
Shanshan Wang, Gaurav Naithani, Tuomas Virtanen
Submitted On:
7 May 2019 - 1:32pm
Short Link:
Type:
Presentation Slides
Event:
Presenter's Name:
Shanshan Wang
Paper Code:
4400
Document Year:
2019

Document Files

ICASSP_presentation_updated.pdf

[1] Shanshan Wang, Gaurav Naithani, Tuomas Virtanen, "Low-latency deep clustering for speech separation", IEEE SigPort, 2019. [Online]. Available: http://sigport.org/3908. Accessed: Sep. 21, 2019.
@article{3908-19,
url = {http://sigport.org/3908},
author = {Shanshan Wang and Gaurav Naithani and Tuomas Virtanen},
publisher = {IEEE SigPort},
title = {Low-latency deep clustering for speech separation},
year = {2019} }
TY - EJOUR
T1 - Low-latency deep clustering for speech separation
AU - Shanshan Wang
AU - Gaurav Naithani
AU - Tuomas Virtanen
PY - 2019
PB - IEEE SigPort
UR - http://sigport.org/3908
ER -
Shanshan Wang, Gaurav Naithani, Tuomas Virtanen. (2019). Low-latency deep clustering for speech separation. IEEE SigPort. http://sigport.org/3908
Shanshan Wang, Gaurav Naithani, Tuomas Virtanen, 2019. Low-latency deep clustering for speech separation. Available at: http://sigport.org/3908.
Shanshan Wang, Gaurav Naithani, Tuomas Virtanen. (2019). "Low-latency deep clustering for speech separation." Web.
1. Shanshan Wang, Gaurav Naithani, Tuomas Virtanen. Low-latency deep clustering for speech separation [Internet]. IEEE SigPort; 2019. Available from: http://sigport.org/3908