TasNet: time-domain audio separation network for real-time, single-channel speech separation

Abstract: 

Robust speech processing in multi-talker environments requires effective speech separation. Recent deep learning systems have made significant progress toward solving this problem, yet it remains challenging, particularly in real-time, low-latency applications. Most methods attempt to construct a mask for each source in a time-frequency representation of the mixture signal, which is not necessarily an optimal representation for speech separation. In addition, time-frequency decomposition introduces inherent problems such as the decoupling of phase and magnitude and the long time window required to achieve sufficient frequency resolution. We propose the Time-domain Audio Separation Network (TasNet) to overcome these limitations. We directly model the signal in the time domain using an encoder-decoder framework and perform the source separation on nonnegative encoder outputs. This method removes the frequency decomposition step and reduces the separation problem to the estimation of source masks on encoder outputs, which are then synthesized by the decoder. Our system outperforms the current state-of-the-art causal and noncausal speech separation algorithms, reduces the computational cost of speech separation, and significantly reduces the minimum required latency of the output. This makes TasNet suitable for applications where low-power, real-time implementation is desirable, such as hearable and telecommunication devices.
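
The pipeline described in the abstract (encode short time-domain segments into nonnegative weights, estimate one mask per source on those weights, decode each masked output back to a waveform) can be sketched roughly as follows. This is a minimal illustration with random, untrained weights, not the authors' implementation. The function names, the plain ReLU encoder, the non-overlapping segmentation, the softmax over sources, and the linear stand-in for the mask estimator are all assumptions made for the example; the abstract does not specify any of them.

```python
import numpy as np

def segment(x, L):
    """Split a 1-D signal into non-overlapping frames of length L (zero-padded)."""
    n_frames = -(-len(x) // L)  # ceiling division
    padded = np.zeros(n_frames * L)
    padded[:len(x)] = x
    return padded.reshape(n_frames, L)

def encode(frames, U):
    """Encoder (assumed form): ReLU of a linear projection, so outputs are nonnegative."""
    return np.maximum(frames @ U, 0.0)

def estimate_masks(w, M):
    """Hypothetical mask estimator: one linear map per source, softmax across sources.

    A trained separation network would replace this; the softmax only
    guarantees that the masks for each encoder output sum to one.
    """
    logits = np.einsum("kn,cnm->ckm", w, M)        # shape (C, K, N)
    logits -= logits.max(axis=0, keepdims=True)    # numerical stability
    e = np.exp(logits)
    return e / e.sum(axis=0, keepdims=True)

def decode(d, B):
    """Decoder (assumed form): linear synthesis of each frame from N basis signals."""
    return d @ B

def separate(x, U, M, B):
    """Run encoder, mask estimation, and decoder on a mixture x."""
    L = U.shape[0]
    w = encode(segment(x, L), U)                   # (K, N), nonnegative
    masks = estimate_masks(w, M)                   # (C, K, N)
    sources = decode(masks * w, B)                 # (C, K, L)
    return sources.reshape(sources.shape[0], -1)[:, :len(x)], masks, w

# Usage with random parameters (illustrative sizes only).
rng = np.random.default_rng(0)
L, N, C = 40, 64, 2                                # frame length, basis size, sources
U = rng.standard_normal((L, N))
M = rng.standard_normal((C, N, N))
B = rng.standard_normal((N, L))
mixture = rng.standard_normal(1000)
sources, masks, w = separate(mixture, U, M, B)
print(sources.shape)
```

Because the masks sum to one and the assumed decoder is linear, the separated outputs in this sketch add up to the decoder's reconstruction of the unmasked encoder output. With untrained weights the outputs are of course not meaningful speech estimates.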

Paper Details

Authors: Yi Luo, Nima Mesgarani
Submitted On: 19 April 2018 - 2:11pm
Type: Poster
Presenter's Name: Yi Luo
Paper Code: AASP-P11.4
Document Year: 2018

Document Files

ICASSP2018-poster.pdf

[1] Yi Luo, Nima Mesgarani, "TasNet: time-domain audio separation network for real-time, single-channel speech separation", IEEE SigPort, 2018. [Online]. Available: http://sigport.org/2987. Accessed: Apr. 25, 2019.