
Source Separation and Signal Enhancement

Language and Noise Transfer in Speech Enhancement Generative Adversarial Network


Deep learning speech enhancement systems usually require large amounts of training data to operate in broad conditions or real applications, which makes adapting such systems to new, low-resource environments an important topic. In this work, we present the results of adapting a speech enhancement generative adversarial network by fine-tuning the generator with small amounts of data. We investigate the minimum data requirements needed to obtain stable behavior in terms of several objective metrics in two very different languages: Catalan and Korean.
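As a rough illustration of this kind of transfer setup, the sketch below fine-tunes only a pretrained generator on a small target-language dataset. It is not the authors' code: the Generator class, checkpoint path, loss, and hyperparameters are placeholders, and the adversarial term is omitted.

```python
# Hypothetical sketch: adapt a pretrained speech-enhancement GAN generator
# to a small target-language dataset. All names and hyperparameters are placeholders.
import torch
from torch import nn, optim
from torch.utils.data import DataLoader

def finetune_generator(generator: nn.Module,
                       small_dataset,            # pairs of (noisy, clean) waveforms
                       epochs: int = 10,
                       lr: float = 1e-4) -> nn.Module:
    loader = DataLoader(small_dataset, batch_size=8, shuffle=True)
    optimizer = optim.RMSprop(generator.parameters(), lr=lr)
    l1 = nn.L1Loss()                              # simple reconstruction objective for the sketch
    generator.train()
    for _ in range(epochs):
        for noisy, clean in loader:
            optimizer.zero_grad()
            enhanced = generator(noisy)           # waveform in, waveform out
            loss = l1(enhanced, clean)            # adversarial term omitted in this sketch
            loss.backward()
            optimizer.step()
    return generator

# Usage idea: load a generator pretrained on a large corpus, then adapt it with
# a few minutes of Catalan or Korean data.
# generator.load_state_dict(torch.load("segan_generator.pt"))
# generator = finetune_generator(generator, small_catalan_dataset)
```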

Paper Details

Authors: Maruchan Park, Joan Serrà, Antonio Bonafonte, Kang-Hun Ahn
Submitted On: 19 April 2018 - 4:40pm
Document Files: language-noise-transfer.pdf

Cite: Maruchan Park, Joan Serrà, Antonio Bonafonte, Kang-Hun Ahn, "Language and Noise Transfer in Speech Enhancement Generative Adversarial Network", IEEE SigPort, 2018. [Online]. Available: http://sigport.org/3023. Accessed: Jun. 18, 2018.

ESTIMATION OF THE SOUND FIELD AT ARBITRARY POSITIONS IN DISTRIBUTED MICROPHONE NETWORKS BASED ON DISTRIBUTED RAY SPACE TRANSFORM

Paper Details

Authors: Mirco Pezzoli, Federico Borra, Fabio Antonacci
Submitted On: 19 April 2018 - 3:21pm
Document Files: Paper #3031.pdf

Cite: Mirco Pezzoli, Federico Borra, Fabio Antonacci, "ESTIMATION OF THE SOUND FIELD AT ARBITRARY POSITIONS IN DISTRIBUTED MICROPHONE NETWORKS BASED ON DISTRIBUTED RAY SPACE TRANSFORM", IEEE SigPort, 2018. [Online]. Available: http://sigport.org/3009. Accessed: Jun. 18, 2018.

Speech Enhancement with Convolutional-Recurrent Networks


We propose an end-to-end model based on convolutional and recurrent neural networks for speech enhancement. Our model is purely data-driven and does not make any assumptions about the type or the stationarity of the noise. In contrast to existing methods that use multilayer perceptrons (MLPs), we employ both convolutional and recurrent neural network architectures. Thus, our approach allows us to exploit local structures in both the frequency and temporal domains.
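To make the convolutional-plus-recurrent idea concrete, here is a minimal PyTorch sketch operating on spectrogram frames. It is not the authors' architecture; the layer sizes, feature dimensions, and magnitude-regression output are illustrative assumptions.

```python
# Minimal sketch of a convolutional-recurrent enhancement network over
# spectrogram frames (freq_bins x time). Shapes and layer sizes are illustrative only.
import torch
from torch import nn

class ConvRecurrentEnhancer(nn.Module):
    def __init__(self, freq_bins: int = 257, hidden: int = 256):
        super().__init__()
        # 2-D convolutions capture local time-frequency structure
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, kernel_size=3, padding=1), nn.ReLU(),
        )
        # a recurrent layer models longer temporal context
        self.rnn = nn.GRU(input_size=freq_bins, hidden_size=hidden,
                          batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, freq_bins)    # predict enhanced magnitudes

    def forward(self, spec: torch.Tensor) -> torch.Tensor:
        # spec: (batch, freq_bins, time)
        x = self.conv(spec.unsqueeze(1)).squeeze(1)    # (batch, freq_bins, time)
        x, _ = self.rnn(x.transpose(1, 2))             # (batch, time, 2*hidden)
        return self.out(x).transpose(1, 2)             # (batch, freq_bins, time)

# e.g. ConvRecurrentEnhancer()(torch.randn(2, 257, 100)).shape -> (2, 257, 100)
```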

Paper Details

Authors: Han Zhao, Shuayb Zarar, Ivan Tashev, Chin-Hui Lee
Submitted On: 19 April 2018 - 2:28pm
Document Files: icassp2018-3744

Cite: Han Zhao, Shuayb Zarar, Ivan Tashev, Chin-Hui Lee, "Speech Enhancement with Convolutional-Recurrent Networks", IEEE SigPort, 2018. [Online]. Available: http://sigport.org/2998. Accessed: Jun. 18, 2018.

A Dynamic Latent Variable Model for Source Separation


We propose a novel latent variable model for learning latent bases for time-varying non-negative data. The model uses a mixture multinomial as the likelihood function and a Dirichlet distribution with dynamic parameters as the prior, which we call the dynamic Dirichlet prior. An expectation-maximization (EM) algorithm is developed for estimating the parameters of the proposed model.
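Read schematically (this is not the paper's exact notation, and the coupling shown for the dynamic parameters is only one plausible choice), such a model treats each frame as a mixture multinomial over latent bases, with a Dirichlet prior whose parameters depend on the previous frame's weights:

```latex
% Schematic only; not the paper's exact formulation.
% Mixture-multinomial likelihood over latent bases z at frame t:
\[
  P_t(f) \;=\; \sum_{z} P_t(z)\, P(f \mid z),
\]
% with a Dirichlet prior on the time-varying weights whose parameters follow
% the previous frame (one plausible, assumed coupling shown):
\[
  P_t(z) \;\sim\; \mathrm{Dir}\big(\alpha_t(z)\big), \qquad
  \alpha_t(z) \;=\; 1 + c\, P_{t-1}(z), \quad c > 0 .
\]
% EM then alternates a posterior step over z with a MAP update of P_t(z) and
% P(f|z), the prior coupling consecutive frames.
```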

Paper Details

Authors: Anurendra Kumar, Tanaya Guha, Prasanta Ghosh
Submitted On: 19 April 2018 - 2:17pm
Document Files: ICASSP_18.pdf

Cite: Anurendra Kumar, Tanaya Guha, Prasanta Ghosh, "A Dynamic Latent Variable Model for Source Separation", IEEE SigPort, 2018. [Online]. Available: http://sigport.org/2990. Accessed: Jun. 18, 2018.

TasNet: time-domain audio separation network for real-time, single-channel speech separation


Robust speech processing in multi-talker environments requires effective speech separation. Recent deep learning systems have made significant progress toward solving this problem, yet it remains challenging, particularly in real-time, short-latency applications. Most methods attempt to construct a mask for each source in a time-frequency representation of the mixture signal, which is not necessarily an optimal representation for speech separation.
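For context, the time-frequency masking pipeline that the abstract refers to, and that a time-domain approach sidesteps, is sketched below; the mask estimator is a placeholder for any learned model, and this is not the authors' method.

```python
# Sketch of conventional time-frequency mask-based separation, the approach the
# abstract contrasts with. `estimate_masks` is a placeholder for any learned model.
import numpy as np
from scipy.signal import stft, istft

def masking_separation(mixture: np.ndarray, estimate_masks, fs: int = 8000):
    # STFT of the mixture: complex array of shape (freq, time)
    _, _, X = stft(mixture, fs=fs, nperseg=256)
    masks = estimate_masks(np.abs(X))             # list of (freq, time) masks in [0, 1]
    sources = []
    for m in masks:
        _, s = istft(m * X, fs=fs, nperseg=256)   # apply mask, reuse the mixture phase
        sources.append(s)
    return sources
```

The STFT window length in such a pipeline sets a lower bound on latency, which is part of what makes short-latency operation difficult.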

Paper Details

Authors: Yi Luo, Nima Mesgarani
Submitted On: 19 April 2018 - 2:11pm
Document Files: ICASSP2018-poster.pdf

Cite: Yi Luo, Nima Mesgarani, "TasNet: time-domain audio separation network for real-time, single-channel speech separation", IEEE SigPort, 2018. [Online]. Available: http://sigport.org/2987. Accessed: Jun. 18, 2018.

ITERATIVE DEEP NEURAL NETWORKS FOR SPEAKER-INDEPENDENT BINAURAL BLIND SPEECH SEPARATION

Paper Details

Authors: Qingju Liu, Yong Xu, Philip Coleman, Philip Jackson, Wenwu Wang
Submitted On: 19 April 2018 - 1:11pm
Document Files: Audio examples

Cite: Qingju Liu, Yong Xu, Philip Coleman, Philip Jackson, Wenwu Wang, "ITERATIVE DEEP NEURAL NETWORKS FOR SPEAKER-INDEPENDENT BINAURAL BLIND SPEECH SEPARATION", IEEE SigPort, 2018. [Online]. Available: http://sigport.org/2984. Accessed: Jun. 18, 2018.

BSS EVAL OR PEASS? PREDICTING THE PERCEPTION OF SINGING-VOICE SEPARATION


There is some uncertainty as to whether objective metrics for predicting the perceived quality of audio source separation are sufficiently accurate. This issue was investigated by employing a revised experimental methodology to collect subjective ratings of sound quality and interference of singing-voice recordings that have been extracted from musical mixtures using state-of-the-art audio source separation. A correlation analysis between the experimental data and the measures of two objective evaluation toolkits, BSS Eval and PEASS, was performed to assess their performance.
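As a rough illustration of this kind of correlation analysis (not the paper's pipeline; the reference signals, estimates, and listener ratings are placeholders), objective scores from an evaluation toolkit can be correlated with subjective ratings as follows:

```python
# Sketch of correlating an objective separation metric with subjective ratings.
# `references`, `estimates`, and `subjective_quality` are placeholder inputs.
import numpy as np
from scipy.stats import pearsonr
import mir_eval

def correlate_sdr_with_ratings(references, estimates, subjective_quality):
    """references/estimates: lists of (n_sources, n_samples) arrays, one per mixture;
    subjective_quality: one mean listener rating per mixture."""
    sdrs = []
    for ref, est in zip(references, estimates):
        sdr, sir, sar, _ = mir_eval.separation.bss_eval_sources(ref, est)
        sdrs.append(np.mean(sdr))              # average SDR over sources in the mixture
    r, p = pearsonr(sdrs, subjective_quality)  # how well does the metric track perception?
    return r, p
```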

Paper Details

Authors: Hagen Wierstorf, Russell D. Mason, Emad M. Grais, Mark D. Plumbley
Submitted On: 19 April 2018 - 10:17am
Document Files: icassp18_poster_ward_et_al.pdf

Cite: Hagen Wierstorf, Russell D. Mason, Emad M. Grais, Mark D. Plumbley, "BSS EVAL OR PEASS? PREDICTING THE PERCEPTION OF SINGING-VOICE SEPARATION", IEEE SigPort, 2018. [Online]. Available: http://sigport.org/2981. Accessed: Jun. 18, 2018.

Language and Noise Transfer in Speech Enhancement Generative Adversarial Network

Paper Details

Authors: Maruchan Park, Joan Serrà, Antonio Bonafonte, Kang-Hun Ahn
Submitted On: 19 April 2018 - 4:40pm
Document Files: language-noise-transfer.pdf

Cite: Maruchan Park, Joan Serrà, Antonio Bonafonte, Kang-Hun Ahn, "Language and Noise Transfer in Speech Enhancement Generative Adversarial Network", IEEE SigPort, 2018. [Online]. Available: http://sigport.org/2979. Accessed: Jun. 18, 2018.

Correlated Tensor Factorization for Audio Source Separation


This paper presents an ultimate extension of nonnegative matrix factorization (NMF) for audio source separation based on full covariance modeling over all the time-frequency (TF) bins of the complex spectrogram of an observed mixture signal. Although NMF has been widely used for decomposing an observed power spectrogram in a TF-wise manner, it has a critical limitation: the phase values of interdependent TF bins cannot be handled.
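For reference, the TF-wise power-spectrogram NMF that the paper generalizes can be sketched with standard multiplicative updates for the KL divergence; this is only the baseline, not the proposed full-covariance model.

```python
# Minimal NMF baseline (the TF-wise power-spectrogram model the paper extends),
# using standard multiplicative updates for the KL divergence. Illustrative only.
import numpy as np

def nmf_kl(V: np.ndarray, n_bases: int = 16, n_iter: int = 200, eps: float = 1e-10):
    """V: nonnegative power spectrogram of shape (freq, time)."""
    rng = np.random.default_rng(0)
    F, T = V.shape
    W = rng.random((F, n_bases)) + eps          # spectral bases
    H = rng.random((n_bases, T)) + eps          # temporal activations
    for _ in range(n_iter):
        WH = W @ H + eps
        W *= (V / WH) @ H.T / (H.sum(axis=1) + eps)            # update bases
        WH = W @ H + eps
        H *= W.T @ (V / WH) / (W.sum(axis=0)[:, None] + eps)   # update activations
    return W, H
```

Because this factorization models only the nonnegative power in each TF bin, the phases of interdependent bins are ignored, which is the limitation the full-covariance extension targets.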

Paper Details

Authors: Kazuyoshi Yoshii
Submitted On: 17 April 2018 - 1:39am
Document Files: icassp-2018-yoshii-poster.pdf

Cite: Kazuyoshi Yoshii, "Correlated Tensor Factorization for Audio Source Separation", IEEE SigPort, 2018. [Online]. Available: http://sigport.org/2929. Accessed: Jun. 18, 2018.

Deep Learning Based Speech Beamforming


Multi-channel speech enhancement with ad-hoc sensors has been a challenging task. Speech-model-guided beamforming algorithms can recover natural-sounding speech, but the speech models tend to be oversimplified, or the inference would otherwise be too complicated. On the other hand, deep learning based enhancement approaches can learn complicated speech distributions and perform efficient inference, but they are unable to deal with a variable number of input channels.

Paper Details

Submitted On: 15 April 2018 - 3:56am
Document Files: DeepBeam

Cite: "Deep Learning Based Speech Beamforming", IEEE SigPort, 2018. [Online]. Available: http://sigport.org/2886. Accessed: Jun. 18, 2018.
