
Source Separation and Signal Enhancement

SOUND SOURCE SEPARATION USING PHASE DIFFERENCE AND RELIABLE MASK SELECTION


In this paper, we present an algorithm called Reliable Mask Selection-Phase Difference Channel Weighting (RMS-PDCW), which extracts a target source masked by a noise source using angle of arrival (AoA) information computed from inter-microphone phase differences. The RMS-PDCW algorithm selects which masks to apply based on the localized sound source and the detection of speech onsets.
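The core operation, mapping inter-channel phase differences to an angle of arrival and keeping only the time-frequency bins near the target direction, can be sketched as follows. This is a generic two-microphone, far-field illustration with assumed names and parameters (microphone spacing, angular tolerance); it omits the channel weighting and onset-based reliable mask selection that distinguish the actual RMS-PDCW algorithm.

```python
import numpy as np

def phase_difference_mask(X_left, X_right, freqs, mic_distance=0.04,
                          target_aoa=0.0, tolerance=0.35, c=343.0):
    """Binary T-F mask from inter-channel phase differences.

    X_left, X_right: complex STFTs of the two channels, shape (bins, frames).
    freqs: center frequency of each bin in Hz.
    Bins whose implied angle of arrival lies within `tolerance` radians
    of `target_aoa` are kept; all other bins are suppressed.
    """
    # Inter-channel phase difference per T-F bin, wrapped to (-pi, pi].
    ipd = np.angle(X_left * np.conj(X_right))
    # Time difference of arrival implied by the phase difference.
    f = np.maximum(freqs, 1e-6)[:, None]          # avoid divide-by-zero at DC
    tdoa = ipd / (2.0 * np.pi * f)
    # Far-field two-microphone geometry: sin(AoA) = TDOA * c / d.
    sin_aoa = np.clip(tdoa * c / mic_distance, -1.0, 1.0)
    aoa = np.arcsin(sin_aoa)
    return (np.abs(aoa - target_aoa) < tolerance).astype(float)
```

Applying the mask to either channel's STFT and inverting it gives the enhanced signal; real systems additionally smooth the mask across frequency, which is roughly what channel weighting contributes.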

Paper Details

Authors:
Chanwoo Kim, Anjali Menon, Michiel Bacchiani, Richard Stern
Submitted On:
7 May 2018 - 12:38am

Document Files

icassp_4465_poster.pdf


[1] Chanwoo Kim, Anjali Menon, Michiel Bacchiani, Richard Stern, "SOUND SOURCE SEPARATION USING PHASE DIFFERENCE AND RELIABLE MASK SELECTION", IEEE SigPort, 2018. [Online]. Available: http://sigport.org/3203. Accessed: May 23, 2018.

Joint Separation and Dereverberation of Reverberant Mixtures with Determined Multichannel Non-negative Matrix Factorization


This paper proposes an extension of multichannel non-negative matrix factorization (MNMF) that simultaneously solves source separation and dereverberation. While MNMF was originally formulated under an underdetermined problem setting where sources outnumber microphones, a determined counterpart of MNMF, which we call the determined MNMF (DMNMF), has recently been proposed with notable success.
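Both MNMF and its determined counterpart build on the basic NMF decomposition of non-negative spectrograms. As background, a minimal single-channel NMF with Lee-Seung multiplicative updates for the Euclidean cost can be sketched as below; the rank and iteration count are arbitrary choices here, and the multichannel/dereverberation extensions of the paper add spatial and reverberation models on top of this factorization.

```python
import numpy as np

def nmf(V, rank, n_iter=300, eps=1e-9, seed=0):
    """Plain NMF via multiplicative updates (Euclidean cost).

    Factorizes a non-negative matrix V (e.g. a freq x time magnitude
    spectrogram) into W (freq x rank) and H (rank x time) such that
    V is approximated by W @ H with all entries non-negative.
    """
    rng = np.random.default_rng(seed)
    F, T = V.shape
    W = rng.random((F, rank)) + eps
    H = rng.random((rank, T)) + eps
    for _ in range(n_iter):
        # Lee-Seung updates: each step is non-increasing in ||V - WH||^2.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H
```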

Paper Details

Authors:
Hideaki Kagami, Hirokazu Kameoka, Masahiro Yukawa
Submitted On:
23 April 2018 - 5:00pm

Document Files

Kagami2018ICASSP03.pdf


[1] Hideaki Kagami, Hirokazu Kameoka, Masahiro Yukawa, "Joint Separation and Dereverberation of Reverberant Mixtures with Determined Multichannel Non-negative Matrix Factorization", IEEE SigPort, 2018. [Online]. Available: http://sigport.org/3153. Accessed: May 23, 2018.

CBLDNN-BASED SPEAKER-INDEPENDENT SPEECH SEPARATION VIA GENERATIVE ADVERSARIAL TRAINING


In this paper, we propose a speaker-independent multi-speaker monaural speech separation system (CBLDNN-GAT) based on a convolutional, bidirectional long short-term memory, deep feed-forward neural network (CBLDNN) with generative adversarial training (GAT). Our system aims at obtaining better speech quality rather than only minimizing the mean square error (MSE). In the initial phase, we utilize log-mel filterbank and pitch features to warm up our CBLDNN in a multi-task manner.
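The contrast between pure MSE training and adversarial training can be illustrated with a toy generator objective: the regression term pulls the estimate toward the target, while the adversarial term rewards outputs the discriminator scores as real. The least-squares form and the weight `lam` below are illustrative assumptions, not the paper's exact loss.

```python
import numpy as np

def gat_generator_loss(est, target, d_fake_scores, lam=0.5):
    """Illustrative generator objective combining regression and
    adversarial terms.

    est, target: separated and reference spectrogram arrays.
    d_fake_scores: discriminator outputs on the generator's estimates
        (1 = judged real). Uses the least-squares GAN generator term.
    """
    mse = np.mean((est - target) ** 2)
    # Generator wants the discriminator to score its outputs as real (1).
    adv = np.mean((d_fake_scores - 1.0) ** 2)
    return mse + lam * adv
```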

Paper Details

Authors:
Chenxing Li, Lei Zhu, Shuang Xu, Peng Gao, Bo Xu
Submitted On:
22 April 2018 - 9:43pm

Document Files

conference_poster_4.pdf


[1] Chenxing Li, Lei Zhu, Shuang Xu, Peng Gao, Bo Xu, "CBLDNN-BASED SPEAKER-INDEPENDENT SPEECH SEPARATION VIA GENERATIVE ADVERSARIAL TRAINING", IEEE SigPort, 2018. [Online]. Available: http://sigport.org/3140. Accessed: May 23, 2018.

Separake: Source separation with a little help from echoes


It is commonly believed that multipath hurts various audio processing algorithms. At odds with this belief, we show that multipath in fact helps sound source separation, even with very simple propagation models. Unlike most existing methods, we neither ignore the room impulse responses nor attempt to estimate them fully. Rather, we assume that the positions of a few virtual microphones generated by echoes are known, and we show that this provides enough spatial diversity to obtain a performance boost over the anechoic case.
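The "virtual microphones generated by echoes" come from the image-source model: each wall reflection behaves as if a mirrored copy of the microphone received the signal directly. A minimal 2-D sketch, assuming an axis-aligned rectangular room and first-order reflections only:

```python
def image_sources(mic, room_size):
    """First-order image positions of a microphone in a 2-D rectangular
    room with corners at (0, 0) and room_size.

    Each of the four walls mirrors the microphone; these mirror images
    act as extra 'virtual microphones' receiving the first echoes,
    which is the spatial diversity the separation method exploits.
    """
    x, y = mic
    w, h = room_size
    return [
        (-x, y),          # image across the wall x = 0
        (2 * w - x, y),   # image across the wall x = w
        (x, -y),          # image across the wall y = 0
        (x, 2 * h - y),   # image across the wall y = h
    ]
```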

Paper Details

Authors:
Robin Scheibler, Diego Di Carlo, Antoine Deleforge, Ivan Dokmanic
Submitted On:
22 April 2018 - 9:05pm

Document Files

separake_icassp2018_slides.pdf


[1] Robin Scheibler, Diego Di Carlo, Antoine Deleforge, Ivan Dokmanic, "Separake: Source separation with a little help from echoes", IEEE SigPort, 2018. [Online]. Available: http://sigport.org/3139. Accessed: May 23, 2018.

ADAPTIVE CODING OF NON-NEGATIVE FACTORIZATION PARAMETERS WITH APPLICATION TO INFORMED SOURCE SEPARATION


Informed source separation (ISS) uses source separation to extract audio objects from their downmix, given some pre-computed parameters. In recent years, non-negative tensor factorization (NTF) has proven to be a good choice for compressing audio objects at the encoding stage. At the decoding stage, these parameters are used to separate the downmix with Wiener filtering. The quantized NTF parameters have to be encoded into a bitstream prior to transmission.
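The decoder-side Wiener filtering step is straightforward once per-source power spectrograms have been reconstructed from the transmitted factorization parameters. A generic single-channel sketch (function and argument names are assumptions, not the paper's notation):

```python
import numpy as np

def wiener_separate(X_mix, source_powers, eps=1e-12):
    """Recover source STFTs from a mixture via Wiener filtering.

    X_mix: complex mixture STFT, shape (freq, time).
    source_powers: list of non-negative power spectrograms, one per
        source, e.g. each reconstructed at the decoder as W @ H from
        the transmitted factorization parameters.
    Returns one complex STFT per source; the soft masks sum to ~1,
    so the separated signals sum back to the mixture.
    """
    total = sum(source_powers) + eps
    return [X_mix * (p / total) for p in source_powers]
```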

Paper Details

Authors:
Max Bläser, Christian Rohlfing, Yingbo Gao, Mathias Wien
Submitted On:
22 April 2018 - 1:25pm

Document Files

BlRo2018_Poster_print.pdf


[1] Max Bläser, Christian Rohlfing, Yingbo Gao, Mathias Wien, "ADAPTIVE CODING OF NON-NEGATIVE FACTORIZATION PARAMETERS WITH APPLICATION TO INFORMED SOURCE SEPARATION", IEEE SigPort, 2018. [Online]. Available: http://sigport.org/3135. Accessed: May 23, 2018.

Shift-Invariant Kernel Additive Modelling for Audio Source Separation


A major goal in blind source separation is to model the inherent characteristics of the sources in order to identify and separate them. While most state-of-the-art approaches are supervised methods trained on large datasets, interest in non-data-driven approaches such as Kernel Additive Modelling (KAM) remains high due to their interpretability and adaptability. KAM separates a given source by applying robust statistics to the time-frequency bins selected by a source-specific kernel function, commonly the K-NN function.
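The KAM iteration itself is simple: each time-frequency bin is replaced by a robust statistic (typically the median) over the bins its source kernel selects. The sketch below uses fixed shift offsets in place of the K-NN kernel for brevity; a horizontal kernel, for instance, models a temporally stable source such as sustained accompaniment.

```python
import numpy as np

def kam_median(S, kernel_offsets):
    """One KAM-style iteration for a single source.

    S: magnitude spectrogram (freq x time).
    kernel_offsets: list of (df, dt) offsets defining the kernel, e.g.
        [(0, -1), (0, 0), (0, 1)] selects each bin's horizontal
        neighbors. Each bin is replaced by the median of the bins
        the kernel selects, suppressing values that are outliers
        with respect to the modeled source (note: np.roll wraps at
        the spectrogram edges).
    """
    stack = [np.roll(np.roll(S, df, axis=0), dt, axis=1)
             for df, dt in kernel_offsets]
    return np.median(np.stack(stack), axis=0)
```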

Paper Details

Authors:
D. Fano Yela, S. Ewert, K. O'Hanlon, M. Sandler
Submitted On:
21 April 2018 - 10:11pm

Document Files

dfy_poster.pdf


[1] D. Fano Yela, S. Ewert, K. O'Hanlon, M. Sandler, "Shift-Invariant Kernel Additive Modelling for Audio Source Separation", IEEE SigPort, 2018. [Online]. Available: http://sigport.org/3125. Accessed: May 23, 2018.

END-TO-END SOUND SOURCE ENHANCEMENT USING DEEP NEURAL NETWORK IN THE MODIFIED DISCRETE COSINE TRANSFORM DOMAIN

Paper Details

Submitted On:
20 April 2018 - 10:06am

Document Files

ICASSP_2018_koizumi_r03.pdf


[1] "END-TO-END SOUND SOURCE ENHANCEMENT USING DEEP NEURAL NETWORK IN THE MODIFIED DISCRETE COSINE TRANSFORM DOMAIN", IEEE SigPort, 2018. [Online]. Available: http://sigport.org/3106. Accessed: May 23, 2018.

ON SPEECH ENHANCEMENT USING MICROPHONE ARRAYS IN THE PRESENCE OF CO-DIRECTIONAL INTERFERENCE

Paper Details

Authors:
Xin Leng, Jingdong Chen, Jacob Benesty, Israel Cohen
Submitted On:
20 April 2018 - 9:23am

Document Files

ICASSP_2018.pdf


[1] Xin Leng, Jingdong Chen, Jacob Benesty, Israel Cohen, "ON SPEECH ENHANCEMENT USING MICROPHONE ARRAYS IN THE PRESENCE OF CO-DIRECTIONAL INTERFERENCE", IEEE SigPort, 2018. [Online]. Available: http://sigport.org/3104. Accessed: May 23, 2018.

SINGLE CHANNEL SPEECH SEPARATION WITH CONSTRAINED UTTERANCE LEVEL PERMUTATION INVARIANT TRAINING USING GRID LSTM


Utterance-level permutation invariant training (uPIT) is a state-of-the-art deep learning technique for speaker-independent multi-talker separation. uPIT solves the label ambiguity problem by minimizing the mean square error (MSE) over all permutations between outputs and targets. However, uPIT may be sub-optimal at the segmental level because the optimization is not carried out over individual frames. In this paper, we propose a constrained uPIT (cuPIT) to solve this problem by computing a weighted MSE loss using dynamic information (i.e., delta and acceleration).
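The uPIT criterion can be sketched directly: evaluate the utterance-level MSE under every output-to-target pairing and keep the minimum. This brute-force numpy version (factorial in the number of speakers, fine for the usual two or three) illustrates the objective that cuPIT then augments with delta/acceleration weighting:

```python
import itertools
import numpy as np

def upit_mse(estimates, targets):
    """Utterance-level permutation invariant MSE.

    estimates, targets: arrays of shape (n_speakers, frames, freq_bins).
    The error is evaluated over the whole utterance for every possible
    output-to-target assignment; the minimizing loss and permutation
    are returned, resolving the label ambiguity between speakers.
    """
    n = estimates.shape[0]
    best_loss, best_perm = np.inf, None
    for perm in itertools.permutations(range(n)):
        loss = np.mean((estimates[list(perm)] - targets) ** 2)
        if loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm
```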

Paper Details

Authors:
Chenglin Xu, Wei Rao, Xiong Xiao, Eng Siong Chng, Haizhou Li
Submitted On:
20 April 2018 - 12:38am

Document Files

ICASSP2018_1844.pdf


[1] Chenglin Xu, Wei Rao, Xiong Xiao, Eng Siong Chng, Haizhou Li, "SINGLE CHANNEL SPEECH SEPARATION WITH CONSTRAINED UTTERANCE LEVEL PERMUTATION INVARIANT TRAINING USING GRID LSTM", IEEE SigPort, 2018. [Online]. Available: http://sigport.org/3068. Accessed: May 23, 2018.

Language and Noise Transfer in Speech Enhancement Generative Adversarial Network


Speech enhancement deep learning systems usually require large amounts of training data to operate in broad conditions or real applications. This makes adapting such systems to new, low-resource environments an important topic. In this work, we present the results of adapting a speech enhancement generative adversarial network by fine-tuning the generator with small amounts of data. We investigate the minimum requirements to obtain stable behavior in terms of several objective metrics in two very different languages: Catalan and Korean.

Paper Details

Authors:
Maruchan Park, Joan Serrà, Antonio Bonafonte, Kang-Hun Ahn
Submitted On:
19 April 2018 - 4:44pm

Document Files

language-noise-transfer.pdf


[1] Maruchan Park, Joan Serrà, Antonio Bonafonte, Kang-Hun Ahn, "Language and Noise Transfer in Speech Enhancement Generative Adversarial Network", IEEE SigPort, 2018. [Online]. Available: http://sigport.org/3025. Accessed: May 23, 2018.
