
Source Separation and Signal Enhancement

Translation of a Higher Order Ambisonics Sound Scene Based on Parametric Decomposition


This paper presents a novel 3DoF+ system that allows the listener to navigate, i.e., change position, within scene-based spatial audio content beyond the sweet spot of a Higher Order Ambisonics recording. It is one of the first such systems based on sound capture at a single spatial position. The system uses a parametric decomposition of the recorded sound field. For the synthesis, only coarse distance information about the sources is needed as side information; the exact number of sources does not have to be known.
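
The paper's parametric pipeline (direction estimation, direct/diffuse decomposition, re-encoding) is not reproduced here. As a rough sketch of why coarse distance information suffices, the snippet below re-projects a source's direction of arrival after a listener translation; the function name and interface are hypothetical, assuming Python/NumPy:

```python
import numpy as np

def translate_doa(doa, distance, listener_shift):
    """Re-project a source direction after the listener moves.

    doa            -- unit vector from the original listening position to the source
    distance       -- coarse source distance (the side information)
    listener_shift -- translation vector of the listener

    Returns the new DOA unit vector and the updated distance.
    """
    source_pos = distance * np.asarray(doa, dtype=float)      # approximate source position
    relative = source_pos - np.asarray(listener_shift, dtype=float)
    new_distance = np.linalg.norm(relative)
    return relative / new_distance, new_distance

# a source 2 m straight ahead; the listener steps 0.5 m to the side
new_doa, new_r = translate_doa([1.0, 0.0, 0.0], 2.0, [0.0, 0.5, 0.0])
print(new_doa, new_r)   # direction shifts away from the step, distance grows slightly
```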

Paper Details

Authors: Andreas Behler, Peter Jax
Submitted On: 20 May 2020 - 10:32am

Document Files

handout.pdf


[1] Andreas Behler, Peter Jax, "Translation of a Higher Order Ambisonics Sound Scene Based on Parametric Decomposition", IEEE SigPort, 2020. [Online]. Available: http://sigport.org/5414. Accessed: Sep. 29, 2020.

Weighted Speech Distortion Losses for Real-time Speech Enhancement


This paper investigates several aspects of training a recurrent neural network (RNN) that impact the objective and subjective quality of enhanced speech in real-time single-channel speech enhancement. Specifically, we focus on an RNN that enhances short-time speech spectra on a single-frame-in, single-frame-out basis, a framework adopted by most classical signal processing methods. We propose two novel mean-squared-error-based learning objectives that enable separate control over the importance of speech distortion versus noise reduction.
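
A hedged sketch of the idea, not the paper's exact objectives: a two-term magnitude-domain MSE can weight speech distortion on speech-active frames against residual noise on the remaining frames. The frame split and the trade-off parameter alpha below are illustrative assumptions:

```python
import numpy as np

def weighted_sd_loss(mask, noisy_mag, clean_mag, speech_frames, alpha=0.35):
    """Toy two-term loss over STFT magnitudes of shape (frames, bins).

    speech_frames -- boolean array marking speech-active frames (assumes
                     both frame types are present in the batch)
    alpha         -- trade-off between speech distortion and noise reduction
    """
    enhanced = mask * noisy_mag                               # masked noisy magnitudes
    sq_err = (enhanced - clean_mag) ** 2
    speech_distortion = sq_err[speech_frames].mean()          # distortion where speech is present
    residual_noise = (enhanced[~speech_frames] ** 2).mean()   # leaked noise elsewhere
    return alpha * speech_distortion + (1.0 - alpha) * residual_noise
```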

Paper Details

Authors: Ivan Tashev
Submitted On: 17 May 2020 - 7:34pm

Document Files

dns-public-v2short.pptx


[1] Ivan Tashev, "Weighted Speech Distortion Losses for Real-time Speech Enhancement", IEEE SigPort, 2020. [Online]. Available: http://sigport.org/5388. Accessed: Sep. 29, 2020.

Generalized Coherence-based Signal Enhancement


This contribution presents a novel approach to coherence-based signal enhancement. An estimator for the coherent-to-diffuse ratio (CDR) is devised which exploits the concept of generalized magnitude coherence and can thus, unlike common state-of-the-art schemes, take advantage of more than two microphones simultaneously. Moreover, the speech enhancement by CDR-based spectral weighting is not performed as a post-filtering step, but by enhancing the most appropriate microphone signal.
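
The generalized magnitude coherence estimator itself is beyond a short snippet, but the final step, mapping a CDR estimate to a spectral weight for the selected microphone signal, can be illustrated with a Wiener-like gain rule (the rule and the gain floor are assumptions, not necessarily the paper's):

```python
import numpy as np

def cdr_gain(cdr, floor=0.1):
    """Map a coherent-to-diffuse ratio estimate to a spectral gain:
    coherent (direct-sound) bins pass largely unchanged, diffuse bins
    are attenuated down to a floor."""
    return np.maximum(cdr / (cdr + 1.0), floor)

cdr = np.array([0.1, 1.0, 10.0])   # per-bin CDR estimates
print(cdr_gain(cdr))               # -> [0.1  0.5  0.909...]
# enhanced_bin = cdr_gain(cdr) * selected_mic_stft_bin
```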

Paper Details

Authors: Heinrich W. Löllmann, Andreas Brendel, Walter Kellermann
Submitted On: 14 May 2020 - 7:16am

Document Files

ICASSP 2020 Presentation of H. Loellmann


[1] Heinrich W. Löllmann, Andreas Brendel, Walter Kellermann, "Generalized Coherence-based Signal Enhancement", IEEE SigPort, 2020. [Online]. Available: http://sigport.org/5283. Accessed: Sep. 29, 2020.

Spatially Guided Independent Vector Analysis

Paper Details

Authors: Andreas Brendel, Thomas Haubner, Walter Kellermann
Submitted On: 14 May 2020 - 4:33am

Document Files

Spatially Guided IVA - Poster


[1] Andreas Brendel, Thomas Haubner, Walter Kellermann, "Spatially Guided Independent Vector Analysis", IEEE SigPort, 2020. [Online]. Available: http://sigport.org/5266. Accessed: Sep. 29, 2020.

Adaptive Blind Audio Source Extraction Supervised by Dominant Speaker Identification using X-vectors


We propose a novel algorithm for adaptive blind audio source extraction. The proposed method is based on independent vector analysis and utilizes auxiliary-function-based optimization to achieve high convergence speed. The algorithm is partially supervised by a pilot signal related to the source of interest (SOI), which ensures that the method correctly extracts the utterance of the desired speaker. The pilot is based on the identification of the dominant speaker in the mixture using x-vectors. The properties of x-vectors computed in the presence of cross-talk are experimentally analyzed.

Paper Details

Authors: Jansky, Malek, Cmejla, Kounovsky, Koldovsky, Zdansky
Submitted On: 14 May 2020 - 3:35am

Document Files

icassp2020_JanskyMalek_paper1967_final.pdf


[1] Jansky, Malek, Cmejla, Kounovsky, Koldovsky, Zdansky, "Adaptive Blind Audio Source Extraction Supervised by Dominant Speaker Identification using X-vectors", IEEE SigPort, 2020. [Online]. Available: http://sigport.org/5253. Accessed: Sep. 29, 2020.

Enhancing End-to-End Multi-Channel Speech Separation via Spatial Feature Learning


Hand-crafted spatial features (e.g., the inter-channel phase difference, IPD) play a fundamental role in recent deep-learning-based multi-channel speech separation (MCSS) methods. However, these manually designed spatial features are hard to incorporate into an end-to-end optimized MCSS framework. In this work, we propose an integrated architecture for learning spatial features directly from the multi-channel speech waveforms within an end-to-end speech separation framework. In this architecture, time-domain filters spanning signal channels are trained to perform adaptive spatial filtering.
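
For reference, the hand-crafted IPD features that the learned time-domain filters are meant to supersede are straightforward to compute from a multi-channel STFT; a minimal sketch (array shapes assumed):

```python
import numpy as np

def ipd_features(stft, ref=0):
    """Inter-channel phase differences w.r.t. a reference channel.

    stft -- complex array of shape (channels, frames, bins)
    Returns cos/sin encodings of the wrapped phase differences, the usual
    hand-crafted spatial feature fed to separation networks.
    """
    phase = np.angle(stft)
    ipd = phase - phase[ref]                            # broadcast over channels
    keep = [c for c in range(stft.shape[0]) if c != ref]
    return np.cos(ipd[keep]), np.sin(ipd[keep])
```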

Paper Details

Authors: Rongzhi Gu, Shi-Xiong Zhang, Lianwu Chen, Yong Xu, Meng Yu, Dan Su, Yuexian Zou, Dong Yu
Submitted On: 13 May 2020 - 10:45pm

Document Files

ICASSP2020 paper# 4750 slides


[1] Rongzhi Gu, Shi-Xiong Zhang, Lianwu Chen, Yong Xu, Meng Yu, Dan Su, Yuexian Zou, Dong Yu, "Enhancing End-to-End Multi-Channel Speech Separation via Spatial Feature Learning", IEEE SigPort, 2020. [Online]. Available: http://sigport.org/5205. Accessed: Sep. 29, 2020.

Mask-dependent Phase Estimation for Monaural Speaker Separation


Speaker separation refers to isolating the speech of interest in a multi-talker environment. Most methods apply real-valued time-frequency (T-F) masks to the mixture short-time Fourier transform (STFT) to reconstruct the clean speech. Hence, there is an unavoidable mismatch between the phase of the reconstruction and the original phase of the clean speech. In this paper, we propose a simple yet effective phase estimation network that predicts the phase of the clean speech based on a T-F mask predicted by a chimera++ network.
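
The mismatch is easy to see in code: conventional masking reuses the mixture phase, which the proposed network replaces with an estimate. A minimal sketch of the conventional baseline:

```python
import numpy as np

def mask_with_mixture_phase(mask, mix_stft):
    """Conventional reconstruction: the masked mixture magnitude is combined
    with the *mixture* phase -- exactly the phase mismatch the paper's
    estimation network addresses."""
    return mask * np.abs(mix_stft) * np.exp(1j * np.angle(mix_stft))
```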

Paper Details

Authors: Zhaoheng Ni, Michael I. Mandel
Submitted On: 13 May 2020 - 9:50pm

Document Files

Slides_Mask-dependent Phase Estimation for Monaural Speaker Separation.pdf


[1] Zhaoheng Ni, Michael I. Mandel, "Mask-dependent Phase Estimation for Monaural Speaker Separation", IEEE SigPort, 2020. [Online]. Available: http://sigport.org/5186. Accessed: Sep. 29, 2020.

Two-Step Sound Source Separation: Training on Learned Latent Targets (Presentation)


In this paper, we propose a two-step training procedure for source separation via a deep neural network. In the first step, we learn a transform (and its inverse) to a latent space where masking-based separation performance using oracle masks is optimal. In the second step, we train a separation module that operates on the previously learned space. To do so, we also make use of a scale-invariant signal-to-distortion ratio (SI-SDR) loss function that works in the latent space, and we prove that it lower-bounds the SI-SDR in the time domain.
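
For context, the time-domain SI-SDR that the latent-space loss is proven to lower-bound has a standard closed form; a minimal reference implementation:

```python
import numpy as np

def si_sdr(estimate, target, eps=1e-8):
    """Scale-invariant SDR in dB: project the estimate onto the target,
    then compare the target component's energy to the residual's."""
    target = target - target.mean()
    estimate = estimate - estimate.mean()
    scale = np.dot(estimate, target) / (np.dot(target, target) + eps)
    s_target = scale * target                 # component aligned with the target
    e_noise = estimate - s_target             # everything else is distortion
    return 10.0 * np.log10((s_target @ s_target + eps) / (e_noise @ e_noise + eps))
```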

Paper Details

Authors: Efthymios Tzinis, Shrikant Venkataramani, Zhepei Wang, Cem Subakan, Paris Smaragdis
Submitted On: 20 April 2020 - 7:15pm

Document Files

etzinis_icassp2020_twostep_slides.pdf


[1] Efthymios Tzinis, Shrikant Venkataramani, Zhepei Wang, Cem Subakan, Paris Smaragdis, "Two-Step Sound Source Separation: Training on Learned Latent Targets (Presentation)", IEEE SigPort, 2020. [Online]. Available: http://sigport.org/5112. Accessed: Sep. 29, 2020.

Improving Universal Sound Separation Using Sound Classification Presentation


Deep learning approaches have recently achieved impressive performance on both audio source separation and sound classification. Most audio source separation approaches focus only on separating sources belonging to a restricted domain of source classes, such as speech and music. However, recent work has demonstrated the possibility of "universal sound separation", which aims to separate acoustic sources from an open domain, regardless of their class.

Paper Details

Authors: Efthymios Tzinis, Scott Wisdom, John R. Hershey, Aren Jansen, Daniel P. W. Ellis
Submitted On: 3 May 2020 - 10:09pm

Document Files

etzinis_improving_icassp2020_slides.pdf


[1] Efthymios Tzinis, Scott Wisdom, John R. Hershey, Aren Jansen, Daniel P. W. Ellis, "Improving Universal Sound Separation Using Sound Classification Presentation", IEEE SigPort, 2020. [Online]. Available: http://sigport.org/5111. Accessed: Sep. 29, 2020.

PEVD-based Speech Enhancement in Reverberant Environments


The enhancement of noisy speech is important for applications involving human-to-human interactions, such as telecommunications and hearing aids, as well as human-to-machine interactions, such as voice-controlled systems and robot audition. In this work, we focus on reverberant environments. It is shown that, by exploiting the lack of correlation between speech and the late reflections, further noise reduction can be achieved. This is verified using simulations involving actual acoustic impulse responses and noise from the ACE corpus.
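
A toy numerical check of the decorrelation argument, using a white-noise surrogate for speech and a synthetic impulse response (illustrative only, not the paper's PEVD processing):

```python
import numpy as np

rng = np.random.default_rng(0)
fs = 16000
s = rng.standard_normal(fs)        # white surrogate for a speech signal

# synthetic impulse response: direct path at lag zero,
# exponentially decaying late tail starting after 50 ms
split = int(0.05 * fs)
late = rng.standard_normal(4 * split) * np.exp(-np.arange(4 * split) / split)

direct = s                                                         # direct-path component
late_reverb = np.convolve(s, np.r_[np.zeros(split), 0.3 * late])[: len(s)]

rho = np.corrcoef(direct, late_reverb)[0, 1]
print(f"correlation between direct speech and late reverberation: {rho:+.3f}")  # ~0
```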

Paper Details

Authors: Vincent W. Neo, Christine Evers, Patrick A. Naylor
Submitted On: 18 April 2020 - 12:18pm

Document Files

[ICASSP2020]_PEVD_based_Speech_Enhancement_in_Reverberant_Environments_Handout.pdf


[1] Vincent W. Neo, Christine Evers, Patrick A. Naylor, "PEVD-based Speech Enhancement in Reverberant Environments", IEEE SigPort, 2020. [Online]. Available: http://sigport.org/5106. Accessed: Sep. 29, 2020.
