Source Separation and Signal Enhancement

An Improved Measure of Musical Noise Based on Spectral Kurtosis


Audio processing methods that operate on a time-frequency representation of the signal can introduce unpleasant-sounding artifacts known as musical noise. These artifacts are observed in audio coding, speech enhancement, and source separation. The change in kurtosis of the power spectrum introduced by the processing has been shown to correlate with the human perception of musical noise in the context of speech enhancement, leading to the proposal of kurtosis-based measures. These baseline measures are here shown to correlate with human perception only to a limited extent.
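
As a rough illustration of the kind of baseline this work builds on, the sketch below computes a per-band kurtosis of the power spectrogram and compares processed to unprocessed audio via a kurtosis ratio. It is a generic kurtosis-ratio indicator under assumed STFT settings, not the improved measure proposed in the paper.

```python
import numpy as np
from scipy.signal import stft

def band_kurtosis(x, fs, nperseg=512):
    """Kurtosis of the power spectrogram in each frequency band, taken over time."""
    _, _, X = stft(x, fs=fs, nperseg=nperseg)
    P = np.abs(X) ** 2                                  # power spectrogram (freq, time)
    mu = P.mean(axis=1, keepdims=True)
    var = P.var(axis=1, keepdims=True) + 1e-12
    return np.mean(((P - mu) ** 4) / (var ** 2), axis=1)  # per-band kurtosis

def kurtosis_ratio(unprocessed, processed, fs):
    """Mean ratio of processed-to-unprocessed spectral kurtosis.

    A ratio well above 1 is commonly read as an indicator of musical noise;
    this is a generic baseline, not the paper's improved measure.
    """
    k_in = band_kurtosis(unprocessed, fs)
    k_out = band_kurtosis(processed, fs)
    return float(np.mean(k_out / (k_in + 1e-12)))
```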

Paper Details

Authors:
Matteo Torcoli
Submitted On:
14 October 2019 - 3:13am

Document Files

poster_FINAL.pdf

[1] Matteo Torcoli, "An Improved Measure of Musical Noise Based on Spectral Kurtosis", IEEE SigPort, 2019. [Online]. Available: http://sigport.org/4868. Accessed: Oct. 17, 2019.

Incorporating Intra-Spectral Dependencies With A Recurrent Output Layer For Improved Speech Enhancement


Deep-learning-based speech enhancement systems have offered tremendous gains, and the best-performing approaches use long short-term memory (LSTM) recurrent neural networks (RNNs) to model temporal speech correlations. These models, however, do not consider the frequency-level correlations within a single time frame, since spectral dependencies along the frequency axis are often ignored. This results in inaccurate frequency responses that negatively affect perceptual quality and intelligibility. We propose a deep-learning approach that considers both temporal and frequency-level dependencies.
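
As a rough sketch of the idea of adding frequency-level modelling on top of a temporal LSTM, the example below runs a second recurrent layer along the frequency axis of each output frame. The layer sizes, the GRU choice, and the masking output are illustrative assumptions, not the authors' architecture.

```python
import torch
import torch.nn as nn

class IntraSpectralEnhancer(nn.Module):
    """Temporal LSTM followed by a recurrent layer run along the frequency axis."""
    def __init__(self, n_freq=257, hidden=256):
        super().__init__()
        self.temporal = nn.LSTM(n_freq, hidden, num_layers=2, batch_first=True)
        self.to_freq = nn.Linear(hidden, n_freq)      # per-frame frequency estimate
        self.intra = nn.GRU(1, 32, batch_first=True, bidirectional=True)
        self.out = nn.Linear(64, 1)                   # per-bin mask value

    def forward(self, noisy_mag):                     # (batch, time, freq)
        h, _ = self.temporal(noisy_mag)               # temporal speech correlations
        frames = self.to_freq(h)                      # (batch, time, freq)
        b, t, f = frames.shape
        seq = frames.reshape(b * t, f, 1)             # treat frequency bins as a sequence
        intra, _ = self.intra(seq)                    # intra-spectral dependencies
        mask = torch.sigmoid(self.out(intra)).reshape(b, t, f)
        return mask * noisy_mag                       # masked magnitude estimate
```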

Paper Details

Authors:
Khandokar Md. Nayem, Donald S. Williamson
Submitted On:
13 October 2019 - 1:29pm

Document Files

Intra-Spectra Recurrent Output Layer

[1] Khandokar Md. Nayem, Donald S. Williamson, "Incorporating Intra-Spectral Dependencies With A Recurrent Output Layer For Improved Speech Enhancement", IEEE SigPort, 2019. [Online]. Available: http://sigport.org/4864. Accessed: Oct. 17, 2019.

Fast MVAE: Joint separation and classification of mixed sources based on multichannel variational autoencoder with auxiliary classifier

Paper Details

Authors:
Li Li, Hirokazu Kameoka, Shoji Makino
Submitted On:
14 May 2019 - 5:47pm

Document Files

Li2019ICASSP05poster_v2.pdf

[1] Li Li, Hirokazu Kameoka, Shoji Makino, "Fast MVAE: Joint separation and classification of mixed sources based on multichannel variational autoencoder with auxiliary classifier", IEEE SigPort, 2019. [Online]. Available: http://sigport.org/4515. Accessed: Oct. 17, 2019.

Joint Separation and Dereverberation of Reverberant Mixture with Multichannel Variational Autoencoder

Paper Details

Authors:
Hirokazu Kameoka, Li Li, Shogo Seki, Shoji Makino
Submitted On:
14 May 2019 - 5:42pm

Document Files

AASP_L4_2.pdf

[1] Hirokazu Kameoka, Li Li, Shogo Seki, Shoji Makino, "Joint Separation and Dereverberation of Reverberant Mixture with Multichannel Variational Autoencoder", IEEE SigPort, 2019. [Online]. Available: http://sigport.org/4514. Accessed: Oct. 17, 2019.

Incremental Binarization On Recurrent Neural Networks For Single-Channel Source Separation


This paper proposes a Bitwise Gated Recurrent Unit (BGRU) network for the single-channel source separation task. Recurrent neural networks (RNNs) require several sets of weights within their cells, which significantly increases the computational cost compared to fully-connected networks. To mitigate this increased computation, we focus on the GRU cells and quantize the feedforward procedure with binarized values and bitwise operations. The BGRU network is trained in two stages.
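
To illustrate the general idea of a bitwise GRU cell, the sketch below replaces weights and activations with their signs so the feedforward pass uses only ±1 values. It merely mimics the arithmetic in NumPy; real bitwise inference would pack the values into machine words and use XNOR/popcount, and the paper's two-stage training is not shown.

```python
import numpy as np

def binarize(w):
    """Deterministic binarization to {-1, +1} (ties at zero map to +1)."""
    return np.where(w >= 0, 1.0, -1.0)

def bgru_step(x, h, Wz, Uz, Wr, Ur, Wh, Uh):
    """One GRU step with binarized weights and hard (sign) activations."""
    xb, hb = binarize(x), binarize(h)
    z = binarize(xb @ binarize(Wz) + hb @ binarize(Uz))            # update gate
    r = binarize(xb @ binarize(Wr) + hb @ binarize(Ur))            # reset gate
    h_tilde = binarize(xb @ binarize(Wh) + (r * hb) @ binarize(Uh))
    z01 = (z + 1.0) / 2.0          # map +/-1 gate to a {0, 1} mixing weight
    return (1.0 - z01) * h + z01 * h_tilde
```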

Paper Details

Authors:
Sunwoo Kim, Mrinmoy Maity, Minje Kim
Submitted On:
10 May 2019 - 7:46pm

Document Files

Bitwise Gated Recurrent Units

[1] Sunwoo Kim, Mrinmoy Maity, Minje Kim, "Incremental Binarization On Recurrent Neural Networks For Single-Channel Source Separation", IEEE SigPort, 2019. [Online]. Available: http://sigport.org/4425. Accessed: Oct. 17, 2019.

Speech Denoising by Parametric Resynthesis


This work proposes the use of clean speech vocoder parameters as the target for a neural network performing speech enhancement. These parameters were designed for text-to-speech synthesis so that they both produce high-quality resyntheses and are straightforward to model with neural networks, but they have not been utilized in speech enhancement until now. In comparison to a matched text-to-speech system that is given the ground truth transcripts of the noisy speech, our model is
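
A minimal sketch of the overall pipeline, under assumed feature and parameter dimensions: a recurrent network maps noisy spectral features to clean vocoder parameter tracks, which a TTS-style vocoder then resynthesizes into speech. The placeholder dimensions and network depth are not the authors' configuration.

```python
import torch
import torch.nn as nn

class ParamPredictor(nn.Module):
    """Maps noisy spectral features to clean-speech vocoder parameters."""
    def __init__(self, n_feat=257, n_vocoder_params=63, hidden=512):
        super().__init__()
        self.rnn = nn.LSTM(n_feat, hidden, num_layers=2, batch_first=True)
        self.head = nn.Linear(hidden, n_vocoder_params)

    def forward(self, noisy_feats):          # (batch, time, n_feat)
        h, _ = self.rnn(noisy_feats)
        return self.head(h)                  # predicted vocoder parameter tracks

# Intended use: the predicted parameter tracks are handed to a TTS-style vocoder
# (e.g. WORLD or a neural vocoder) to resynthesize clean-sounding speech.
```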

Paper Details

Authors:
Soumi Maiti, Michael I Mandel
Submitted On:
10 May 2019 - 3:35pm

Document Files

poster.pdf

[1] Soumi Maiti, Michael I Mandel, "Speech Denoising by Parametric Resynthesis", IEEE SigPort, 2019. [Online]. Available: http://sigport.org/4394. Accessed: Oct. 17, 2019.

ALL-NEURAL ONLINE SOURCE SEPARATION, COUNTING, AND DIARIZATION FOR MEETING ANALYSIS


Automatic meeting analysis comprises the tasks of speaker counting, speaker diarization, and the separation of overlapped speech, followed by automatic speech recognition. This all has to be carried out on arbitrarily long sessions and, ideally, in an online or block-online manner. While significant progress has been made on individual tasks, this paper presents for the first time an all-neural approach to simultaneous speaker counting, diarization and source separation.
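
To make the block-online notion concrete, the sketch below consumes the recording in fixed-length blocks and carries model state across blocks. The `model.process` interface is hypothetical and merely stands in for the all-neural counting/diarization/separation network described above.

```python
import numpy as np

def block_online_meeting_analysis(audio, model, fs=16000, block_sec=5.0):
    """Process an arbitrarily long recording block by block, carrying state forward."""
    block_len = int(block_sec * fs)
    state = None
    separated_blocks, diarization = [], []
    for start in range(0, len(audio), block_len):
        block = audio[start:start + block_len]
        # Hypothetical interface: returns separated sources, speaker activity, new state.
        sources, activity, state = model.process(block, state)
        separated_blocks.append(sources)   # e.g. (num_speakers, block_len) waveforms
        diarization.append(activity)       # per-speaker activity within this block
    return np.concatenate(separated_blocks, axis=-1), diarization
```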

Paper Details

Authors:
Thilo von Neumann, Keisuke Kinoshita, Marc Delcroix, Shoko Araki, Tomohiro Nakatani, Reinhold Haeb-Umbach
Submitted On:
10 May 2019 - 12:26pm

Document Files

presentation.pdf

[1] Thilo von Neumann, Keisuke Kinoshita, Marc Delcroix, Shoko Araki, Tomohiro Nakatani, Reinhold Haeb-Umbach, "ALL-NEURAL ONLINE SOURCE SEPARATION, COUNTING, AND DIARIZATION FOR MEETING ANALYSIS", IEEE SigPort, 2019. [Online]. Available: http://sigport.org/4370. Accessed: Oct. 17, 2019.

End-to-End Sound Source Separation Conditioned On Instrument Labels


Can we perform end-to-end music source separation with a variable number of sources using a deep learning model? We present an extension of the Wave-U-Net model that allows end-to-end monaural source separation with a non-fixed number of sources. Furthermore, we propose multiplicative conditioning with instrument labels at the bottleneck of the Wave-U-Net and show its effect on the separation results. This approach also opens the door to other types of conditioning, such as audio-visual source separation and score-informed source separation.
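
A minimal sketch of multiplicative conditioning at a bottleneck: the instrument-label vector is projected to per-channel gains that scale the bottleneck feature maps. Channel counts and the projection are illustrative assumptions rather than the exact Wave-U-Net configuration used in the paper.

```python
import torch
import torch.nn as nn

class MultiplicativeConditioning(nn.Module):
    """Scales bottleneck feature maps by gains derived from instrument labels."""
    def __init__(self, n_labels, n_channels):
        super().__init__()
        self.gain = nn.Linear(n_labels, n_channels)

    def forward(self, bottleneck, labels):
        # bottleneck: (batch, channels, time); labels: (batch, n_labels) one/multi-hot
        g = torch.sigmoid(self.gain(labels)).unsqueeze(-1)   # (batch, channels, 1)
        return bottleneck * g                                # multiplicative conditioning
```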

Paper Details

Authors:
Olga Slizovskaia, Leo Kim, Gloria Haro, Emilia Gómez
Submitted On:
10 May 2019 - 7:16am

Document Files

ICASSP2019.pdf

[1] Olga Slizovskaia, Leo Kim, Gloria Haro, Emilia Gómez, "End-to-End Sound Source Separation Conditioned On Instrument Labels", IEEE SigPort, 2019. [Online]. Available: http://sigport.org/4304. Accessed: Oct. 17, 2019.

Similarity Search-based Blind Source Separation


In this paper, we propose a new method for blind source separation in which we perform a similarity search against a prepared clean speech database. The purpose of this mechanism is to separate the short utterances that are frequently encountered in real-world situations. The new method employs a local Gaussian model (LGM) for the probability density functions of the separated signals and updates the LGM variance parameters using the similarity search results.
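
As a rough illustration of the variance-update idea (not the paper's algorithm), the sketch below matches each frame of a separated signal to its nearest clean-speech spectral template by brute force and uses the matched template as the local Gaussian variance; the actual method relies on an efficient similarity-search technique.

```python
import numpy as np

def update_lgm_variances(separated_power, clean_db):
    """Set per-frame LGM variances from the nearest clean-speech templates.

    separated_power: (time, freq) power spectrogram of one separated signal.
    clean_db: (n_templates, freq) power-spectrum templates from a clean-speech database.
    """
    log_sep = np.log(separated_power + 1e-12)
    log_db = np.log(clean_db + 1e-12)
    # Euclidean distance in the log-spectral domain, each frame vs. each template.
    d = ((log_sep[:, None, :] - log_db[None, :, :]) ** 2).sum(axis=-1)
    nearest = d.argmin(axis=1)       # best-matching template per frame
    return clean_db[nearest]         # (time, freq) variance parameters
```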

Paper Details

Authors:
Hiroshi Sawada, Kazuo Aoyama
Submitted On:
10 May 2019 - 1:42am

Document Files

Slide_ICASSP2019_sawada.pdf

[1] Hiroshi Sawada, Kazuo Aoyama, "Similarity Search-based Blind Source Separation", IEEE SigPort, 2019. [Online]. Available: http://sigport.org/4256. Accessed: Oct. 17, 2019.

A FULLY CONVOLUTIONAL NEURAL NETWORK FOR COMPLEX SPECTROGRAM PROCESSING IN SPEECH ENHANCEMENT


In this paper we propose a fully convolutional neural network (CNN) for complex spectrogram processing in speech enhancement.
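
A minimal sketch of the general setup, under assumed depth and channel counts: the real and imaginary parts of the noisy spectrogram are stacked as input channels of a fully convolutional network that outputs enhanced real and imaginary channels. This is not the authors' exact architecture.

```python
import torch
import torch.nn as nn

class ComplexSpectrogramFCN(nn.Module):
    """Fully convolutional net mapping noisy to enhanced complex spectrograms."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(2, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 2, kernel_size=3, padding=1),
        )

    def forward(self, noisy_complex):    # (batch, 2, freq, time) = [real, imag]
        return self.net(noisy_complex)   # enhanced real/imag spectrogram
```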

Paper Details

Authors:
Zhiheng Ouyang, Hongjiang Yu, Wei-Ping Zhu, Benoit Champagne
Submitted On:
9 May 2019 - 5:25pm

Document Files

icassp_draft_zhiheng.pdf

[1] Zhiheng Ouyang, Hongjiang Yu, Wei-Ping Zhu, Benoit Champagne, "A FULLY CONVOLUTIONAL NEURAL NETWORK FOR COMPLEX SPECTROGRAM PROCESSING IN SPEECH ENHANCEMENT", IEEE SigPort, 2019. [Online]. Available: http://sigport.org/4241. Accessed: Oct. 17, 2019.
