Source Separation and Signal Enhancement

All-Neural Online Source Separation, Counting, and Diarization for Meeting Analysis

Automatic meeting analysis comprises the tasks of speaker counting, speaker diarization, and the separation of overlapped speech, followed by automatic speech recognition. This all has to be carried out on arbitrarily long sessions and, ideally, in an online or block-online manner. While significant progress has been made on individual tasks, this paper presents for the first time an all-neural approach to simultaneous speaker counting, diarization and source separation.

presentation.pdf (476)

Categories:: Source Separation and Signal Enhancement

15 Views

End-to-End Sound Source Separation Conditioned On Instrument Labels

Can we perform end-to-end music source separation with a variable number of sources using a deep learning model? We present an extension of the Wave-U-Net model that allows end-to-end monaural source separation with a non-fixed number of sources. Furthermore, we propose multiplicative conditioning with instrument labels at the bottleneck of the Wave-U-Net and show its effect on the separation results. This approach opens the way to other types of conditioning, such as audio-visual source separation and score-informed source separation.
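
As a rough illustration of what multiplicative conditioning can look like, the sketch below scales each bottleneck channel by a gain computed from a binary instrument-label vector. The function name, the `weights` matrix, and the shapes are hypothetical and are not taken from the paper; the actual Wave-U-Net extension may compute its gains differently.

```python
def condition_bottleneck(features, label_vector, weights):
    """Scale each bottleneck channel by a gain derived from instrument labels.

    features:     channels x time, as a list of lists of floats
    label_vector: 0/1 flags, one per instrument
    weights:      hypothetical learned matrix, channels x instruments
    """
    # One gain per channel: a linear projection of the label vector.
    gains = [sum(w * l for w, l in zip(row, label_vector)) for row in weights]
    # Multiplicative conditioning: every time step of a channel is scaled by its gain.
    return [[g * x for x in channel] for g, channel in zip(gains, features)]


features = [[1.0, 2.0], [3.0, 4.0]]   # 2 channels, 2 time steps
weights = [[1.0, 0.0], [0.0, 0.5]]    # assumed values, for illustration only
print(condition_bottleneck(features, [1, 0], weights))  # only instrument 0 is active
```

With only the first instrument flagged, the second channel is multiplied by zero, which is the mechanism that lets one network handle a varying set of requested sources.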

ICASSP2019.pdf (390)

Categories:: Source Separation and Signal Enhancement

27 Views

Similarity Search-based Blind Source Separation

In this paper, we propose a new method for blind source separation that performs a similarity search over a prepared clean speech database. The purpose of this mechanism is to separate short utterances of the kind frequently encountered in real-world situations. The new method employs a local Gaussian model (LGM) for the probability density functions of the separated signals and updates the LGM variance parameters using the similarity search results.

Slide_ICASSP2019_sawada.pdf (2505)

Slide_ICASSP2019_sawada.pdf (362)

Categories:: Source Separation and Signal Enhancement

47 Views

A Fully Convolutional Neural Network for Complex Spectrogram Processing in Speech Enhancement

In this paper we propose a fully convolutional neural network (CNN) for complex spectrogram processing in speech enhancement.

icassp_draft_zhiheng.pdf (1624)

Categories:: Source Separation and Signal Enhancement

152 Views

Face Landmark-based Speaker-Independent Audio-Visual Speech Enhancement in Multi-Talker Environments

In this paper, we address the problem of enhancing the speech of a speaker of interest in a cocktail party scenario when visual information of the speaker of interest is available. Contrary to most previous studies, we do not learn visual features on the typically small audio-visual datasets, but use an already available face landmark detector (trained on a separate image dataset). The landmarks are used by LSTM-based models to generate time-frequency masks, which are applied to the acoustic mixed-speech spectrogram.
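
The masking step mentioned above can be sketched generically as an element-wise product between a mask and the mixture spectrogram. The example assumes a real-valued mask applied to a complex spectrogram (so the mixture phase is kept); the abstract does not say which mask type the authors use, and the values below are made up for illustration.

```python
def apply_mask(mixture_spec, mask):
    """Element-wise product of a complex spectrogram (frames x bins) and a real mask."""
    return [
        [m * y for m, y in zip(mask_row, spec_row)]
        for mask_row, spec_row in zip(mask, mixture_spec)
    ]


mixture = [[1 + 1j, 2 + 0j], [0 + 2j, 4 - 4j]]   # toy 2x2 mixture spectrogram
mask = [[1.0, 0.0], [0.5, 0.25]]                 # toy mask with values in [0, 1]
enhanced = apply_mask(mixture, mask)
print(enhanced)
```

A mask value of 1 passes a time-frequency bin unchanged, 0 suppresses it, and intermediate values attenuate it.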

Morrone_ICASSP_Poster.pdf (375)

Categories:: Source Separation and Signal Enhancement, Speech Enhancement (SPE-ENHA)

20 Views

A Proper Version of Synthesis-Based Sparse Audio Declipper

Methods based on sparse representation have found great use in the recovery of audio signals degraded by clipping. Among sparsity-based approaches, the state of the art in declipping has been achieved by the SPADE algorithm of Kitić et al. (LVA/ICA’15). Our recent study (LVA/ICA’18) has shown that although the original S-SPADE can be improved such that it converges faster than the A-SPADE, its restoration quality is significantly worse. In the present paper, we propose a new version of S-SPADE.
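
For readers unfamiliar with the declipping setting, the sketch below shows the hard-clipping degradation and the consistency condition that a restored signal is usually required to satisfy: unclipped samples must match the observation, and clipped samples must lie at or beyond the clipping level. This is not the SPADE algorithm itself, and the helper names and the threshold `theta` are chosen here for illustration.

```python
def hard_clip(signal, theta):
    """Saturate every sample to the range [-theta, theta]."""
    return [max(-theta, min(theta, x)) for x in signal]


def is_consistent(candidate, clipped, theta):
    """Check whether a candidate restoration agrees with the clipped observation."""
    for c, y in zip(candidate, clipped):
        if y >= theta:          # sample was clipped from above
            if c < theta:
                return False
        elif y <= -theta:       # sample was clipped from below
            if c > -theta:
                return False
        elif c != y:            # reliable sample must be reproduced exactly
            return False
    return True


original = [0.2, 0.9, -1.3, 0.5]
clipped = hard_clip(original, 0.8)
print(clipped)
print(is_consistent(original, clipped, 0.8))
```

Many signals are consistent with one clipped observation, which is why a prior such as sparsity is needed to pick a plausible restoration among them.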

A Proper Version of Synthesis-Based Sparse Audio Declipper Poster.pdf (282)

Categories:: Source Separation and Signal Enhancement

31 Views

Semi-supervised Monaural Singing Voice Separation with a Masking Network Trained on Synthetic Mixtures - Poster

We study the problem of semi-supervised singing voice separation, in which the training data contains a set of samples of mixed music (singing and instrumental) and an unmatched set of instrumental music. Our solution employs a single mapping function g, which, applied to a mixed sample, recovers the underlying instrumental music, and, applied to an instrumental sample, returns the same sample. The network g is trained on purely instrumental samples as well as on synthetic mixed samples created by mixing reconstructed singing voices with random instrumental samples.

poster_icassp_2019_singing.pdf (306)

Categories:: Source Separation and Signal Enhancement

13 Views

A Pitch-Aware Approach to Single-Channel Speech Separation

Despite significant advances in deep learning for separating speech sources mixed in a single channel, same-gender mixtures (male-male or female-female) remain more difficult to separate than opposite-gender mixtures. In this study, we propose a pitch-aware speech separation approach to improve separation performance.

ICASSP2019_Poster_KeWang.pdf

Poster of "A Pitch-Aware Approach to Single-Channel Speech Separation" (289)

Categories:: Source Separation and Signal Enhancement

21 Views

Minimum-Volume Rank-Deficient Nonnegative Matrix Factorizations

In recent years, nonnegative matrix factorization (NMF) with volume regularization has been shown to be a powerful identifiable model, for example for hyperspectral unmixing, document classification, community detection, and hidden Markov models. We show that minimum-volume NMF (min-vol NMF) can also be used when the basis matrix is rank deficient, which is a reasonable scenario for some real-world NMF problems (e.g., for unmixing multispectral images).
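
The abstract does not state the objective, so the following is only one commonly used volume-regularized formulation, given here as an assumption. The data matrix X is approximated by WH, with W the basis matrix with r columns, and the symbols λ (regularization weight) and δ (a small positive constant) are introduced for illustration. Normalization constraints that are usually placed on H are omitted.

```latex
\min_{W \ge 0,\; H \ge 0} \; \|X - WH\|_F^2 \;+\; \lambda \,\log\det\!\left(W^\top W + \delta I_r\right),
\qquad \delta > 0
```

With δ > 0 the log-determinant stays finite even when W does not have full column rank, whereas log det(WᵀW) alone would diverge to minus infinity in that case.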

main.pdf (313)

Categories:: Source Separation and Signal Enhancement

36 Views

Optimization of Speaker Extraction Neural Network with Magnitude and Temporal Spectrum Approximation Loss

The SpeakerBeam-FE (SBF) method has been proposed for speaker extraction. It attempts to overcome the problem of an unknown number of speakers in an audio recording during source separation. However, the mask approximation loss of SBF is sub-optimal because it neither computes the signal reconstruction error directly nor takes the speech context into account. To address these problems, this paper proposes a magnitude and temporal spectrum approximation loss to estimate a phase-sensitive mask for the target speaker using the target speaker's characteristics.
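
For context, the phase-sensitive mask is commonly defined per time-frequency bin as |S|/|Y| · cos(θ_S − θ_Y), where S is the target spectrum and Y the mixture spectrum. The sketch below computes only this standard definition on toy values; it does not implement the loss proposed in the paper, and the function name and the `eps` stabilizer are assumptions.

```python
import cmath
import math


def phase_sensitive_mask(target, mixture, eps=1e-8):
    """Standard PSM per bin: |S| / |Y| * cos(phase(S) - phase(Y))."""
    return [
        abs(s) / (abs(y) + eps) * math.cos(cmath.phase(s) - cmath.phase(y))
        for s, y in zip(target, mixture)
    ]


target = [1 + 0j, 0 + 1j, -1 + 0j]    # toy target bins
mixture = [2 + 0j, 1 + 0j, 1 + 0j]    # toy mixture bins
psm = phase_sensitive_mask(target, mixture)
print(psm)
```

The mask is close to 0.5 when the target is in phase with the mixture at half its magnitude, near zero at a 90 degree phase difference, and negative when the phases oppose each other, which is why implementations often truncate it to a bounded range before training.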

ICASSP2019_Poster_final.pdf

ICASSP Poster (383)

Categories:: Source Separation and Signal Enhancement

14 Views
