
- HARMONICITY PLAYS A CRITICAL ROLE IN DNN BASED VERSUS IN BIOLOGICALLY-INSPIRED MONAURAL SPEECH SEGREGATION SYSTEMS
Recent advances in deep learning have led to drastic improvements in speech segregation models. Despite their success and growing applicability, few efforts have been made to analyze the principles these networks learn in order to perform segregation. Here we analyze the role of harmonicity in two state-of-the-art deep neural network (DNN) based models, Conv-TasNet and DPT-Net. We evaluate their performance on mixtures of natural speech versus slightly manipulated inharmonic speech, in which the harmonics are mildly frequency-jittered.
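The inharmonicity manipulation can be reproduced on a synthetic harmonic complex. The snippet below is a minimal sketch; the jitter distribution, jitter range, and number of harmonics are assumptions for illustration, not the paper's exact protocol.

```python
import numpy as np

def harmonic_tone(f0, n_harmonics, sr, dur, jitter=0.0, seed=0):
    """Synthesize a complex tone with n_harmonics partials of f0.
    If jitter > 0, each harmonic's frequency is perturbed by up to
    +/- jitter * f0, breaking the harmonic relationship."""
    rng = np.random.default_rng(seed)
    t = np.arange(int(sr * dur)) / sr
    sig = np.zeros_like(t)
    for k in range(1, n_harmonics + 1):
        fk = k * f0
        if jitter > 0:
            fk += rng.uniform(-jitter, jitter) * f0  # assumed uniform jitter
        sig += np.sin(2 * np.pi * fk * t)
    return sig / n_harmonics

clean = harmonic_tone(200.0, 10, 16000, 0.5)             # harmonic speech-like tone
jittered = harmonic_tone(200.0, 10, 16000, 0.5, jitter=0.1)  # inharmonic version
```

A segregation model can then be probed with mixtures built from the clean versus jittered variants.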

- BLIND UNMIXING USING A DOUBLE DEEP IMAGE PRIOR
ICASSP2022.pdf

Poster.pdf


- ICASSP21 Poster of 'Nonnegative Unimodal Matrix Factorization'
We introduce a new Nonnegative Matrix Factorization (NMF) model called Nonnegative Unimodal Matrix Factorization (NuMF), which adds to NMF a unimodality condition on the columns of the basis matrix. NuMF finds applications in, for example, analytical chemistry. We first propose a simple but naive brute-force strategy based on accelerated projected gradient. We then improve it with a multi-grid approach, for which we prove that the restriction operator preserves unimodality.
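The unimodality condition on a basis column is simple to state in code. The check below is an illustrative sketch of the constraint itself, not the paper's projection or multi-grid algorithm.

```python
import numpy as np

def is_unimodal(col):
    """True if the sequence rises (non-strictly) to a single peak and
    then falls -- the condition NuMF imposes on each basis column."""
    col = np.asarray(col, dtype=float)
    p = int(np.argmax(col))                       # peak position
    return bool(np.all(np.diff(col[:p + 1]) >= 0) and   # rising before peak
                np.all(np.diff(col[p:]) <= 0))          # falling after peak
```

For example, a chromatographic elution profile `[0, 1, 3, 2, 1]` is unimodal, while `[0, 2, 1, 2, 0]` has two peaks and is not.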

- SANDGLASSET: A LIGHT MULTI-GRANULARITY SELF-ATTENTIVE NETWORK FOR TIME-DOMAIN SPEECH SEPARATION
One of the leading single-channel speech separation (SS) models is based on TasNet with a dual-path segmentation technique, in which the size of each segment remains unchanged throughout all layers. In contrast, our key finding is that multi-granularity features are essential for enhancing contextual modeling and computational efficiency. We introduce a self-attentive network with a novel sandglass shape, named Sandglasset, which advances the state-of-the-art (SOTA) SS performance at a significantly smaller model size and computational cost.
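The dual-path segmentation that Sandglasset varies across blocks can be sketched as follows. The specific segment sizes and the symmetric coarsen-then-refine schedule below are hypothetical, chosen only to illustrate the sandglass idea of changing granularity per block.

```python
import numpy as np

def segment(x, seg_len):
    """Chop a feature sequence into fixed-size segments (dropping the
    remainder), as in dual-path segmentation."""
    n = len(x) // seg_len
    return x[:n * seg_len].reshape(n, seg_len)

# hypothetical sandglass schedule: segments grow toward the middle blocks
# (coarser, longer-range context) and shrink again toward the output
sizes = [4, 8, 16, 8, 4]
shapes = [segment(np.arange(64), L).shape for L in sizes]
```

Each shape is `(n_segments, seg_len)`; self-attention in each block then operates at that block's granularity.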

- ENHANCING END-TO-END MULTI-CHANNEL SPEECH SEPARATION VIA SPATIAL FEATURE LEARNING
Hand-crafted spatial features (e.g., inter-channel phase difference, IPD) play a fundamental role in recent deep learning based multi-channel speech separation (MCSS) methods. However, these manually designed spatial features are hard to incorporate into the end-to-end optimized MCSS framework. In this work, we propose an integrated architecture for learning spatial features directly from the multi-channel speech waveforms within an end-to-end speech separation framework. In this architecture, time-domain filters spanning signal channels are trained to perform adaptive spatial filtering.

- Efficient Parameter Estimation for Semi-Continuous Data: An Application to Independent Component Analysis
MLSP_2019.pdf


- HOW MANY FMRI SCANS ARE NECESSARY AND SUFFICIENT FOR RESTING BRAIN CONNECTIVITY ANALYSIS?
Functional connectivity analysis, which detects neuronal coactivation in the brain, can be performed efficiently with Resting State Functional Magnetic Resonance Imaging (rs-fMRI). Most existing research in this area employs correlation-based group-averaging strategies with spatial smoothing and temporal normalization of the fMRI scans, whose reliability depends heavily on both the voxel resolution of the fMRI scan and the scanning duration. Most studies have chosen scanning periods of 5 to 11 minutes when estimating the connectivity of brain networks.
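The correlation-based connectivity estimate underlying these strategies can be sketched in a few lines; the synthetic data, region count, and scan length below are arbitrary placeholders.

```python
import numpy as np

def connectivity(ts):
    """Correlation-based functional connectivity: Pearson correlation
    between every pair of regional time series.
    ts: (n_timepoints, n_regions) -> (n_regions, n_regions) matrix."""
    return np.corrcoef(ts, rowvar=False)

rng = np.random.default_rng(0)
ts = rng.standard_normal((300, 5))     # e.g. 300 volumes, 5 regions
ts[:, 1] = ts[:, 0] + 0.1 * rng.standard_normal(300)  # two coupled regions
C = connectivity(ts)
```

The scan-duration question in the title amounts to asking how many rows of `ts` are needed before `C` stabilizes.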

- Deep attractor networks for speaker re-identification and blind source separation
Deep Clustering (DC) and Deep Attractor Networks (DANs) are data-driven approaches to monaural blind source separation.
Both achieve impressive single-channel performance but have not yet been generalized to block-online processing.
When separating speech in a continuous stream with a block-online algorithm, it must be determined in each block which output stream belongs to which speaker.
In this contribution we solve this block permutation problem by introducing an additional speaker identification embedding into the DAN model structure.
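The assignment step implied by this idea can be sketched as follows: match each block's output-stream embeddings to stored speaker embeddings by maximizing total cosine similarity over all permutations. The embedding dimensionality and the choice of cosine similarity are assumptions here, not details from the paper.

```python
import numpy as np
from itertools import permutations

def resolve_block_permutation(block_emb, speaker_emb):
    """Assign each output stream of the current block to a known speaker
    by maximizing total cosine similarity over all stream-to-speaker
    assignments (brute force; fine for two or three streams)."""
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))
    n = len(block_emb)
    best = max(permutations(range(n)),
               key=lambda p: sum(cos(block_emb[i], speaker_emb[p[i]])
                                 for i in range(n)))
    return list(best)  # best[i] = speaker index for output stream i
```

In a streaming system the stored speaker embeddings would be updated as blocks arrive, keeping each output stream consistently tied to one speaker.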

- TasNet: time-domain audio separation network for real-time, single-channel speech separation
Robust speech processing in multi-talker environments requires effective speech separation. Recent deep learning systems have made significant progress on this problem, yet it remains challenging, particularly in real-time, short-latency applications. Most methods attempt to construct a mask for each source in a time-frequency representation of the mixture signal, which is not necessarily an optimal representation for speech separation.
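The time-frequency masking paradigm the abstract refers to can be illustrated with an ideal ratio mask on a toy two-source mixture. This is illustrative only; TasNet itself replaces the STFT with a learned time-domain encoder.

```python
import numpy as np

def stft(x, n_fft=256, hop=128):
    """Minimal magnitude/phase analysis via windowed rFFT frames."""
    win = np.hanning(n_fft)
    return np.array([np.fft.rfft(win * x[s:s + n_fft])
                     for s in range(0, len(x) - n_fft + 1, hop)])

# two sinusoidal "speakers" and their mixture
sr = 8000
t = np.arange(sr) / sr
s1, s2 = np.sin(2 * np.pi * 440 * t), np.sin(2 * np.pi * 1200 * t)
S1, S2, M = stft(s1), stft(s2), stft(s1 + s2)

# ideal ratio mask for source 1, applied to the mixture spectrogram
mask1 = np.abs(S1) / (np.abs(S1) + np.abs(S2) + 1e-8)
S1_hat = mask1 * M                       # masked mixture, source-1 estimate
mag = np.abs(S1_hat).mean(axis=0)        # 440 Hz sits near bin 440/(8000/256) ~ 14
```

The mask keeps the bins dominated by source 1 and suppresses the rest; the abstract's point is that this fixed STFT basis is not necessarily the best domain in which to do so.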