ICASSP 2021 - IEEE International Conference on Acoustics, Speech and Signal Processing is the world’s largest and most comprehensive technical conference focused on signal processing and its applications. The ICASSP 2021 conference will feature world-class presentations by internationally renowned speakers and cutting-edge session topics, and will provide a fantastic opportunity to network with like-minded professionals from around the world.

Text-based speech editors expedite the process of editing speech recordings by permitting editing via intuitive cut, copy, and paste operations on a speech transcript. A major drawback of current systems, however, is that edited recordings often sound unnatural because of prosody mismatches around edited regions. In our work, we propose a new context-aware method for more natural-sounding text-based editing of speech.

Safe screening rules are powerful tools to accelerate iterative solvers in sparse regression problems. They allow early identification of inactive coordinates (i.e., those not belonging to the support of the solution) which can thus be screened out in the course of iterations. In this paper, we extend the GAP Safe screening rule to the L1-regularized Kullback-Leibler divergence which does not fulfill the regularity assumptions made in previous works. The proposed approach is experimentally validated on synthetic and real count data sets.
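
As an illustration of what a safe screening test looks like, here is a minimal sketch of the standard GAP Safe sphere test for the least-squares Lasso, min over w of 0.5·||y − Xw||² + λ·||w||₁. This is the classic quadratic case, not the Kullback-Leibler extension proposed in the paper, whose details are not given in the abstract. The function name `gap_safe_screen` and the plain-list data layout are assumptions made for this example.

```python
import math

def gap_safe_screen(X, y, w, lam):
    """GAP Safe sphere test for the least-squares Lasso (illustrative sketch).

    X is a list of rows, y the targets, w the current primal iterate and
    lam the regularization weight. Returns one boolean per feature:
    True means the test certifies that the coordinate is zero at the optimum.
    """
    n, p = len(X), len(w)
    # Residual at the current iterate.
    r = [y[i] - sum(X[i][j] * w[j] for j in range(p)) for i in range(n)]
    # Rescale the residual to obtain a dual feasible point theta.
    corr = [sum(X[i][j] * r[i] for i in range(n)) for j in range(p)]
    scale = max(lam, max(abs(c) for c in corr))
    theta = [ri / scale for ri in r]
    # Primal and dual objectives, and the resulting duality gap.
    primal = 0.5 * sum(ri * ri for ri in r) + lam * sum(abs(wj) for wj in w)
    dual = 0.5 * sum(yi * yi for yi in y) - 0.5 * lam ** 2 * sum(
        (theta[i] - y[i] / lam) ** 2 for i in range(n)
    )
    gap = max(primal - dual, 0.0)
    # Radius of the safe sphere centered at theta.
    radius = math.sqrt(2.0 * gap) / lam
    screened = []
    for j in range(p):
        col_norm = math.sqrt(sum(X[i][j] ** 2 for i in range(n)))
        score = abs(sum(X[i][j] * theta[i] for i in range(n))) + radius * col_norm
        screened.append(score < 1.0)
    return screened

# Orthogonal toy design: the second feature is only weakly correlated with y.
print(gap_safe_screen([[1.0, 0.0], [0.0, 1.0]], [1.0, 0.1], [0.0, 0.0], 0.9))
# -> [False, True]
```

Because the radius shrinks with the duality gap, more coordinates can be discarded as the solver converges, which is what allows screening "in the course of iterations".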

This paper focuses on channel estimation for mmWave MIMO systems with 1-bit spatial sigma-delta analog-to-digital converters (ADCs) and digital-to-analog converters (DACs). The channel estimation performance with 1-bit spatial sigma-delta modulators (i.e., ADCs or DACs) depends on the quantization noise modeling. Therefore, we present a new method for modeling the quantization noise by leveraging the deterministic input-output relation of the 1-bit spatial sigma-delta modulator.
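
The "deterministic input-output relation" of a sigma-delta modulator can be illustrated with a textbook first-order, real-valued 1-bit modulator. The sketch below is only that: the paper's spatial modulator runs across antenna elements and its exact model is not given in the abstract. The function name `sigma_delta_1bit` and the choice of a real-valued input in [-1, 1] are assumptions for this example.

```python
def sigma_delta_1bit(x):
    """First-order 1-bit sigma-delta modulator (textbook sketch).

    Each input sample has the previous quantization error subtracted from
    it before the 1-bit quantizer, so the output satisfies
        y[n] = x[n] + q[n] - q[n-1],
    where q[n] is the quantization error at index n and q[-1] = 0.
    Returns the +/-1 outputs and the error sequence.
    """
    y, q = [], []
    q_prev = 0.0
    for sample in x:
        v = sample - q_prev                 # quantizer input with error feedback
        out = 1.0 if v >= 0 else -1.0       # 1-bit quantizer
        q_prev = out - v                    # quantization error at this index
        y.append(out)
        q.append(q_prev)
    return y, q

# A constant input: the running average of the 1-bit output tracks it.
x = [0.3] * 200
y, q = sigma_delta_1bit(x)
print(sum(y) / len(y))
```

The quantization error enters the output only through the difference q[n] − q[n−1], so it is pushed away from the low-frequency (or, in a spatial modulator, low-angle) region where the signal lies.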

This paper proposes voicing-aware conditional discriminators for Parallel WaveGAN-based waveform synthesis systems. In this framework, we adopt a projection-based conditioning method that can significantly improve the discriminator's performance. Furthermore, the conventional discriminator is separated into two waveform discriminators for modeling voiced and unvoiced speech.

Identifying uterine contractions with the aid of machine learning methods is necessary vis-à-vis their use in combination with fetal heart rates and other clinical data for the assessment of a fetus's wellbeing. In this paper, we study contraction identification by processing noisy signals due to uterine activities. We propose a complete four-step method where we address the imbalanced classification problem with an ensemble Gaussian process classifier, where the Gaussian process latent variable model is used as a decision-maker.

Classification with imbalanced data is a common and challenging problem in many practical machine learning applications. Ensemble learning is a popular solution where the results from multiple base classifiers are synthesized to reduce the effect of a possibly skewed distribution of the training set. In this paper, binary classifiers based on Gaussian processes are chosen as bases for inferring the predictive distributions of test latent variables. We apply a Gaussian process latent variable model where the outputs of the Gaussian processes are used for making the final decision.

We address the problem of enabling two-dimensional digital image correlation (DIC) for strain measurement on large three-dimensional objects with curved surfaces. It is challenging to acquire full-field qualified images of the surface required by DIC due to geometric distortion and the narrow visual field of the surface that a single image can cover. To overcome this issue, we propose an end-to-end DIC framework incorporating the image fusion principle to achieve full-field strain measurement over the curved surface.

Automatic speech recognition (ASR) systems are highly sensitive to train-test domain mismatch. However, because transcription is often prohibitively expensive, it is important to be able to make use of available transcribed out-of-domain data. We address the problem of domain adaptation with semi-supervised training (SST). Contrary to work on in-domain SST, we find significant performance improvement even with just one hour of target-domain data, although the selection of the data is critical.

Speech generation and enhancement have seen recent breakthroughs in quality thanks to deep learning. These methods typically operate at a limited sampling rate of 16-22 kHz due to computational complexity and available datasets. This limitation imposes a gap between the output of such methods and that of high-fidelity (≥44 kHz) real-world audio applications. This paper proposes a new bandwidth extension (BWE) method that expands 8-16 kHz speech signals to 48 kHz. The method is based on a feed-forward WaveNet architecture trained with a GAN-based deep feature loss.
