
ICASSP 2021 - IEEE International Conference on Acoustics, Speech and Signal Processing is the world’s largest and most comprehensive technical conference focused on signal processing and its applications. The ICASSP 2021 conference will feature world-class presentations by internationally renowned speakers and cutting-edge session topics, and will provide a fantastic opportunity to network with like-minded professionals from around the world.

Acoustic differences between children’s and adults’ speech degrade automatic speech recognition performance when a system is trained on adults’ speech and tested on children’s speech. The key acoustic mismatch factors are formant frequencies, speaking rate, and pitch. In this paper, we propose a linear prediction based spectral warping method that uses knowledge of the vowel and non-vowel regions in speech signals to mitigate the differences in formant frequencies between child and adult speakers.

Online beat tracking (OBT) has always been a challenging task because future data is inaccessible and inference must be made in real time. We propose Don’t Look back! (DLB), a novel approach optimized for efficiency when performing OBT. DLB feeds the activations of a unidirectional RNN into an enhanced Monte-Carlo localization model to infer beat positions. Most preexisting OBT methods either apply an offline approach to a moving window of past data to predict future beat positions, or must be primed with past data at startup to initialize.

We present and analyze a more robust alternative to Welch’s overlapped segment averaging (WOSA) spectral estimator. Instead of averaging over multiple periodograms, our method computes sample percentiles across them to estimate power spectral densities (PSDs). Bias and variance of the proposed estimator are derived for varying sample sizes and arbitrary percentiles. We find excellent agreement between our expressions and data sampled from a white Gaussian noise process.
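
The idea of replacing the segment average with a percentile can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `percentile_psd`, its parameters, and the Hann window are assumptions. The bias correction uses only the large-sample property that periodogram values of Gaussian noise are approximately exponentially distributed (which does not hold at the DC and Nyquist bins); it does not reproduce the finite-sample expressions derived in the paper.

```python
import numpy as np

def percentile_psd(x, fs=1.0, nperseg=256, noverlap=128, q=50.0):
    """Two-sided PSD estimate using a percentile across segment periodograms.

    Hypothetical helper for illustration; q must lie strictly between 0 and 100.
    """
    x = np.asarray(x, dtype=float)
    step = nperseg - noverlap
    n_seg = 1 + (len(x) - nperseg) // step
    window = np.hanning(nperseg)
    scale = fs * np.sum(window ** 2)

    # Overlapping, windowed segments stacked into shape (n_seg, nperseg).
    segs = np.stack([x[i * step:i * step + nperseg] * window
                     for i in range(n_seg)])

    # One periodogram per segment.
    pgrams = np.abs(np.fft.fft(segs, axis=1)) ** 2 / scale
    freqs = np.fft.fftfreq(nperseg, d=1.0 / fs)

    # Percentile across segments instead of the mean used by Welch's method.
    raw = np.percentile(pgrams, q, axis=0)

    # Large-sample correction: an exponential variable with mean S has its
    # q-th percentile at -S * ln(1 - q/100), e.g. S * ln(2) for the median.
    correction = -np.log1p(-q / 100.0)
    return freqs, raw / correction
```

With the default `q=50.0` this reduces to a median-based estimate, which is largely unaffected when a few segments contain strong outliers.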

Lossless compression of datasets is a problem of significant theoretical and practical interest. It arises naturally when storing, sending, or archiving large collections of information for scientific research. The encoding bitrate can be greatly improved if the compressed dataset is allowed to decompress to a permutation of the original data. We prove that dataset compression is equivalent to compressing a permutation-invariant structure of the data, and we implement such a scheme via predictive coding.
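
To give a sense of how much can be gained by ignoring order, the sketch below counts the bits needed to single out one ordering of a dataset among all distinguishable orderings. This is a standard counting argument, not the authors' predictive-coding scheme, and the function name `ordering_bits` is hypothetical.

```python
import math
from collections import Counter

def ordering_bits(items):
    """Bits needed to specify the order of `items` once its multiset is known.

    For n items with multiplicities m_1, ..., m_k this is
    log2(n! / (m_1! * ... * m_k!)), computed via log-gamma to avoid overflow.
    """
    counts = Counter(items)
    n = len(items)
    log_perms = math.lgamma(n + 1) - sum(math.lgamma(m + 1)
                                         for m in counts.values())
    return log_perms / math.log(2)
```

For a dataset of distinct records the value grows roughly like n log2(n), so the potential saving becomes substantial for large collections; for a dataset of identical records it is zero, since every ordering looks the same.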

A new Time-Stretched-Pulse (TSP) provides a solid foundation for acoustic measurements
Motivation: Measure and record the conditions under which speech data is acquired and presented
Issues: The target (real-world) systems exhibit not only linear time-invariant responses but also non-linear time-invariant, random, and time-varying responses
Solution: We invented a method for simultaneously measuring multiple paths by combining extended TSP signals with binary orthogonal weight sequences

There is growing interest in the use of deep neural network (DNN) based image denoising to reduce patients’ X-ray dosage in medical computed tomography (CT). An effective denoiser must remove noise while maintaining texture and detail. The mean squared error (MSE) loss functions commonly used in DNN training weight errors due to bias and variance equally. However, the error due to bias is often more egregious since it results in loss of image texture and detail. In this paper, we present a novel approach to designing a loss
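
The statement that MSE weights bias and variance equally follows from the standard decomposition below. The notation is an assumption introduced here for illustration: $x$ is the true (noise-free) pixel value, $\hat{x}$ is the denoiser output, and the expectation is taken over the noise.

```latex
\mathbb{E}\!\left[(\hat{x} - x)^2\right]
  = \underbrace{\left(\mathbb{E}[\hat{x}] - x\right)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\!\left[\left(\hat{x} - \mathbb{E}[\hat{x}]\right)^2\right]}_{\text{variance}}
```

Both terms enter with unit weight, so minimizing MSE gives no preference to reducing the bias term over the variance term.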

In this paper, we propose a multi-granularity feature interaction and relation reasoning network (MFIRRN) that can recover a detail-rich 3D face and perform more accurate dense alignment in an unconstrained environment. Traditional 3DMM-based methods directly regress parameters, so the reconstructed 3D face lacks fine-grained details. To address this, we use different branches to capture discriminative features at different granularities, especially local features at medium and fine granularities.
