Audio Analysis and Synthesis

Optimizing Short-Time Fourier Transform Parameters via Gradient Descent

icassp_slides.pdf

Presentation Slides (385)

Categories: Audio Analysis and Synthesis

72 Views

Neural Audio Fingerprint for High-specific Audio Retrieval based on Contrastive Learning

Most existing audio fingerprinting systems have limitations when used for high-specific audio retrieval at scale. In this work, we generate a low-dimensional representation from a short unit segment of audio, and couple this fingerprint with a fast maximum inner-product search. To this end, we present a contrastive learning framework that derives from the segment-level search objective. Each training update uses a batch consisting of a set of pseudo labels, randomly selected original samples, and their augmented replicas.
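As a rough illustration of the retrieval step mentioned in the abstract, the sketch below runs an exhaustive maximum inner-product search over L2-normalized fingerprints. It is not the authors' implementation: the random vectors stand in for learned fingerprints, the dimension of 64 and the noise level are arbitrary assumptions, and the brute-force matrix product replaces whatever fast search index the paper uses. The function names are hypothetical.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so inner products behave like cosine similarity."""
    norm = np.linalg.norm(x, axis=axis, keepdims=True)
    return x / np.maximum(norm, eps)


def max_inner_product_search(database, queries, top_k=1):
    """Exhaustive search: return indices and scores of the top_k database rows per query."""
    scores = queries @ database.T                      # shape (num_queries, num_database)
    idx = np.argsort(-scores, axis=1)[:, :top_k]       # highest scores first
    return idx, np.take_along_axis(scores, idx, axis=1)


rng = np.random.default_rng(0)

# Stand-in "fingerprints": 1000 random unit vectors of dimension 64 (assumed sizes).
database = l2_normalize(rng.standard_normal((1000, 64)))

# Queries are lightly perturbed copies of three database entries,
# loosely mimicking augmented replicas of the original segments.
true_ids = np.array([3, 42, 777])
queries = l2_normalize(database[true_ids] + 0.05 * rng.standard_normal((3, 64)))

found_ids, found_scores = max_inner_product_search(database, queries, top_k=1)
print(found_ids.ravel())
```

At scale, the exhaustive product would normally be replaced by an approximate nearest-neighbour index; the search objective (largest inner product) stays the same.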

icassp 2021 poster.pdf

icassp 2021 poster.pdf (546)

Categories: Music Signal Processing
Content-Based Audio Processing
Audio Analysis and Synthesis

52 Views

Seen and Unseen emotional style transfer for voice conversion with a new emotional speech dataset

Emotional voice conversion aims to transform emotional prosody in speech while preserving the linguistic content and speaker identity. Prior studies show that it is possible to disentangle emotional prosody using an encoder-decoder network conditioned on a discrete representation, such as one-hot emotion labels. Such networks learn to remember a fixed set of emotional styles.

icassp_poster.pdf

Poster (407)

icassp_slides.pdf

Slides (428)

Categories: Audio Analysis and Synthesis
Speech Synthesis and Generation, including TTS (SPE-SYNT)

43 Views

DENOISPEECH: DENOISING TEXT TO SPEECH WITH FRAME-LEVEL NOISE MODELING

Poster.pdf

poster (492)

DenoiSpeech.pptx

presentation slides (272)

Categories: Audio Analysis and Synthesis

41 Views

Robust Fundamental Frequency Estimation in Coloured Noise

Most parametric fundamental frequency estimators make the implicit assumption that any corrupting noise is additive, white, and Gaussian. Under this assumption, the maximum likelihood (ML) and the least squares estimators are the same, and statistically efficient. However, in the coloured noise case, the estimators differ, and the spectral shape of the corrupting noise should be taken into account.
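The equivalence stated in the abstract can be sketched as follows for a real-valued signal model. The symbols are assumptions introduced here for illustration and are not taken from the paper: $\mathbf{x}$ is the observed frame, $\mathbf{s}(\omega_0, \boldsymbol{\theta})$ is the parametric signal with fundamental frequency $\omega_0$ and nuisance parameters $\boldsymbol{\theta}$, and $\mathbf{Q}$ is the noise covariance matrix, assumed known.

```latex
\begin{align}
  \mathbf{x} &= \mathbf{s}(\omega_0, \boldsymbol{\theta}) + \mathbf{e} \\[4pt]
  \text{White noise, } \mathbf{e} \sim \mathcal{N}(\mathbf{0}, \sigma^2 \mathbf{I}):\quad
  (\hat{\omega}_0, \hat{\boldsymbol{\theta}})_{\mathrm{ML}}
    &= \arg\min_{\omega_0, \boldsymbol{\theta}}
       \left\| \mathbf{x} - \mathbf{s}(\omega_0, \boldsymbol{\theta}) \right\|_2^2
     = (\hat{\omega}_0, \hat{\boldsymbol{\theta}})_{\mathrm{LS}} \\[4pt]
  \text{Coloured noise, } \mathbf{e} \sim \mathcal{N}(\mathbf{0}, \mathbf{Q}):\quad
  (\hat{\omega}_0, \hat{\boldsymbol{\theta}})_{\mathrm{ML}}
    &= \arg\min_{\omega_0, \boldsymbol{\theta}}
       \left( \mathbf{x} - \mathbf{s}(\omega_0, \boldsymbol{\theta}) \right)^{\top}
       \mathbf{Q}^{-1}
       \left( \mathbf{x} - \mathbf{s}(\omega_0, \boldsymbol{\theta}) \right)
\end{align}
```

In the white case the Gaussian log-likelihood depends on the parameters only through the squared residual norm, so maximizing it is the same as ordinary least squares. In the coloured case the residual is weighted by $\mathbf{Q}^{-1}$, which is where the spectral shape of the noise enters, and the unweighted least squares estimate no longer coincides with the ML estimate in general.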

slidesICASSP2020F0Color.pdf

pitch in colored noise (458)

Categories: Audio Analysis and Synthesis

154 Views

VaPar Synth - A Variational Parametric Model for Audio Synthesis

With the advent of data-driven statistical modeling and abundant computing power, researchers are turning increasingly to deep learning for audio synthesis. These methods try to model audio signals directly in the time or frequency domain. In the interest of more flexible control over the generated sound, it could be more useful to work with a parametric representation of the signal that corresponds more directly to musical attributes such as pitch, dynamics, and timbre.

ICASSP_presentation.pdf

Presentation Slides (413)

Categories: Audio Analysis and Synthesis
Applications in Music and Audio Processing (MLR-MUSI)

56 Views

AMA: An Open-source Amplitude Modulation Analysis Toolkit for Signal Processing Applications

To analyze them with conventional signal processing tools, non-stationary signals are assumed to be stationary (or at least wide-sense stationary) over short intervals. While this approach allows them to be studied, it disregards the temporal evolution of their statistics. As such, to analyze this type of signal, it is desirable to use a representation that registers and characterizes the temporal changes in the frequency content of the signals, as these changes may occur in single or multiple periodic ways.

globalsip_2019.pdf

Poster (650)

Categories: Audio Analysis and Synthesis
Biomedical signal processing
Signal and System Modeling, Representation and Estimation

124 Views

DNN-BASED SPEAKER-ADAPTIVE POSTFILTERING WITH LIMITED ADAPTATION DATA FOR STATISTICAL SPEECH SYNTHESIS SYSTEMS

Deep neural networks (DNNs) have been successfully deployed for acoustic modelling in statistical parametric speech synthesis (SPSS) systems. Moreover, DNN-based postfilters (PF) have also been shown to outperform conventional postfilters that are widely used in SPSS systems for increasing the quality of synthesized speech. However, existing DNN-based postfilters are trained with speaker-dependent databases. Given that SPSS systems can rapidly adapt to new speakers from generic models, there is a need for DNN-based postfilters that can adapt to new speakers with minimal adaptation data.

ICASSP_2019_v1.pptx

ICASSP_2019_v1.pptx (416)

Categories: Audio Analysis and Synthesis

8 Views

F0 CONTOUR ESTIMATION USING PHONETIC FEATURE IN ELECTROLARYNGEAL SPEECH ENHANCEMENT

Pitch plays a significant role in understanding a tone-based language like Mandarin. In this paper, we present a new method that estimates the F0 contour for electrolaryngeal (EL) speech enhancement in Mandarin. Our system explores the use of phonetic features to improve the quality of EL speech. First, we train an acoustic model for EL speech and generate the phoneme posterior probability feature sequence for each input EL speech utterance. Then we employ the phonetic features, rather than the acoustic features, for F0 contour generation.