Speech Processing

Partially Fake Audio Detection by Self-attention-based Fake Span Discovery

The past few years have witnessed significant advances in speech synthesis and voice conversion technologies. However, such technologies can undermine the robustness of widely deployed biometric identification models and can be exploited by in-the-wild attackers for illegal purposes. The ASVspoof challenge mainly focuses on audio synthesized by advanced speech synthesis and voice conversion models, and on replay attacks. Recently, the first Audio Deep Synthesis Detection challenge (ADD 2022) extended the attack scenarios to cover more aspects.

ADDposter-20220418.pdf

Categories: Speech Processing

8 Views

UNIVERSAL PARALINGUISTIC SPEECH REPRESENTATIONS USING SELF-SUPERVISED CONFORMERS - ICASSP 2022 Poster

ICASSP 2022 poster.pdf

ICASSP 2022 poster (188)

Categories: Speech Processing

9 Views

Universal Paralinguistic Speech Representations using Self-Supervised Conformers - ICASSP 2022 slides

ICASSP 2022.pdf

Categories: Speech Processing

14 Views

Mispronunciation Detection in Non-native (L2) English with Uncertainty Modeling

A common approach to the automatic detection of mispronunciation in language learning is to recognize the phonemes produced by a student and compare them to the expected pronunciation of a native speaker. This approach makes two simplifying assumptions: a) phonemes can be recognized from speech with high accuracy, and b) there is a single correct way to pronounce a sentence. These assumptions do not always hold, which can result in a significant number of false mispronunciation alarms.
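
The baseline approach described above (recognize phonemes, then compare them with one expected sequence) can be sketched as follows. This is only an illustration of that baseline, not the authors' uncertainty-aware model: the function name `flag_mispronunciations`, the phoneme symbols, and the example word are all assumptions, and the alignment uses Python's standard `difflib` rather than any method named in the abstract.

```python
from difflib import SequenceMatcher


def flag_mispronunciations(expected, recognized):
    """Align two phoneme sequences and return every non-matching span.

    Each result is (operation, expected_phonemes, recognized_phonemes),
    where operation is 'replace', 'delete', or 'insert'.
    """
    matcher = SequenceMatcher(None, expected, recognized, autojunk=False)
    mismatches = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            mismatches.append((tag, expected[i1:i2], recognized[j1:j2]))
    return mismatches


# Hypothetical example: a learner says "dis" instead of "this".
expected = ["DH", "IH", "S"]    # assumed single canonical pronunciation
recognized = ["D", "IH", "S"]   # assumed output of a phoneme recognizer
mismatches = flag_mispronunciations(expected, recognized)
print(mismatches)  # [('replace', ['DH'], ['D'])]
```

Both simplifying assumptions are visible in the sketch: `recognized` is trusted as if the recognizer were always right, and `expected` holds exactly one accepted pronunciation, so a recognizer error or a valid alternative pronunciation would be flagged as a false alarm.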

ICASSP_Daniel_korzekwa_Pronounciation_Error_Detection.pptx

Categories: Speech Processing

7 Views

Mispronunciation Detection in Non-native (L2) English with Uncertainty Modeling

ICASSP_Daniel_korzekwa_Pronounciation_Error_Detection_slides.pptx

Categories: Speech Processing

6 Views

PRE-TRAINING TRANSFORMER DECODER FOR END-TO-END ASR MODEL WITH UNPAIRED TEXT DATA

ICASSP2021_poster (3).pdf

Categories: Speech Processing

53 Views

Speech Emotion Recognition based on Listener Adaptive Models

ICASSP21_EmotionListenerAdaptiveModels_v4.pdf

Categories: Speech Processing

26 Views

Have You Made A Decision? Where? A Pilot Study on Interpretability of Polarity Analysis Based on Advising Problem

General approaches to polarity analysis in dialogue, e.g., Multiple Instance Learning (MIL), have achieved significant progress.
However, one significant drawback of current approaches is that the contribution of each utterance to the overall polarity remains a black box.
For existing methods, the polarity contained in each utterance, which we call meta-polarity, is not explicitly utilized.
In this paper, we study the problem of adding interpretability to the overall polarity by predicting the meta-polarity at the same time.

ICASSP2021.zip

ICASSP2021 presentation and poster (239)

Categories: Speech Processing

8 Views

REDAT: ACCENT-INVARIANT REPRESENTATION FOR END-TO-END ASR BY DOMAIN ADVERSARIAL TRAINING WITH RELABELING

Accent mismatch is a critical problem for end-to-end ASR. This paper aims to address this problem by building an accent-robust RNN-T system with domain adversarial training (DAT). We unveil the magic behind DAT and provide, for the first time, a theoretical guarantee that DAT learns accent-invariant representations. We also prove that performing the gradient reversal in DAT is equivalent to minimizing the Jensen-Shannon divergence between domain output distributions.
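
For readers unfamiliar with the quantity named in the last sentence, the following is a minimal sketch of the standard Jensen-Shannon divergence between two discrete distributions. It only illustrates the textbook definition, JSD(P, Q) = 0.5 KL(P || M) + 0.5 KL(Q || M) with M = (P + Q) / 2. It does not reproduce the paper's proof or its training procedure, and the function names and toy distributions are assumptions made for illustration.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in nats; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: average KL of p and q to their mixture m."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Toy "domain output distributions" (assumed values, not from the paper).
same = js_divergence([0.5, 0.5], [0.5, 0.5])      # identical distributions
disjoint = js_divergence([1.0, 0.0], [0.0, 1.0])  # non-overlapping distributions
print(same, disjoint)
```

The divergence is zero when the two distributions coincide and reaches its maximum of log 2 (in nats) when they do not overlap, so driving it toward zero corresponds to making the per-accent output distributions indistinguishable.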