Speech Analysis (SPE-ANLS)

FOLLOWING THE EMBEDDING: IDENTIFYING TRANSITION PHENOMENA IN WAV2VEC 2.0 REPRESENTATIONS OF SPEECH AUDIO

Although transformer-based models have improved the state-of-the-art in speech recognition, it is still not well understood what information from the speech signal these models encode in their latent representations. This study investigates the potential of using labelled data (TIMIT) to probe wav2vec 2.0 embeddings for insights into the encoding and visualisation of speech signal information at phone boundaries. Our experiment involves training probing models to detect phone-specific articulatory features in the hidden layers based on IPA classifications.

ICASSP2024_poster_follwing_the_embedding.pdf

ICASSP2024_poster_follwing_the_embedding.pdf (279)

Categories:: Speech Analysis (SPE-ANLS)
General Topics in Speech Recognition (SPE-GASR)
Other

81 Views

Detecting Check-Worthy Claims in Political Debates, Speeches, and Interviews Using Audio Data - Poster

Developing tools to automatically detect check-worthy claims in political debates and speeches can greatly help moderators of debates, journalists, and fact-checkers. While previous work on this problem has focused exclusively on the text modality, here we explore the utility of the audio modality as an additional input. We create a new multimodal dataset (text and audio in English) containing 48 hours of speech from past political debates in the USA.

ICASSP2024-Checkworthiness-using-audio-data-poster.pdf

ICASSP2024-Checkworthiness-using-audio-data-poster.pdf (192)

Categories:: Speech Analysis (SPE-ANLS)

34 Views

Detecting Check-Worthy Claims in Political Debates, Speeches, and Interviews Using Audio Data - Presentation

ICASSP2024-Checkworthiness-using-audio-data-presentation.pptx

ICASSP2024-Checkworthiness-using-audio-data-presentation.pptx (189)

Categories:: Speech Analysis (SPE-ANLS)

33 Views

[Poster] Selective Acoustic Feature Enhancement for Speech Emotion Recognition with Noisy Speech

A speech emotion recognition (SER) system deployed on a real-world application can encounter speech contaminated with unconstrained background noise. To deal with this issue,

poster-selective-feature-enhancement.pdf

poster-selective-feature-enhancement.pdf (137)

Categories:: Speech Analysis (SPE-ANLS)

18 Views

[Poster] Crowdsourced and Automatic Speech Prominence Estimation

Read more about [Poster] Crowdsourced and Automatic Speech Prominence Estimation
Log in to post comments

The prominence of a spoken word is the degree to which an average native listener perceives the word as salient or emphasized relative to its context. Speech prominence estimation is the process of assigning a numeric value to the prominence of each word in an utterance. These prominence labels are useful for linguistic analysis, as well as training automated systems to perform emphasis-controlled text-to-speech or emotion recognition. Manually annotating prominence is time-consuming and expensive, which motivates the development of automated methods for speech prominence estimation.

icassp-2024-prominence-poster.pdf

icassp-2024-prominence-poster.pdf (138)

Categories:: Speech Analysis (SPE-ANLS)
Spoken language resources and annotation (SLP-REAN)

18 Views

[Paper] Crowdsourced and Automatic Speech Prominence Estimation

Read more about [Paper] Crowdsourced and Automatic Speech Prominence Estimation
Log in to post comments

morrison2024crowdsourced.pdf

morrison2024crowdsourced.pdf (172)

Categories:: Speech Analysis (SPE-ANLS)
Spoken language resources and annotation (SLP-REAN)

27 Views

ASSD: Synthetic Speech Detection in the AAC Compressed Domain

Read more about ASSD: Synthetic Speech Detection in the AAC Compressed Domain
Log in to post comments

Synthetic human speech signals have become very easy to generate given modern text-to-speech methods. When these signals are shared on social media they are often compressed using the Advanced Audio Coding (AAC) standard. Our goal is to study if a small set of coding metadata contained in the AAC compressed bit stream is sufficient to detect synthetic speech. This would avoid decompressing of the speech signals before analysis. We call our proposed method AAC Synthetic Speech Detection (ASSD).

icassp_assd_slides_v03.pdf

icassp_assd_slides_v03.pdf (199)

Categories:: Multimedia Forensics
Speech Analysis (SPE-ANLS)

32 Views

SPEECH-BASED EMOTION RECOGNITION WITH SELF-SUPERVISED MODELS USING ATTENTIVE CHANNEL-WISE CORRELATIONS AND LABEL SMOOTHING

When recognizing emotions from speech, we encounter two common problems: how to optimally capture emotion-relevant information from the speech signal and how to best quantify or categorize the noisy subjective emotion labels. Self-supervised pre-trained representations can robustly capture information from speech enabling state-of-the-art results in many downstream tasks including emotion recognition. However, better ways of aggregating the information across time need to be considered as the relevant emotion information is likely to appear piecewise and not uniformly across the signal.

poster_emotion_recognition_ssl_attentive_corr_icassp_2023.pdf

poster_emotion_recognition_ssl_attentive_corr_icassp_2023.pdf (187)

Categories:: Speech Analysis (SPE-ANLS)

49 Views

Federated Intelligent Terminals Facilitate Stuttering Monitoring

Read more about Federated Intelligent Terminals Facilitate Stuttering Monitoring
Log in to post comments

Stuttering is a complicated language disorder. The most common form of stuttering is developmental stuttering, which begins in childhood. Early monitoring and intervention are essential for the treatment of children with stuttering. Automatic speech recognition technology has shown its great potential for non-fluent disorder identification, whereas the previous work has not considered the privacy of users' data. To this end, we propose federated intelligent terminals for automatic monitoring of stuttering speech in different contexts.

ICASSP_camera_slides.pdf

Presentation Slides(PDF) (222)

ICASSP_camerady.pptx

Presentation Slides(PPT) (224)

Categories:: Speech Analysis (SPE-ANLS)
Applications
Biomedical signal processing

83 Views

Exploring Subgroup Performance in End-to-End Speech Models

Read more about Exploring Subgroup Performance in End-to-End Speech Models
Log in to post comments

End-to-End Spoken Language Understanding models are generally evaluated according to their overall accuracy, or separately on (a priori defined) data subgroups of interest.

ICASSP2023_poster_new.pdf

ICASSP2023_poster_new.pdf (210)

Categories:: Speech Analysis (SPE-ANLS)
Spoken Language Understanding (SLP-UNDE)

17 Views

Speech Analysis (SPE-ANLS)

Pages