IEEE ICASSP 2024

IEEE ICASSP 2024 - The IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) is the world’s largest and most comprehensive technical conference focused on signal processing and its applications. The IEEE ICASSP 2024 conference will feature world-class presentations by internationally renowned speakers and cutting-edge session topics, and will provide a fantastic opportunity to network with like-minded professionals from around the world.

Dynamic Speech Emotion Recognition using a Conditional Neural Process

Work on predicting emotional attributes from speech has often focused on predicting a single value from a sentence or short speaking turn. These methods often ignore that natural emotions are both dynamic and dependent on context. To model the dynamic nature of emotions, we can treat the prediction of emotion from speech as a time-series problem. We refer to the problem of predicting these emotional traces as dynamic speech emotion recognition. Previous studies in this area have used models that treat all emotional traces as coming from the same underlying distribution.

Luz_ICASSP2024_Poster_Final.pdf

Poster of paper: "Dynamic Speech Emotion Recognition using a Conditional Neural Process." (0)

Categories: Speech Perception and Psychoacoustics (SPE-SPER); Neural network learning (MLR-NNLR)

1 Views

FEATURE-CONSTRAINED AND ATTENTION-CONDITIONED DISTILLATION LEARNING FOR VISUAL ANOMALY DETECTION

Visual anomaly detection in computer vision is an essential one-class classification and segmentation problem. The student-teacher (S-T) approach has proven effective in addressing this challenge. However, previous studies based on S-T underutilize the feature representations learned by the teacher network, which restricts anomaly detection performance.

ICASSP2024_FCACDL.pptx

ICASSP2024_FCACDL.pptx (5)

Categories: Image, Video, and Multidimensional Signal Processing

13 Views

[Poster] Contrastive Deep Nonnegative Matrix Factorization For Community Detection

Recently, nonnegative matrix factorization (NMF) has been widely adopted for community detection, because of its better interpretability. However, the existing NMF-based methods have the following three problems: 1) they directly transform the original network into community membership space, so it is difficult for them to capture the hierarchical information; 2) they often only pay attention to the topology of the network and ignore its node attributes; 3) it is hard for them to learn the global structure information necessary for community detection.

ICASSP2024-Poster.pdf

CDNMF-Poster (16)

Categories: Machine Learning for Signal Processing

12 Views

[Poster] Contrastive Deep Nonnegative Matrix Factorization For Community Detection

ICASSP2024-Poster.pdf

Poster (31)

Categories: Machine Learning for Signal Processing

7 Views

ENHANCING NOISY LABEL LEARNING VIA UNSUPERVISED CONTRASTIVE LOSS WITH LABEL CORRECTION BASED ON PRIOR KNOWLEDGE

To alleviate the negative impact of noisy labels, most noisy label learning (NLL) methods dynamically divide the training data into two types, “clean samples” and “noisy samples”, during training. However, the conventional selection of clean samples depends heavily on the features learned in the early stages of training, making it difficult to guarantee the cleanliness of the selected samples when the noise ratio is high.

ICASSP2024_poster_kashiwagi_final.pdf

ICASSP2024_poster (14)

Categories: Other

11 Views

Detecting Check-Worthy Claims in Political Debates, Speeches, and Interviews Using Audio Data - Poster

Developing tools to automatically detect check-worthy claims in political debates and speeches can greatly help moderators of debates, journalists, and fact-checkers. While previous work on this problem has focused exclusively on the text modality, here we explore the utility of the audio modality as an additional input. We create a new multimodal dataset (text and audio in English) containing 48 hours of speech from past political debates in the USA.

ICASSP2024-Checkworthiness-using-audio-data-poster.pdf

ICASSP2024-Checkworthiness-using-audio-data-poster.pdf (13)

Categories: Speech Analysis (SPE-ANLS)

7 Views

Detecting Check-Worthy Claims in Political Debates, Speeches, and Interviews Using Audio Data - Presentation

ICASSP2024-Checkworthiness-using-audio-data-presentation.pptx

ICASSP2024-Checkworthiness-using-audio-data-presentation.pptx (14)

Categories: Speech Analysis (SPE-ANLS)

8 Views

TOWARDS MULTI-DOMAIN FACE LANDMARK DETECTION WITH SYNTHETIC DATA FROM DIFFUSION MODEL

Recently, deep learning-based facial landmark detection for in-the-wild faces has achieved significant improvement. However, face landmark detection remains challenging in other domains (e.g., cartoon and caricature), owing to the scarcity of extensively annotated training data. To tackle this concern, we design a two-stage training approach that effectively leverages limited datasets and a pre-trained diffusion model to obtain aligned pairs of landmarks and faces in multiple domains.

poster_이원명_ICASSP2024.pdf

poster_이원명_ICASSP2024.pdf (8)

Categories: Pattern recognition and classification (MLR-PATT)

7 Views

[poster] Improving Design of Input Condition Invariant Speech Enhancement

Building a single universal speech enhancement (SE) system that can handle arbitrary input is an in-demand but underexplored research topic. Towards this ultimate goal, one direction is to build a single model that handles diverse audio durations, sampling frequencies, and microphone variations in noisy and reverberant scenarios, which we define here as “input condition invariant SE”. Such a model was recently proposed and showed promising performance; however, its multi-channel performance degraded severely in real conditions.

poster_USES2.pdf

poster_USES2.pdf (16)

Categories: Audio and Acoustic Signal Processing

6 Views

[slides] Generation-Based Target Speech Extraction with Speech Discretization and Vocoder

Target speech extraction (TSE) aims to isolate the speech of a specific target speaker from an audio mixture, with the help of an auxiliary recording of that target speaker. Most existing TSE methods employ discrimination-based models to estimate the target speaker’s proportion in the mixture, but they often fail to compensate for missing or highly corrupted frequency components in the speech signal. In contrast, generation-based methods can naturally handle such scenarios via speech resynthesis.

slides_icassp_discrete_tse_oral.pdf

slides_icassp_discrete_tse_oral.pdf (14)

Categories: Audio and Acoustic Signal Processing

6 Views
