The objective of this work is to extract the target speaker's voice from a mixture of voices using visual cues. Existing works on audio-visual speech separation have demonstrated promising intelligibility, but maintaining the naturalness of the separated speech remains challenging. To address this issue, we propose AVDiffuSS, an audio-visual speech separation model based on a diffusion mechanism, which is known for its capability to generate natural samples. We also propose a cross-attention-based feature fusion mechanism that effectively fuses the two modalities for diffusion.
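
As a rough illustration of the fusion idea, the PyTorch sketch below lets audio-frame features attend to visual (lip) features via cross-attention. It is a minimal sketch under assumed names and dimensions (CrossAttentionFusion, dim=256, and so on), not the exact AVDiffuSS architecture.

import torch
import torch.nn as nn

class CrossAttentionFusion(nn.Module):
    # Fuse audio and visual feature sequences via cross-attention:
    # audio frames (queries) attend to visual frames (keys/values) so
    # that lip-motion cues can steer separation of the target speaker.
    def __init__(self, dim=256, num_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, audio, visual):
        # audio:  (batch, T_audio, dim)  mixture features
        # visual: (batch, T_video, dim)  target speaker's lip/face features
        fused, _ = self.attn(query=audio, key=visual, value=visual)
        return self.norm(audio + fused)  # residual connection

# Example: 100 audio frames attending to 50 video frames
fusion = CrossAttentionFusion()
out = fusion(torch.randn(1, 100, 256), torch.randn(1, 50, 256))
print(out.shape)  # (1, 100, 256), same shape as the audio stream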

In this work, we propose a novel approach to multi-modal emotion recognition from conversations using speech and text. The audio representations are learned jointly with a learnable audio front-end (LEAF) model that feeds a CNN-based classifier. The text representations are derived from pre-trained bidirectional encoder representations from transformers (BERT) combined with a gated recurrent unit (GRU) network. The textual and audio representations are then separately processed by a bidirectional GRU network with self-attention.
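
The sketch below outlines the two-branch design in PyTorch. The conv front-end and the token embedding table are explicit stand-ins for the LEAF front-end and the pre-trained BERT encoder; all names and dimensions are illustrative, not the paper's configuration.

import torch
import torch.nn as nn

class SelfAttentivePool(nn.Module):
    # Attention pooling over a BiGRU output sequence.
    def __init__(self, dim):
        super().__init__()
        self.score = nn.Linear(dim, 1)
    def forward(self, seq):                    # seq: (batch, T, dim)
        w = torch.softmax(self.score(seq), 1)  # per-step weights
        return (w * seq).sum(dim=1)            # (batch, dim)

class EmotionClassifier(nn.Module):
    # Audio branch: learnable conv front-end (stand-in for LEAF) + CNN.
    # Text branch: token embeddings (stand-in for BERT) + GRU.
    # Each branch goes through a BiGRU with self-attention, then fusion.
    def __init__(self, vocab=30522, dim=128, n_classes=4):
        super().__init__()
        self.frontend = nn.Conv1d(1, 40, kernel_size=400, stride=160)
        self.audio_cnn = nn.Sequential(nn.Conv1d(40, dim, 3, padding=1), nn.ReLU())
        self.text_emb = nn.Embedding(vocab, dim)
        self.text_gru = nn.GRU(dim, dim // 2, bidirectional=True, batch_first=True)
        self.a_bigru = nn.GRU(dim, dim // 2, bidirectional=True, batch_first=True)
        self.t_bigru = nn.GRU(dim, dim // 2, bidirectional=True, batch_first=True)
        self.a_pool, self.t_pool = SelfAttentivePool(dim), SelfAttentivePool(dim)
        self.head = nn.Linear(2 * dim, n_classes)

    def forward(self, wav, tokens):
        a = self.audio_cnn(self.frontend(wav.unsqueeze(1))).transpose(1, 2)
        a, _ = self.a_bigru(a)
        t, _ = self.text_gru(self.text_emb(tokens))
        t, _ = self.t_bigru(t)
        return self.head(torch.cat([self.a_pool(a), self.t_pool(t)], dim=-1))

model = EmotionClassifier()
logits = model(torch.randn(2, 16000), torch.randint(0, 30522, (2, 20)))
print(logits.shape)  # (2, 4) emotion logits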

Despite many recent advances in the design of dialogue systems, a true bottleneck remains the acquisition of the data required to train their components. Unlike many other language processing applications, dialogue systems require interaction with users, so it is difficult to develop them from pre-recorded data alone. Building on previous work, on-line learning is pursued here as the most convenient way to address the issue: data collection, annotation, and use in the learning algorithms are performed in a single process.
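
A minimal sketch of such a single-process loop, with a hypothetical epsilon-greedy policy and a reward derived directly from user feedback; the action set, the reward scheme, and all names are illustrative, not the paper's method.

import random

policy = {"greet": 0.0, "ask_slot": 0.0, "confirm": 0.0}  # action values
epsilon, lr = 0.1, 0.05

def choose_action():
    if random.random() < epsilon:              # explore
        return random.choice(list(policy))
    return max(policy, key=policy.get)         # exploit

def run_turn(user_feedback):
    # One interaction: act, collect the user's judgement as the annotation,
    # and immediately update the policy with it.
    action = choose_action()
    reward = 1.0 if user_feedback(action) else -1.0
    policy[action] += lr * (reward - policy[action])  # incremental update
    return action, reward

# Simulated user that rewards confirmations
for _ in range(100):
    run_turn(lambda a: a == "confirm")
print(max(policy, key=policy.get))  # typically "confirm"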

Off-the-shelf speech recognizers are error-prone in specialized domains. We aim to mitigate the impact of these errors on downstream classification tasks, without in-domain speech training data, by augmenting the available typewritten text training data with inferred phonetic information. We apply our method to mitigate the effects of the lack of speech training data when converting a typed chatbot into a spoken language interface.

Paper available here: https://ieeexplore.ieee.org/document/8682550
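
To illustrate the general idea of phonetic augmentation (not the paper's exact method), the sketch below expands typed training sentences with phonetically confusable word variants; the tiny lexicon and the phoneme edit-distance test are stand-ins for a real grapheme-to-phoneme converter and phone confusion statistics.

LEXICON = {  # word -> toy ARPAbet-like phoneme string
    "flight": "F L AY T", "fright": "F R AY T",
    "book": "B UH K", "look": "L UH K",
}

def edit_distance(a, b):
    # Standard one-row Levenshtein distance over phoneme lists.
    dp = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, y in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1, prev + (x != y))
    return dp[-1]

def variants(word, max_dist=1):
    # Lexicon words whose phoneme strings are within max_dist edits.
    phones = LEXICON.get(word, "").split()
    return [w for w, p in LEXICON.items()
            if w != word and edit_distance(phones, p.split()) <= max_dist]

def augment(sentence):
    # Yield the original sentence plus copies with one word swapped for a
    # phonetically confusable neighbour, keeping the original label.
    words = sentence.split()
    yield sentence
    for i, w in enumerate(words):
        for alt in variants(w):
            yield " ".join(words[:i] + [alt] + words[i + 1:])

print(list(augment("book a flight")))
# ['book a flight', 'look a flight', 'book a fright']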

This paper presents a new method, adversarial advantage actor-critic (Adversarial A2C), which significantly improves the efficiency of dialogue policy learning in task-completion dialogue systems. Inspired by generative adversarial networks (GANs), we train a discriminator to differentiate the responses/actions generated by dialogue agents from those of experts.
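
A minimal sketch of the GAN-style discriminator update, with hypothetical state/action dimensions: expert (state, action) pairs are labeled 1, agent pairs 0, and the discriminator's confidence can then serve as an extra reward signal inside the A2C update. This illustrates the training signal, not the paper's exact network.

import torch
import torch.nn as nn

class Discriminator(nn.Module):
    # Scores (dialogue state, action) pairs: high for expert-like behaviour.
    def __init__(self, state_dim=32, n_actions=10, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + n_actions, hidden), nn.ReLU(),
            nn.Linear(hidden, 1))
    def forward(self, state, action_onehot):
        return self.net(torch.cat([state, action_onehot], dim=-1))

disc = Discriminator()
opt = torch.optim.Adam(disc.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

def disc_step(expert_s, expert_a, agent_s, agent_a):
    # GAN-style update: expert pairs -> 1, agent pairs -> 0.
    loss = bce(disc(expert_s, expert_a), torch.ones(len(expert_s), 1)) + \
           bce(disc(agent_s, agent_a), torch.zeros(len(agent_s), 1))
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

def adversarial_reward(state, action_onehot):
    # Discriminator confidence used as an additional reward for the policy.
    with torch.no_grad():
        return torch.sigmoid(disc(state, action_onehot)).squeeze(-1)

# Example with random tensors standing in for real dialogue data
s, a = torch.randn(8, 32), torch.eye(10)[torch.randint(0, 10, (8,))]
disc_step(s, a, torch.randn(8, 32), torch.eye(10)[torch.randint(0, 10, (8,))])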

This study proposes an approach to dialog state tracking (DST) in a conversational interview coaching system. For the interview coaching task, the semantic slots commonly used in traditional dialog systems are difficult to define manually. This study instead adopts the topic profile of the interviewee's response as the dialog state representation. In addition, as a response generally consists of several sentences, the summary vector obtained from a long short-term memory (LSTM) neural network is likely to contain noisy information from the many irrelevant sentences.
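
The sketch below illustrates one way to realize this with sentence-level attention over LSTM outputs, so that irrelevant sentences receive low weight in the summary; the names and dimensions are assumptions, not the paper's configuration.

import torch
import torch.nn as nn

class AttentiveTopicTracker(nn.Module):
    # Encode the sentences of the interviewee's response with an LSTM,
    # then attend over the sentence states so irrelevant sentences
    # contribute little to the topic-profile dialog state.
    def __init__(self, emb_dim=100, hidden=64, n_topics=12):
        super().__init__()
        self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True)
        self.score = nn.Linear(hidden, 1)   # per-sentence relevance
        self.head = nn.Linear(hidden, n_topics)

    def forward(self, sentences):           # (batch, n_sent, emb_dim)
        h, _ = self.lstm(sentences)         # (batch, n_sent, hidden)
        w = torch.softmax(self.score(h), dim=1)
        summary = (w * h).sum(dim=1)        # attention-weighted summary
        return torch.softmax(self.head(summary), dim=-1)  # topic profile

tracker = AttentiveTopicTracker()
resp = torch.randn(1, 5, 100)  # one response of five sentence embeddings
print(tracker(resp).shape)     # (1, 12) topic distribution as dialog state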
