We propose an approach for continuous prediction of turn-taking and backchanneling locations in spoken dialogue by fusing a neural acoustic model with a large language model (LLM). Experiments on the Switchboard human-human conversation dataset demonstrate that our approach consistently outperforms single-modality baselines. We also develop a novel multi-task instruction fine-tuning strategy that further exploits LLM-encoded knowledge of the tasks and conversational contexts, yielding additional improvements.
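The abstract does not specify the fusion architecture, so the following is only a minimal sketch of one plausible reading: late fusion of frame-level acoustic features with time-aligned LLM hidden states, followed by per-frame classification over turn-taking events. All module names, dimensions, the three-way class set, and the GRU-based fusion are illustrative assumptions, not details from the paper.

```python
# Hypothetical late-fusion model for continuous turn-taking / backchannel
# prediction. Dimensions, class set, and fusion strategy are assumptions.
import torch
import torch.nn as nn

class FusionTurnTakingModel(nn.Module):
    def __init__(self, acoustic_dim=256, llm_dim=4096, hidden_dim=512, num_classes=3):
        super().__init__()
        # Project both modalities into a shared space before fusion.
        self.acoustic_proj = nn.Linear(acoustic_dim, hidden_dim)
        self.llm_proj = nn.Linear(llm_dim, hidden_dim)
        # Temporal model over the fused frame sequence.
        self.rnn = nn.GRU(2 * hidden_dim, hidden_dim, batch_first=True)
        # Per-frame logits, e.g. {hold, turn shift, backchannel} (assumed labels).
        self.classifier = nn.Linear(hidden_dim, num_classes)

    def forward(self, acoustic_feats, llm_states):
        # acoustic_feats: (batch, frames, acoustic_dim) from the acoustic model
        # llm_states:     (batch, frames, llm_dim), LLM hidden states assumed to be
        #                 already aligned (e.g. upsampled) to the acoustic frame rate
        a = self.acoustic_proj(acoustic_feats)
        t = self.llm_proj(llm_states)
        fused, _ = self.rnn(torch.cat([a, t], dim=-1))
        return self.classifier(fused)  # (batch, frames, num_classes)

# Smoke test with random tensors standing in for real features.
model = FusionTurnTakingModel()
logits = model(torch.randn(2, 100, 256), torch.randn(2, 100, 4096))
print(logits.shape)  # torch.Size([2, 100, 3])
```

Per-frame logits like these support the "continuous prediction" framing: the model emits a decision at every acoustic frame rather than only at utterance boundaries.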
