Poster for ICASSP 2024 paper "Turn-taking and Backchannel Prediction with Acoustic and Large Language Model Fusion"
We propose an approach for continuous prediction of turn-taking and backchanneling locations in spoken dialogue by fusing a neural acoustic model with a large language model (LLM). Experiments on the Switchboard human-human conversation dataset show that the fused model consistently outperforms single-modality baselines. We also develop a novel multi-task instruction fine-tuning strategy that draws further on LLM-encoded knowledge of the tasks and conversational contexts, yielding additional improvements.
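As a rough illustration of what fusing two modalities can look like, the sketch below concatenates an acoustic embedding with a text (LLM) embedding and applies a linear layer followed by a softmax. This is a generic late-fusion toy, not the method from the paper: the label set, the embedding sizes, the random weights, and the function name `fuse_and_classify` are all assumptions made for the example.

```python
import math
import random

# Hypothetical label set; the abstract only names turn-taking and backchanneling.
CLASSES = ("continue", "backchannel", "turn_taking")


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_and_classify(acoustic_vec, text_vec, weights, bias):
    """Late fusion by concatenation (an assumed scheme, for illustration only).

    acoustic_vec: embedding from an acoustic model (placeholder values here)
    text_vec:     embedding from a language model (placeholder values here)
    weights/bias: parameters of a single linear layer over the fused vector
    """
    fused = list(acoustic_vec) + list(text_vec)
    logits = [
        sum(w * x for w, x in zip(row, fused)) + b
        for row, b in zip(weights, bias)
    ]
    return softmax(logits)


# Random stand-ins for real embeddings and trained parameters.
rng = random.Random(0)
acoustic_vec = [rng.uniform(-1, 1) for _ in range(4)]
text_vec = [rng.uniform(-1, 1) for _ in range(6)]
weights = [[rng.uniform(-0.5, 0.5) for _ in range(10)] for _ in CLASSES]
bias = [0.0, 0.0, 0.0]

probs = fuse_and_classify(acoustic_vec, text_vec, weights, bias)
prediction = CLASSES[probs.index(max(probs))]
print(dict(zip(CLASSES, probs)), prediction)
```

In a continuous-prediction setting, a function like this would be called repeatedly as the dialogue unfolds, with the embeddings refreshed at each step; how the paper actually combines the two models may differ.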