This paper presents our latest investigations into dialog act (DA) classification on automatically generated transcriptions. We propose a novel approach that combines convolutional neural networks (CNNs) and conditional random fields (CRFs) for context modeling in DA classification. We explore how transcriptions generated by different automatic speech recognition systems, such as hybrid TDNN/HMM and end-to-end systems, affect the final performance. Experimental results on two benchmark datasets (MRDA and SwDA) show that the combination of CNN and CRF consistently improves accuracy.
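
Below is a minimal, hypothetical sketch of the idea in PyTorch (using the third-party pytorch-crf package): a CNN encodes each utterance into per-tag emission scores, and a linear-chain CRF models the sequence of dialog acts across the dialog. All layer sizes, names, and the tokenization scheme are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn
from torchcrf import CRF  # third-party: pip install pytorch-crf

class CNNCRFDialogActTagger(nn.Module):
    """Hypothetical CNN + CRF dialog-act tagger; sizes and names are illustrative."""
    def __init__(self, vocab_size, num_tags, emb_dim=100, num_filters=64, kernel_sizes=(2, 3, 4)):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        self.convs = nn.ModuleList(
            [nn.Conv1d(emb_dim, num_filters, k, padding=k // 2) for k in kernel_sizes]
        )
        self.emit = nn.Linear(num_filters * len(kernel_sizes), num_tags)
        self.crf = CRF(num_tags, batch_first=True)

    def _emissions(self, tokens):
        # tokens: (batch, dialog_len, utt_len) word ids; one row of utterances per dialog
        b, d, u = tokens.shape
        x = self.embed(tokens.view(b * d, u)).transpose(1, 2)        # (b*d, emb_dim, utt_len)
        pooled = [torch.relu(conv(x)).max(dim=2).values for conv in self.convs]
        return self.emit(torch.cat(pooled, dim=1)).view(b, d, -1)    # per-utterance tag scores

    def loss(self, tokens, tags, mask):
        # tags: (batch, dialog_len) gold dialog acts; mask: (batch, dialog_len) bool
        return -self.crf(self._emissions(tokens), tags, mask=mask, reduction='mean')

    def predict(self, tokens, mask):
        return self.crf.decode(self._emissions(tokens), mask=mask)
```

At inference, `predict(tokens, mask)` returns the Viterbi-decoded dialog-act sequence for each dialog, so the CRF contributes the context modeling across neighboring utterances.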

This paper presents a question answering (QA) system developed for spoken lecture processing. The questions are presented to the system in written form and the answers are returned from lecture videos. In contrast to the widely studied reading comprehension style of QA, where the machine understands a passage of text and answers questions related to that passage, our task introduces the challenge of searching for answers in much longer texts, namely the erroneous transcripts of the lecture videos.
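
As an illustration of the long-transcript search problem (not a description of the paper's system), the sketch below splits a lecture transcript into overlapping passages and ranks them against a written question with TF-IDF similarity; a downstream extractive reader would then operate on the top-ranked passages. Window sizes and function names are assumptions.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def split_transcript(transcript, window=100, stride=50):
    """Cut a long (possibly erroneous) ASR transcript into overlapping word windows."""
    words = transcript.split()
    return [" ".join(words[i:i + window])
            for i in range(0, max(1, len(words) - window + 1), stride)]

def rank_passages(question, transcript, top_k=3):
    """Rank transcript passages by TF-IDF cosine similarity to the written question."""
    passages = split_transcript(transcript)
    vectorizer = TfidfVectorizer().fit(passages + [question])
    sims = cosine_similarity(vectorizer.transform([question]),
                             vectorizer.transform(passages))[0]
    order = sims.argsort()[::-1][:top_k]
    return [(passages[i], float(sims[i])) for i in order]
```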

This paper presents a new method, adversarial advantage actor-critic (Adversarial A2C), which significantly improves the efficiency of dialogue policy learning in task-completion dialogue systems. Inspired by generative adversarial networks (GANs), we train a discriminator to differentiate responses/actions generated by dialogue agents from responses/actions generated by experts.
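
A hedged sketch of the adversarial component is shown below: a discriminator scores (state, action) pairs, is trained GAN-style to label expert pairs 1 and agent pairs 0, and its output can then serve as an additional reward signal for the actor-critic learner. State/action encodings, network sizes, and function names are illustrative assumptions, not the paper's exact setup.

```python
import torch
import torch.nn as nn

class Discriminator(nn.Module):
    """Scores (state, action) pairs: output near 1 means 'looks like an expert decision'."""
    def __init__(self, state_dim, num_actions, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + num_actions, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Sigmoid(),
        )

    def forward(self, state, action_onehot):
        return self.net(torch.cat([state, action_onehot], dim=-1))

def discriminator_step(disc, optimizer, expert_batch, agent_batch):
    """One GAN-style update: expert (state, action) pairs labelled 1, agent pairs labelled 0."""
    bce = nn.BCELoss()
    d_expert = disc(*expert_batch)
    d_agent = disc(*agent_batch)
    loss = bce(d_expert, torch.ones_like(d_expert)) + bce(d_agent, torch.zeros_like(d_agent))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    # The actor-critic can then add, e.g., log D(s, a) to the environment reward
    # so that agent actions resembling expert actions are reinforced.
    return loss.item()
```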

Social signals such as laughter and fillers are often observed in natural conversation, and they play various roles in human-to-human communication. Detecting these events is useful for transcription systems, which can generate richer transcriptions, and for dialogue systems, which can behave more like humans, for example by laughing in synchrony or listening attentively. We have studied an end-to-end approach that directly detects social signals from speech using connectionist temporal classification (CTC), an end-to-end sequence labelling model.
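
The sketch below illustrates this kind of CTC-based detector in PyTorch: a bidirectional LSTM encodes acoustic frames, and a CTC loss aligns the frame-level outputs to a short sequence of social-signal labels. The label set, feature dimensions, and model sizes are assumptions for illustration, not the paper's exact setup.

```python
import torch
import torch.nn as nn

LABELS = ["<blank>", "LAUGHTER", "FILLER", "BACKCHANNEL"]  # blank must be index 0 for CTCLoss

class CTCSocialSignalDetector(nn.Module):
    """Illustrative BiLSTM encoder trained with CTC to emit social-signal labels from speech."""
    def __init__(self, feat_dim=40, hidden=128, num_labels=len(LABELS)):
        super().__init__()
        self.encoder = nn.LSTM(feat_dim, hidden, num_layers=2,
                               bidirectional=True, batch_first=True)
        self.proj = nn.Linear(2 * hidden, num_labels)

    def forward(self, feats):                      # feats: (batch, frames, feat_dim)
        out, _ = self.encoder(feats)
        return self.proj(out).log_softmax(dim=-1)  # (batch, frames, num_labels)

model = CTCSocialSignalDetector()
ctc = nn.CTCLoss(blank=0, zero_infinity=True)

feats = torch.randn(2, 300, 40)                    # toy batch: 2 utterances, 300 frames each
log_probs = model(feats).transpose(0, 1)           # CTCLoss expects (frames, batch, labels)
targets = torch.tensor([1, 2, 1, 2])               # flattened label sequences, e.g. LAUGHTER FILLER
input_lengths = torch.tensor([300, 300])
target_lengths = torch.tensor([2, 2])
loss = ctc(log_probs, targets, input_lengths, target_lengths)
```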

Recent work has proposed neural models for dialog act classification in spoken dialogs. However, these models have not explored the role and usefulness of acoustic information. We propose a neural model that processes both lexical and acoustic features for classification. Our results on two benchmark datasets reveal that acoustic features help improve the overall accuracy.
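
One plausible realization, sketched below in PyTorch, is late fusion: a recurrent lexical encoder summarizes the utterance text, an utterance-level acoustic feature vector is concatenated to it, and a feed-forward classifier predicts the dialog act. The choice of encoders and all dimensions are assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class LexicalAcousticDAClassifier(nn.Module):
    """Illustrative late-fusion dialog-act classifier over lexical and acoustic features."""
    def __init__(self, vocab_size, num_acts, acoustic_dim=88, emb_dim=100, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        self.lexical = nn.GRU(emb_dim, hidden, batch_first=True, bidirectional=True)
        self.classifier = nn.Sequential(
            nn.Linear(2 * hidden + acoustic_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, num_acts),
        )

    def forward(self, tokens, acoustic_feats):
        # tokens: (batch, utt_len) word ids
        # acoustic_feats: (batch, acoustic_dim) utterance-level statistics, e.g. prosodic features
        _, h = self.lexical(self.embed(tokens))          # h: (2, batch, hidden), one per direction
        lexical_vec = torch.cat([h[0], h[1]], dim=-1)    # (batch, 2*hidden)
        return self.classifier(torch.cat([lexical_vec, acoustic_feats], dim=-1))
```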
