
- Multitask Learning with Capsule Networks for Speech-to-Intent Applications

Voice-controlled applications can be a great aid to society, especially for physically challenged people. However, this requires robustness to all kinds of variation in speech. A spoken language understanding system that learns from interaction with, and demonstrations from, the user can be deployed in different settings and for different types of speech, even deviant or impaired speech, while also allowing the user to choose their own phrasing.

- Multimodal One-shot Learning of Speech and Images

Imagine a robot is shown new concepts visually together with spoken tags, e.g. "milk", "eggs", "butter". After seeing one paired audiovisual example per class, it is shown a new set of unseen instances of these objects and asked to pick the "milk". Without receiving any hard labels, could it learn to match the new continuous speech input to the correct visual instance? Although unimodal one-shot learning, where one labelled example in a single modality is given per class, has been studied, this example motivates multimodal one-shot learning.
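The matching step described above can be sketched as nearest-neighbour retrieval in a shared embedding space. This is a minimal illustration, not the paper's model: the embeddings are made-up toy vectors, and a real system would have to learn the joint audio-visual space from the paired one-shot examples.

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def one_shot_match(query_audio_emb, support):
    """Return the class whose single support example is closest to the
    spoken query in the shared embedding space.

    `support` maps class name -> visual embedding of its one example.
    Embedding values here are hypothetical toy numbers.
    """
    return max(support, key=lambda c: cosine(query_audio_emb, support[c]))

# Toy embeddings (made up for illustration).
support = {"milk": [0.9, 0.1, 0.0],
           "eggs": [0.1, 0.8, 0.2],
           "butter": [0.0, 0.2, 0.9]}
query = [0.85, 0.15, 0.05]  # embedding of the spoken query "milk"
print(one_shot_match(query, support))  # -> milk
```

Because no hard labels are exchanged, everything hinges on how well the learned embedding spaces align; the nearest-neighbour step itself stays trivial.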

- Context-aware Neural-based Dialog Act Classification On Automatically Generated Transcriptions

This paper presents our latest investigations on dialog act (DA) classification on automatically generated transcriptions. We propose a novel approach that combines convolutional neural networks (CNNs) and conditional random fields (CRFs) for context modeling in DA classification. We explore the impact of transcriptions generated from different automatic speech recognition systems, such as hybrid TDNN/HMM and end-to-end systems, on the final performance. Experimental results on two benchmark datasets (MRDA and SwDA) show that combining CNNs and CRFs consistently improves accuracy.
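The CNN/CRF combination described above can be sketched at decoding time: the CNN scores each utterance independently, and the CRF's Viterbi decoding picks the best label sequence using label-transition scores. This is a generic sketch, assuming toy log-potentials rather than the paper's trained model.

```python
def viterbi(emissions, transitions):
    """CRF decoding: best label sequence given per-utterance emission
    scores (e.g. from a CNN sentence encoder) and transition scores.

    emissions[t][y]    : score of DA label y for utterance t
    transitions[p][y]  : score of moving from label p to label y
    All scores here are illustrative log-potentials.
    """
    n_labels = len(emissions[0])
    score = list(emissions[0])
    back = []
    for t in range(1, len(emissions)):
        new_score, ptr = [], []
        for y in range(n_labels):
            best_prev = max(range(n_labels),
                            key=lambda p: score[p] + transitions[p][y])
            new_score.append(score[best_prev] + transitions[best_prev][y]
                             + emissions[t][y])
            ptr.append(best_prev)
        score = new_score
        back.append(ptr)
    # Trace back the highest-scoring path.
    y = max(range(n_labels), key=lambda l: score[l])
    path = [y]
    for ptr in reversed(back):
        y = ptr[y]
        path.append(y)
    return list(reversed(path))

# Labels: 0 = Statement, 1 = Question (toy example).
emissions = [[2.0, 0.1], [0.4, 0.5], [1.5, 0.2]]
transitions = [[1.0, -1.0], [-1.0, 1.0]]  # labels tend to persist
print(viterbi(emissions, transitions))  # -> [0, 0, 0]
```

Note that at the second utterance the CNN emission alone would prefer label 1 (0.5 > 0.4), but the CRF's transition scores smooth it to label 0, which is exactly the kind of context modeling the abstract refers to.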

- QUESTION ANSWERING FOR SPOKEN LECTURE PROCESSING

This paper presents a question answering (QA) system developed for spoken lecture processing. The questions are presented to the system in written form and the answers are returned from lecture videos. In contrast to the widely studied reading-comprehension style of QA, in which the machine understands a passage of text and answers questions related to that passage, our task introduces the challenge of searching for answers in longer text, where that text corresponds to the erroneous transcripts of the lecture videos.
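One common way to handle the long-text problem mentioned above is to first retrieve a candidate span before running a reader. The sketch below is a toy retrieval step by word overlap over sliding windows; the window width, stride, and scoring are illustrative assumptions, not the paper's method.

```python
def best_window(question, transcript_words, width=20, stride=10):
    """Score overlapping windows of a long (possibly erroneous)
    transcript by word overlap with the question, returning the
    (start, end) indices of the best-scoring span.

    A real system would pass the selected span to a reading-
    comprehension model; parameters here are illustrative.
    """
    q = set(question.lower().split())
    best, best_score = (0, width), -1
    for start in range(0, max(1, len(transcript_words) - width + 1), stride):
        window = transcript_words[start:start + width]
        score = sum(1 for w in window if w.lower() in q)
        if score > best_score:
            best, best_score = (start, start + width), score
    return best

transcript = ("the lecture covers gradient descent which updates parameters "
              "in the direction of the negative gradient of the loss").split()
print(best_window("what is gradient descent", transcript, width=8, stride=4))
# -> (0, 8)
```

Word-overlap scoring is deliberately forgiving of ASR errors elsewhere in the transcript, since only the retrieved window needs to contain the answer.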

- REVISITING HIDDEN MARKOV MODELS FOR SPEECH EMOTION RECOGNITION

- USING DEEP-Q NETWORK TO SELECT CANDIDATES FROM N-BEST SPEECH RECOGNITION HYPOTHESES FOR ENHANCING DIALOGUE STATE TRACKING

- Adversarial Advantage Actor-Critic Model for Task-Completion Dialogue Policy Learning

This paper presents a new method, adversarial advantage actor-critic (Adversarial A2C), which significantly improves the efficiency of dialogue policy learning in task-completion dialogue systems. Inspired by generative adversarial networks (GANs), we train a discriminator to differentiate responses/actions generated by dialogue agents from responses/actions by experts.
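The discriminator training mentioned above follows the standard GAN-style objective: push the discriminator's output toward 1 on expert state-action features and toward 0 on agent-generated ones. This is a minimal sketch with a linear discriminator and made-up toy features, not the paper's architecture.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def discriminator_loss(w, expert_pairs, agent_pairs):
    """GAN-style discriminator objective over state-action features.

    `w` is the weight vector of a linear discriminator; the feature
    vectors and their dimensionality are illustrative assumptions.
    """
    def d(x):
        return sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
    loss = 0.0
    for x in expert_pairs:
        loss -= math.log(d(x))          # want D(expert) close to 1
    for x in agent_pairs:
        loss -= math.log(1.0 - d(x))    # want D(agent) close to 0
    return loss / (len(expert_pairs) + len(agent_pairs))

# A discriminator that already separates the two toy examples
# incurs a much smaller loss than an uninformative one.
print(discriminator_loss([5.0, -5.0], [[1, 0]], [[0, 1]]))  # small
print(discriminator_loss([0.0, 0.0], [[1, 0]], [[0, 1]]))   # ~log 2
```

In the adversarial setup, a high discriminator score on the agent's own actions can then be fed back as an additional reward signal, encouraging expert-like behavior during policy learning.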

- AN END-TO-END APPROACH TO JOINT SOCIAL SIGNAL DETECTION AND AUTOMATIC SPEECH RECOGNITION

Social signals such as laughter and fillers are often observed in natural conversation, and they play various roles in human-to-human communication. Detecting these events is useful both for transcription systems, to generate rich transcriptions, and for dialogue systems, to behave as humans do, for example by laughing in sync or listening attentively. We have studied an end-to-end approach that directly detects social signals from speech using connectionist temporal classification (CTC), one of the end-to-end sequence labelling models.
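The CTC output side of such a joint model can be illustrated by the standard collapse rule: merge repeated frame labels, then drop blanks. In a joint ASR/social-signal setup the label set could mix characters with event tags; the tag names below (`<laugh>`, the blank symbol) are illustrative assumptions, not the paper's inventory.

```python
def ctc_collapse(frame_labels, blank="_"):
    """Collapse a frame-level CTC output into an output sequence:
    merge consecutive repeats, then drop blank symbols.

    With joint ASR and social signal detection, the labels could mix
    characters with event tags such as <laugh> (names illustrative).
    """
    out, prev = [], None
    for lab in frame_labels:
        if lab != prev and lab != blank:
            out.append(lab)
        prev = lab
    return out

frames = ["_", "<laugh>", "<laugh>", "_", "y", "e", "e", "s", "_"]
print(ctc_collapse(frames))  # -> ['<laugh>', 'y', 'e', 's']
```

This greedy collapse is only the decoding rule; CTC's contribution is that the model is trained without frame-level alignments, summing over all frame labelings that collapse to the target sequence.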

- Incorporating ASR Errors with Attention-based, Jointly Trained RNN for Intent Detection and Slot Filling