
AN END-TO-END APPROACH TO JOINT SOCIAL SIGNAL DETECTION AND AUTOMATIC SPEECH RECOGNITION

Abstract: 

Social signals such as laughter and fillers are often observed in natural conversation, and they play various roles in human-to-human communication. Detecting these events helps transcription systems generate rich transcriptions and helps dialogue systems behave as humans do, for example by laughing in synchrony or listening attentively. We have previously studied an end-to-end approach that detects social signals directly from speech using connectionist temporal classification (CTC), an end-to-end sequence labelling model. In this work, we propose a unified framework that integrates social signal detection (SSD) and automatic speech recognition (ASR). We investigate several methods of labelling social signals in the reference transcripts. Experimental evaluations demonstrate that our end-to-end framework significantly outperforms the conventional DNN-HMM system in both SSD performance and character error rate (CER).
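To make the idea of a joint output sequence concrete, here is a minimal sketch of greedy CTC decoding over a vocabulary that mixes characters with social-signal tokens. It is an illustration only, not the authors' implementation: the token names `<laugh>` and `<filler>`, the tiny vocabulary, and the helper functions are all hypothetical, and the per-frame scores are hand-made rather than produced by a trained model.

```python
from itertools import groupby

BLANK = "<blank>"
# Hypothetical joint vocabulary: a few characters plus two social-signal tokens.
VOCAB = [BLANK, "a", "h", "o", "<laugh>", "<filler>"]
SOCIAL = {"<laugh>", "<filler>"}


def ctc_greedy_decode(frame_scores):
    """Take the best label per frame, merge repeats, then drop blanks."""
    best = [max(range(len(VOCAB)), key=lambda k: frame[k]) for frame in frame_scores]
    collapsed = [idx for idx, _ in groupby(best)]
    return [VOCAB[i] for i in collapsed if VOCAB[i] != BLANK]


def split_outputs(tokens):
    """Separate the joint output into a transcript and the detected social signals."""
    text = "".join(t for t in tokens if t not in SOCIAL)
    events = [t for t in tokens if t in SOCIAL]
    return text, events


def one_hot(label):
    """Build a fake per-frame score vector that puts all mass on one label."""
    return [1.0 if v == label else 0.0 for v in VOCAB]


# Hand-made frame sequence (an assumption for illustration, not real model output).
frame_labels = ["<filler>", "<filler>", BLANK, "o", "h", "h", BLANK, "<laugh>"]
frames = [one_hot(label) for label in frame_labels]

tokens = ctc_greedy_decode(frames)
text, events = split_outputs(tokens)
print(tokens, text, events)
```

In this sketch a single decoded sequence carries both the transcript and the social-signal events, which is one way to read the "unified framework" described above. How the references are actually labelled is the subject of the paper and is not specified on this page.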


Paper Details

Authors: Hirofumi Inaguma, Masato Mimura, Koji Inoue, Kazuyoshi Yoshii, Tatsuya Kawahara
Submitted On: 17 April 2018 - 7:49pm
Short Link:
Type: Poster
Event:
Presenter's Name: Hirofumi Inaguma
Paper Code: HLT-P3.8
Document Year: 2018
Cite

Document Files

201804_ICASSP2018_poster.pdf



[1] Hirofumi Inaguma, Masato Mimura, Koji Inoue, Kazuyoshi Yoshii, Tatsuya Kawahara, "AN END-TO-END APPROACH TO JOINT SOCIAL SIGNAL DETECTION AND AUTOMATIC SPEECH RECOGNITION", IEEE SigPort, 2018. [Online]. Available: http://sigport.org/2947. Accessed: Oct. 15, 2018.
@article{2947-18,
url = {http://sigport.org/2947},
author = {Hirofumi Inaguma and Masato Mimura and Koji Inoue and Kazuyoshi Yoshii and Tatsuya Kawahara},
publisher = {IEEE SigPort},
title = {AN END-TO-END APPROACH TO JOINT SOCIAL SIGNAL DETECTION AND AUTOMATIC SPEECH RECOGNITION},
year = {2018} }