

MULTI-VIEW VISUAL SPEECH RECOGNITION BASED ON MULTI TASK LEARNING

Abstract: 

Visual speech recognition (VSR), also known as lip reading, is the task of recognizing words or phrases from video clips of lip movement. Traditional VSR methods are limited in that they mostly address frontal-view facial movement. For practical applications, however, VSR should handle lip movement captured from any angle. In this paper, we propose a pose-invariant network that can recognize words spoken in input video captured from an arbitrary view. The architecture combines a convolutional neural network (CNN) with a bidirectional long short-term memory (LSTM) network and is trained in a multi-task manner so that the pose and the spoken word are jointly classified, with pose classification serving as an auxiliary task. The proposed multi-task learning method is evaluated on the OuluVS2 benchmark dataset. The experimental results demonstrate that the deep model trained with the proposed method performs substantially better than models produced by previous single-view VSR methods and multi-view lip reading methods, reaching a recognition accuracy of 95.0% on OuluVS2.
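The joint training described above pairs a main word-classification loss with an auxiliary pose-classification loss. The following is a minimal sketch of such an objective in plain Python, not the authors' implementation. It assumes per-clip logits from two output heads (the CNN-BiLSTM backbone that produces them is omitted), a hypothetical weighting factor `pose_weight` (the abstract does not state how the two losses are balanced), and hypothetical class counts of 10 phrases and 5 views (the latter assumed to match OuluVS2's five camera angles).

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over a 1-D list of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def cross_entropy(logits: Sequence[float], target: int) -> float:
    """Negative log-likelihood of the target class."""
    return -log_softmax(logits)[target]


def multitask_loss(
    word_logits: Sequence[float],
    word_label: int,
    pose_logits: Sequence[float],
    pose_label: int,
    pose_weight: float = 0.5,  # hypothetical value; not given in the abstract
) -> float:
    """Main word loss plus a weighted auxiliary pose loss."""
    word_loss = cross_entropy(word_logits, word_label)
    pose_loss = cross_entropy(pose_logits, pose_label)
    return word_loss + pose_weight * pose_loss


def predict_word(word_logits: Sequence[float]) -> int:
    """At test time only the word head is used; the pose head is auxiliary."""
    return max(range(len(word_logits)), key=lambda i: word_logits[i])


if __name__ == "__main__":
    # Hypothetical head outputs for one clip: 10 phrase classes, 5 view classes.
    word_logits = [2.0, 0.5, -1.0] + [0.0] * 7
    pose_logits = [0.1, 1.5, 0.2, -0.3, 0.0]
    print(multitask_loss(word_logits, 0, pose_logits, 1))
    print(predict_word(word_logits))
```

Setting `pose_weight` to 0 recovers ordinary single-task word classification, which makes the auxiliary term easy to ablate. Because the pose labels are only used during training, inference needs nothing beyond the word head.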


Paper Details

Authors:
HouJeung Han, Sunghun Kang, and Chang D. Yoo
Submitted On:
15 September 2017 - 3:48am
Type:
Poster
Presenter's Name:
HouJeung Han
Paper Code:
ICIP1701
Document Year:
2017
Cite


[1] HouJeung Han, Sunghun Kang, and Chang D. Yoo, "MULTI-VIEW VISUAL SPEECH RECOGNITION BASED ON MULTI TASK LEARNING," IEEE SigPort, 2017. [Online]. Available: http://sigport.org/2092. Accessed: Dec. 14, 2019.