WATCH, LISTEN ONCE, AND SYNC: AUDIO-VISUAL SYNCHRONIZATION WITH MULTI-MODAL REGRESSION CNN

Abstract: 

Recovering audio-visual synchronization is an important task in visual speech processing.
In this paper, we present a multi-modal regression model that uses a convolutional neural network (CNN) to recover audio-visual synchronization in single-person speech videos. The proposed model takes the audio and visual features of multiple frames as input and predicts the number of frames by which the input audio-visual pair has drifted. Because we treat synchronization as a regression problem, the model does not need a sliding-window search, which would increase the computational cost. Experimental results show that the proposed method outperforms the baseline methods in both synchronization recovery accuracy and computational cost.
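
The contrast between the regression formulation and a sliding-window search can be illustrated with a toy sketch. This is not the authors' model: the scalar per-frame features, the absolute-difference cost, and the function names `make_drifted_pair` and `sliding_window_search` are all assumptions made for illustration. The sketch shows how a training pair labeled with its frame drift could be built, and how many comparisons a search-based baseline needs to recover that drift.

```python
import random


def make_drifted_pair(audio, visual, start, window, drift):
    """Cut `window` frames from each stream, displacing the audio by `drift` frames.

    Returns (audio_window, visual_window, drift); the drift is the regression target.
    Hypothetical helper, not taken from the paper.
    """
    a0 = start + drift
    if a0 < 0 or a0 + window > len(audio) or start < 0 or start + window > len(visual):
        raise ValueError("window falls outside the stream")
    return audio[a0:a0 + window], visual[start:start + window], drift


def sliding_window_search(audio_win, visual, start, max_drift):
    """Toy baseline: compare the audio window against every candidate offset.

    Returns (best_drift, number_of_comparisons).
    """
    window = len(audio_win)
    best_drift, best_cost, evaluations = None, float("inf"), 0
    for d in range(-max_drift, max_drift + 1):
        v0 = start + d
        if v0 < 0 or v0 + window > len(visual):
            continue  # candidate window would leave the stream
        cost = sum(abs(a - v) for a, v in zip(audio_win, visual[v0:v0 + window]))
        evaluations += 1
        if cost < best_cost:
            best_drift, best_cost = d, cost
    return best_drift, evaluations


# Assumption: audio and visual features are perfectly correlated toy scalars.
rng = random.Random(0)
stream = [rng.random() for _ in range(40)]

audio_win, visual_win, label = make_drifted_pair(stream, stream, start=10, window=5, drift=3)
found, n_evals = sliding_window_search(audio_win, stream, start=10, max_drift=5)
print(found, n_evals)  # 3 11
```

In this toy setting the search needs one comparison per candidate offset (here 2 * 5 + 1 = 11), whereas a regression model trained on pairs such as `(audio_win, visual_win) -> label` would be evaluated once per input pair.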

Paper Details

Authors: Toshiki Kikuchi, Yuko Ozasa
Submitted On: 13 April 2018 - 12:19am
Short Link:
Type: Presentation Slides
Event:
Presenter's Name: Toshiki Kikuchi
Paper Code: MMSP-L1.3
Document Year: 2018

[1] Toshiki Kikuchi, Yuko Ozasa, "WATCH, LISTEN ONCE, AND SYNC: AUDIO-VISUAL SYNCHRONIZATION WITH MULTI-MODAL REGRESSION CNN", IEEE SigPort, 2018. [Online]. Available: http://sigport.org/2585. Accessed: Jun. 15, 2019.