

A multi-channel/multi-speaker interactive 3D Audio-Visual Speech Corpus in Mandarin

Abstract: 

This paper presents a multi-channel/multi-speaker 3D audio-visual
corpus for Mandarin continuous speech recognition and related
fields such as speech visualization and speech synthesis.
The corpus consists of about 18k utterances from 24 speakers,
about 20 hours in total. For each utterance, the audio
streams were recorded by two professional microphones, one
near-field and one far-field, while a marker-based 3D
facial motion capture system with six infrared cameras was
used to acquire the 3D video streams. In addition, corresponding
2D video streams were captured by a supplementary camera.
The paper also describes the data processing used to synchronize
the audio and video streams, detect and correct outliers, and
remove head motion during recording. Finally, the results of
this data processing are discussed. To date, this corpus is the
largest 3D audio-visual corpus for Mandarin.
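The abstract names the processing steps but not the algorithms used. As a rough illustration only, the sketch below shows two common techniques one might apply to marker trajectories for the last two steps (audio-video synchronization is not covered): spike detection against a sliding temporal median with linear interpolation for outlier correction, and per-frame rigid alignment (the Kabsch algorithm) of markers assumed to move only with the head. The array layout (T frames, M markers, 3 coordinates), the function names, the threshold, and the choice of "rigid" markers are hypothetical assumptions, not the authors' method.

```python
import numpy as np


def fix_outliers(frames, thresh, half_window=2):
    """Replace spike outliers in a (T, M, 3) marker array by linear interpolation.

    Hypothetical criterion: a sample farther than `thresh` (in the data's length
    unit) from the median of its (2*half_window+1)-frame neighbourhood is a spike.
    """
    T = frames.shape[0]
    out = frames.copy()
    t_axis = np.arange(T)
    for m in range(frames.shape[1]):
        traj = frames[:, m, :]  # (T, 3) trajectory of one marker
        med = np.empty_like(traj)
        for f in range(T):
            lo, hi = max(0, f - half_window), min(T, f + half_window + 1)
            med[f] = np.median(traj[lo:hi], axis=0)
        bad = np.linalg.norm(traj - med, axis=1) > thresh
        if bad.any() and not bad.all():
            # Interpolate each coordinate over time from the non-outlier samples.
            for k in range(3):
                out[bad, m, k] = np.interp(t_axis[bad], t_axis[~bad], traj[~bad, k])
    return out


def kabsch(src, dst):
    """Return (R, t) minimizing sum ||R @ src_i + t - dst_i||^2 over rotations R."""
    src_c = src.mean(axis=0)
    dst_c = dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)  # 3x3 cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so R is a proper rotation, not a reflection.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t


def remove_head_motion(frames, rigid_idx, ref_frame=0):
    """Express every frame of a (T, M, 3) array in the head pose of `ref_frame`.

    rigid_idx: indices of markers assumed to move only with the head
    (a hypothetical choice, e.g. forehead or nose-bridge markers).
    """
    ref = frames[ref_frame, rigid_idx]
    out = np.empty_like(frames)
    for f in range(frames.shape[0]):
        R, t = kabsch(frames[f, rigid_idx], ref)
        out[f] = frames[f] @ R.T + t  # apply the same rigid transform to all markers
    return out
```

For example, `remove_head_motion(fix_outliers(frames, thresh=5.0), rigid_idx=[0, 1, 2, 3])` would first repair spikes and then stabilize the head pose; the threshold and the rigid marker set would have to be chosen for the actual marker layout. Rigid alignment fits this step because the reference markers are assumed not to deform, so any change in their configuration is attributed to head pose, and the remaining motion of the other markers is articulatory.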


Paper Details

Authors:
Jun Yu, Rongfeng Su, Lan Wang, Wenpeng Zhou
Submitted On:
14 October 2016 - 10:40am
Type:
Presentation Slides
Event:
Presenter's Name:
Rongfeng Su
Paper Code:
131
Document Year:
2016
Cite

Document Files

3D Audio-Visual Speech Corpus in Mandarin



[1] Jun Yu, Rongfeng Su, Lan Wang, Wenpeng Zhou, "A multi-channel/multi-speaker interactive 3D Audio-Visual Speech Corpus in Mandarin", IEEE SigPort, 2016. [Online]. Available: http://sigport.org/1200. Accessed: Aug. 25, 2019.
@article{1200-16,
url = {http://sigport.org/1200},
author = {Jun Yu and Rongfeng Su and Lan Wang and Wenpeng Zhou},
publisher = {IEEE SigPort},
title = {A multi-channel/multi-speaker interactive 3D Audio-Visual Speech Corpus in Mandarin},
year = {2016}
}