Multimodal active speaker detection and virtual cinematography for video conferencing

Abstract: 

Active speaker detection (ASD) and virtual cinematography (VC) can significantly improve the remote user experience of a video conference by automatically panning, tilting, and zooming a video conferencing camera: users subjectively rate an expert video cinematographer's video significantly higher than unedited video. We describe a new automated ASD and VC system that performs within 0.3 MOS of an expert cinematographer, based on subjective ratings on a 1-5 scale. The system uses a 4K wide-FOV camera, a depth camera, and a microphone array; it extracts features from each modality and trains an ASD using an AdaBoost machine learning system that is highly efficient and runs in real time. A VC is similarly trained using machine learning to optimize the subjective quality of the overall experience. To avoid distracting the room participants and to reduce switching latency, the system has no moving parts: the VC works by cropping and zooming the 4K wide-FOV video stream. The system was tuned using extensive crowdsourcing techniques and evaluated on a dataset of N=100 meetings, each 2-5 minutes in length.
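
As a rough illustration only (not the authors' implementation), the sketch below shows the general shape of such a pipeline: per-modality features fused into one vector, an AdaBoost classifier for active speaker detection, and a digital pan/tilt/zoom crop of the 4K wide-FOV frame. It assumes scikit-learn's AdaBoostClassifier as a stand-in for the paper's AdaBoost system; the feature names, dimensions, and crop heuristic are hypothetical, and only the use of AdaBoost for ASD and crop/zoom-based VC comes from the abstract.

# A minimal sketch, not the authors' implementation: an AdaBoost active
# speaker detector over concatenated per-modality features, plus a digital
# pan/tilt/zoom crop of the 4K wide-FOV frame. Feature names, dimensions,
# and the crop heuristic are illustrative assumptions.
import numpy as np
from sklearn.ensemble import AdaBoostClassifier

def fuse_features(video_feat, depth_feat, audio_feat):
    """Concatenate hypothetical per-candidate features from the 4K video,
    the depth camera, and the microphone array into one vector."""
    return np.concatenate([video_feat, depth_feat, audio_feat])

# Placeholder training data: one row per (frame, candidate person) pair,
# label 1 if that person is the active speaker.
rng = np.random.default_rng(0)
X = rng.random((2000, 48))
y = rng.integers(0, 2, 2000)

asd = AdaBoostClassifier(n_estimators=100)  # decision stumps keep inference fast
asd.fit(X, y)
speaker_prob = asd.predict_proba(X[:1])[0, 1]  # P(candidate is speaking)

def vc_crop(speaker_box, frame_size=(3840, 2160), zoom=2.5):
    """Digital PTZ: center a 16:9 crop on the detected speaker's box.
    No moving parts: the crop is taken from the wide-FOV 4K frame."""
    fw, fh = frame_size
    x, y0, w, h = speaker_box
    cw = min(fw, int(w * zoom))
    ch = min(fh, cw * 9 // 16)
    cx, cy = x + w // 2, y0 + h // 2
    left = int(np.clip(cx - cw // 2, 0, fw - cw))
    top = int(np.clip(cy - ch // 2, 0, fh - ch))
    return left, top, cw, ch

print(vc_crop((1800, 900, 160, 200)))  # crop rectangle around a detected face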

Paper Details

Authors: Ross Cutler, Ramin Mehran, Sam Johnson, Cha Zhang, Adam Kirk, Oliver Whyte, Adarsh Kowdle
Submitted On: 12 February 2020 - 12:55am
Type: Research Manuscript
Presenter's Name: Ross Cutler
Paper Code: 5035
Document Year: 2020
Document Files

ICASSP 2020 ASD.pdf

Cite

[1] Ross Cutler, Ramin Mehran, Sam Johnson, Cha Zhang, Adam Kirk, Oliver Whyte, Adarsh Kowdle, "Multimodal active speaker detection and virtual cinematography for video conferencing", IEEE SigPort, 2020. [Online]. Available: http://sigport.org/4980. Accessed: Dec. 02, 2020.