
GLMB 3D SPEAKER TRACKING WITH VIDEO-ASSISTED MULTI-CHANNEL AUDIO OPTIMIZATION FUNCTIONS

Submitted by:
Xinyuan Qian
Last updated:
8 April 2024 - 10:23am
Document Type:
Presentation Slides
Document Year:
2024
Presenters:
Xinyuan Qian
Paper Code:
MMSP-L2
 

Speaker tracking plays a significant role in numerous real-world human-robot interaction (HRI) applications. In recent years, there has been growing interest in exploiting multi-sensory information, such as complementary audio and visual signals, to address the challenges of speaker tracking. Despite promising results, existing approaches still struggle to determine the speaker's true location accurately, particularly in adverse conditions such as speech pauses, reverberation, or visual occlusions, which lead to missed detections or spurious estimates. In this paper, we propose a novel speaker tracking method based on the Generalized Labelled Multi-Bernoulli (GLMB) filter. Our method operates in 3D space using audio information captured by a microphone array and video streams obtained from a monocular camera. The GLMB-based tracker effectively handles outliers in location estimates and maintains tracking during periods of missed detections. Experiments on the publicly available AV16.3 dataset show that our method outperforms other competitive methods.
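
The GLMB filter maintains weighted hypotheses over labelled sets of tracks; the sketch below illustrates only its core mechanism with a much simpler single-speaker Bernoulli filter using a constant-velocity Kalman model in 3D. It is not the authors' implementation: the motion model, detection probability, clutter density, and noise levels are assumed placeholder values, and the posterior is approximated by its dominant Gaussian component. The point of the example is how an existence probability lets a track survive missed detections (speech pauses, occlusions) while discounting spurious detections.

import numpy as np

# Minimal sketch with assumed parameter values, not the authors' settings.
dt = 0.1              # frame interval in seconds (assumed)
p_detect = 0.9        # probability an audio-visual detection is produced (assumed)
p_survive = 0.99      # probability the speaker persists between frames (assumed)
clutter = 1e-4        # clutter intensity: spurious detections per cubic metre (assumed)

# State [x, y, z, vx, vy, vz]; only the 3D position is observed.
F = np.eye(6); F[:3, 3:] = dt * np.eye(3)      # constant-velocity transition
H = np.hstack([np.eye(3), np.zeros((3, 3))])   # position-only measurement model
Q = 0.01 * np.eye(6)                           # process noise (placeholder)
R = 0.05 * np.eye(3)                           # measurement noise (placeholder)

def predict(r, m, P):
    """Propagate existence probability r and Gaussian state (m, P)."""
    return p_survive * r, F @ m, F @ P @ F.T + Q

def update(r, m, P, z):
    """Bayes update with one 3D detection z, or z=None for a missed detection."""
    if z is None:
        # Missed detection: existence probability decays but the predicted state
        # is kept, so the track can bridge speech pauses or visual occlusions.
        delta = p_detect
        return r * (1 - delta) / (1 - r * delta), m, P
    S = H @ P @ H.T + R
    innov = z - H @ m
    # Likelihood of the detection under the predicted track.
    g = np.exp(-0.5 * innov @ np.linalg.solve(S, innov)) \
        / np.sqrt((2 * np.pi) ** 3 * np.linalg.det(S))
    delta = p_detect * (1 - g / clutter)
    r_new = r * (1 - delta) / (1 - r * delta)
    # The exact posterior is a two-component mixture (missed vs. detected);
    # keep the dominant component as a single-Gaussian approximation.
    if p_detect * g / clutter > 1 - p_detect:
        K = P @ H.T @ np.linalg.inv(S)
        return r_new, m + K @ innov, (np.eye(6) - K @ H) @ P
    return r_new, m, P    # detection looks like clutter: treated as an outlier

# Toy run: two missed frames (speech pause) and one spurious far-away detection.
r, m, P = 0.5, np.zeros(6), np.eye(6)
frames = [np.array([0.1, 0.0, 1.5]), None, None,
          np.array([5.0, 5.0, 5.0]),           # outlier
          np.array([0.3, 0.05, 1.5])]
for z in frames:
    r, m, P = predict(r, m, P)
    r, m, P = update(r, m, P, z)
    print(f"existence={r:.2f}, position={np.round(m[:3], 2)}")

In this toy run, the existence probability rises after the first detection, decays through the missed frames and the outlier (whose state update is rejected), and recovers once a consistent detection reappears; the GLMB filter performs the analogous bookkeeping jointly over labelled multi-speaker hypotheses.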
