
ICASSP 2019 SLIDES: SPEAKER CHANGE DETECTION USING FUNDAMENTAL FREQUENCY WITH APPLICATION TO MULTI-TALKER SEGMENTATION

Citation Author(s):
Aidan O. T. Hogg, Christine Evers, Patrick A. Naylor
Submitted by:
Aidan Hogg
Last updated:
27 May 2019 - 10:54am
Document Type:
Presentation Slides
Document Year:
2019
Event:
Presenters:
Aidan O. T. Hogg
Paper Code:
2326
 

This paper shows that time-varying pitch properties can be used advantageously within the segmentation step of a multi-talker diarization system. First, a study is conducted to verify that changes in pitch are strong indicators of changes in the speaker. It is then highlighted that an individual’s pitch varies smoothly and can therefore be predicted by means of a Kalman filter. Subsequently, it is shown that if the pitch is not predictable, this is most likely due to a change in the speaker. Finally, a novel system is proposed that uses this pitch-prediction approach for speaker change detection. The system is evaluated against a commonly used MFCC segmentation system and is shown to increase the speaker change detection rate from 43.3% to 70.5% on meetings in the AMI corpus. There are therefore two equally weighted contributions in this paper: 1. We address the question of whether a change in pitch is a reliable indicator of a speaker change in multi-talker meeting audio. 2. We develop a method to extract such speaker changes and evaluate it on a widely available meeting corpus.
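
The pitch-prediction idea can be illustrated with a minimal sketch. This is not the authors' system: it assumes a scalar random-walk Kalman filter over a per-frame pitch track in Hz, and the function name `detect_pitch_changes`, the variances `process_var` and `meas_var`, and the `threshold` are all hypothetical values chosen for illustration. A frame is flagged as a candidate speaker change when the prediction error (innovation) is large relative to its expected standard deviation.

```python
import numpy as np

def detect_pitch_changes(f0, process_var=4.0, meas_var=25.0, threshold=3.0):
    """Flag frames whose pitch is poorly predicted by a random-walk Kalman filter.

    f0          : 1-D sequence of voiced-frame pitch values in Hz (assumed input)
    process_var : assumed variance of frame-to-frame pitch drift (Hz^2)
    meas_var    : assumed variance of the pitch estimator's error (Hz^2)
    threshold   : assumed limit on the normalized innovation
    """
    changes = []
    x = f0[0]        # current pitch estimate
    p = meas_var     # variance of that estimate
    for k in range(1, len(f0)):
        # Predict: pitch is assumed to stay put, with growing uncertainty.
        x_pred = x
        p_pred = p + process_var
        # Innovation (prediction error) and its variance.
        innov = f0[k] - x_pred
        s = p_pred + meas_var
        if abs(innov) / np.sqrt(s) > threshold:
            # Pitch was not predictable: record a candidate change
            # and restart the filter from the new observation.
            changes.append(k)
            x, p = f0[k], meas_var
            continue
        # Update: blend prediction and observation using the Kalman gain.
        gain = p_pred / s
        x = x_pred + gain * innov
        p = (1.0 - gain) * p_pred
    return changes

# Synthetic track: 50 frames at 120 Hz followed by 50 frames at 210 Hz.
track = np.concatenate([np.full(50, 120.0), np.full(50, 210.0)])
print(detect_pitch_changes(track))  # [50]
```

A random-walk state model is the simplest way to encode "smoothly varying" pitch; a real system would also need to handle unvoiced frames, overlapping talkers, and pitch-tracking errors, none of which this sketch attempts.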
