
FORMANT-GAPS FEATURES FOR SPEAKER VERIFICATION USING WHISPERED SPEECH

Citation Author(s):
Abinay Reddy Naini, Achuth Rao MV, Prasanta Kumar Ghosh
Submitted by:
Abinay Naini
Last updated:
8 May 2019 - 5:23am
Document Type:
Poster
Document Year:
2019
Event:
Presenters:
Abinay Reddy Naini
Paper Code:
4776
 

In this work, we propose a new formant-based feature for the whispered speaker verification (SV) task, in which neutral data is used for enrollment and whispered recordings are used for testing. Such a mismatch between enrollment and test often degrades the performance of whispered SV systems because the acoustic characteristics of whispered and neutral speech differ. We hypothesize that the proposed formant and formant gap (FoG) features are more invariant to the mode of speech in capturing speaker-specific information than traditional baseline SV features, including mel frequency cepstral coefficients (MFCC) and auditory-inspired amplitude modulation features (AAMF). Whispered SV experiments with 714 speakers, comprising 29232 neutral and 22932 whispered recordings, reveal that the equal error rate (EER) using the proposed features is lower than that using the best baseline features by ∼3.79% (absolute). We also observed that at least four whispered recordings are required during enrollment for the baseline features to perform on par with the proposed features. However, the best-performing baseline features yield an EER for the neutral SV task that is ∼1.88% higher than that using the proposed features.
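
For readers unfamiliar with the EER metric reported above, the following is a minimal sketch of how an equal error rate could be computed from verification trial scores. It is not the authors' evaluation code: the function name `equal_error_rate`, the score lists, and the simple threshold sweep are all assumptions made for illustration.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER by sweeping thresholds over all observed scores.

    target_scores: scores of trials where the test speaker matches enrollment.
    nontarget_scores: scores of impostor trials.
    Hypothetical helper for illustration only; higher score = more likely a match.
    """
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best_gap, eer = None, None
    for t in thresholds:
        # False rejection rate: genuine trials scoring below the threshold.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        # False acceptance rate: impostor trials scoring at or above the threshold.
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(far - frr)
        # Keep the operating point where the two error rates are closest.
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2.0
    return eer


# Toy scores (made up, not from the poster).
genuine = [0.9, 0.8, 0.7, 0.4]
impostor = [0.6, 0.3, 0.2, 0.1]
print(equal_error_rate(genuine, impostor))
```

An absolute EER difference, as quoted in the abstract, would then be the plain subtraction of two such values obtained with different feature sets on the same trials.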
