COMBINING DEEP EMBEDDINGS OF ACOUSTIC AND ARTICULATORY FEATURES FOR SPEAKER IDENTIFICATION

Abstract: 

In this study, deep embeddings of acoustic and articulatory features are combined for speaker identification. First, a convolutional neural network (CNN)-based universal background model (UBM) is constructed to generate acoustic-feature (AC) embeddings. In addition, because articulatory features (AFs) capture important phonological properties of speech production, a multilayer perceptron (MLP)-based model is constructed to extract AF embeddings. The extracted AC and AF embeddings are concatenated into a combined feature vector for speaker identification using a fully-connected neural network. The proposed system was evaluated on three corpora, King-ASR, LibriSpeech, and SITW, with the experiments designed according to the properties of each dataset. All three corpora were used to evaluate the effect of AF embeddings, and the results show that adding AF embeddings to the input feature vector improves speaker identification performance. The LibriSpeech corpus was used to evaluate the effect of the number of enrolled speakers; on this task the proposed system achieved an EER of 7.80%, outperforming the x-vector with PLDA baseline (8.25%). Finally, the effect of signal mismatch was evaluated on the SITW corpus, where the proposed system achieved an EER of 25.19%, outperforming the other baseline methods.
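
To illustrate the combination step described above, the following is a minimal PyTorch sketch (not the authors' implementation): an AC embedding and an AF embedding are concatenated into one feature vector and classified by a fully-connected network. The embedding sizes, hidden size, and number of speakers are illustrative assumptions, since the abstract does not specify them.

    # Minimal sketch of combining AC and AF embeddings for speaker identification.
    # All dimensions below are assumptions for illustration only.
    import torch
    import torch.nn as nn

    class CombinedEmbeddingClassifier(nn.Module):
        def __init__(self, ac_dim=512, af_dim=128, hidden_dim=256, num_speakers=1000):
            super().__init__()
            # Fully-connected network operating on the concatenated embedding.
            self.classifier = nn.Sequential(
                nn.Linear(ac_dim + af_dim, hidden_dim),
                nn.ReLU(),
                nn.Linear(hidden_dim, num_speakers),
            )

        def forward(self, ac_emb, af_emb):
            # Concatenate the acoustic and articulatory embeddings per utterance.
            combined = torch.cat([ac_emb, af_emb], dim=-1)
            return self.classifier(combined)  # speaker logits

    # Usage example: a batch of 4 utterances with the assumed embedding sizes.
    model = CombinedEmbeddingClassifier()
    logits = model(torch.randn(4, 512), torch.randn(4, 128))
    print(logits.shape)  # torch.Size([4, 1000])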


Paper Details

Authors: Qian-Bei Hong, Chung-Hsien Wu, Hsin-Min Wang, Chien-Lin Huang
Submitted On: 16 May 2020 - 12:02am
Short Link:
Type: Presentation Slides
Event:

Document Files

20200419_ICASSP_paper2.pdf
