
Speech signals contain intrinsic and extrinsic variations such as accent, emotion, dialect, phoneme content, speaking manner, noise, music, and reverberation. Some of these variations are unnecessary and constitute unspecified factors of variation, which increase the variability of speaker representations. In this paper, we assume that such unspecified factors of variation exist in speaker representations and attempt to minimize the resulting variability.


Speaker diarisation systems often cluster audio segments using speaker embeddings such as i-vectors and d-vectors. Since different types of embeddings are often complementary, this paper proposes a generic framework to improve performance by combining them into a single embedding, referred to as a c-vector. This combination uses a 2-dimensional (2D) self-attentive structure, which extends the standard self-attentive layer by averaging not only across time but also across different types of embeddings.
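A minimal sketch of how such a 2D self-attentive combination might look, assuming both embedding streams have been projected to a shared dimension; the layer sizes and scoring networks below are illustrative assumptions, not the authors' implementation:

```python
# Sketch of a 2D self-attentive combiner: attention is applied over time within
# each embedding stream, then over the embedding types themselves, yielding a
# single combined "c-vector". Dimensions and scorers are assumptions.
import torch
import torch.nn as nn

class TwoDimSelfAttentiveCombiner(nn.Module):
    def __init__(self, embed_dim, attn_dim=64):
        super().__init__()
        # scores attention across time, shared over embedding types
        self.time_scorer = nn.Sequential(
            nn.Linear(embed_dim, attn_dim), nn.Tanh(), nn.Linear(attn_dim, 1))
        # scores attention across embedding types (e.g. i-vector vs. d-vector streams)
        self.type_scorer = nn.Sequential(
            nn.Linear(embed_dim, attn_dim), nn.Tanh(), nn.Linear(attn_dim, 1))

    def forward(self, x):
        # x: (batch, num_types, time, embed_dim)
        time_w = torch.softmax(self.time_scorer(x), dim=2)        # weights over time
        per_type = (time_w * x).sum(dim=2)                        # (batch, num_types, embed_dim)
        type_w = torch.softmax(self.type_scorer(per_type), dim=1) # weights over embedding types
        return (type_w * per_type).sum(dim=1)                     # (batch, embed_dim) c-vector

# Example: combine two 128-dim embedding streams over 50 frames into one c-vector.
combiner = TwoDimSelfAttentiveCombiner(embed_dim=128)
print(combiner(torch.randn(4, 2, 50, 128)).shape)  # torch.Size([4, 128])
```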


An attacker may use a variety of techniques to fool an automatic speaker verification system into accepting them as a genuine user. Anti-spoofing methods, meanwhile, aim to make the system robust against such attacks. The ASVspoof 2017 Challenge focused specifically on replay attacks, with the intention of measuring the limits of replay attack detection as well as developing countermeasures against them.


We propose a Denoising Autoencoder (DAE) for speaker recognition, trained to map each individual i-vector to the mean of all i-vectors belonging to that particular speaker. The aim of this DAE is to compensate for inter-session variability and increase the discriminative power of the i-vectors prior to PLDA scoring. We test the proposed approach on the MCE 2018 1st Multi-target speaker detection and identification Challenge Evaluation. This evaluation presents a call-center fraud detection scenario: given a speech segment, detect whether it belongs to any of the speakers on a blacklist.
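A minimal sketch of the described training target, assuming a single-hidden-layer DAE, a 600-dimensional i-vector, and mean-squared-error loss (all of which are assumptions, not the MCE 2018 submission code):

```python
# Sketch: a DAE trained so that each i-vector regresses to its speaker's mean i-vector.
import torch
import torch.nn as nn

class IVectorDAE(nn.Module):
    def __init__(self, ivec_dim=600, hidden_dim=512):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(ivec_dim, hidden_dim), nn.Tanh())
        self.decoder = nn.Linear(hidden_dim, ivec_dim)

    def forward(self, x):
        return self.decoder(self.encoder(x))

def train_step(model, optimizer, ivectors, speaker_ids):
    """One step: the regression target for every i-vector is its speaker's mean i-vector."""
    targets = torch.stack([ivectors[speaker_ids == s].mean(dim=0) for s in speaker_ids])
    loss = nn.functional.mse_loss(model(ivectors), targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Toy example: 20 random "i-vectors" from 4 speakers.
model = IVectorDAE()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
print(train_step(model, opt, torch.randn(20, 600), torch.randint(0, 4, (20,))))
```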


This paper aims to improve the widely used deep speaker embedding x-vector model. We propose the following improvements: (1) a hybrid neural network structure using both a time delay neural network (TDNN) and long short-term memory (LSTM) networks to generate complementary speaker information at different levels; (2) a multi-level pooling strategy to collect speaker information from both the TDNN and LSTM layers; (3) a regularization scheme on the speaker-embedding extraction layer to make the extracted embeddings suitable for the subsequent fusion step.
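A minimal sketch of ideas (1) and (2), using illustrative layer sizes and simple mean/standard-deviation statistics pooling; this is an assumed configuration, not the paper's exact architecture:

```python
# Sketch: TDNN and LSTM branches produce frame-level features, statistics pooling
# is applied to each branch, and the pooled vectors are concatenated before the
# embedding layer. All dimensions are assumptions.
import torch
import torch.nn as nn

def stats_pool(frames):
    # frames: (batch, time, feat) -> concatenated mean and std over time
    return torch.cat([frames.mean(dim=1), frames.std(dim=1)], dim=-1)

class HybridXVector(nn.Module):
    def __init__(self, feat_dim=30, tdnn_dim=512, lstm_dim=256, embed_dim=512):
        super().__init__()
        self.tdnn = nn.Sequential(
            nn.Conv1d(feat_dim, tdnn_dim, kernel_size=5, dilation=1), nn.ReLU(),
            nn.Conv1d(tdnn_dim, tdnn_dim, kernel_size=3, dilation=2), nn.ReLU())
        self.lstm = nn.LSTM(tdnn_dim, lstm_dim, batch_first=True)
        self.embedding = nn.Linear(2 * tdnn_dim + 2 * lstm_dim, embed_dim)

    def forward(self, feats):
        # feats: (batch, time, feat_dim) acoustic features
        tdnn_out = self.tdnn(feats.transpose(1, 2)).transpose(1, 2)
        lstm_out, _ = self.lstm(tdnn_out)
        pooled = torch.cat([stats_pool(tdnn_out), stats_pool(lstm_out)], dim=-1)
        return self.embedding(pooled)  # speaker embedding fed to the later fusion step

print(HybridXVector()(torch.randn(2, 200, 30)).shape)  # torch.Size([2, 512])
```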


In this work, we investigate pre-training of neural network based speaker embeddings for low-latency speaker change detection. Our proposed system takes two speech segments, generates embeddings using shared Siamese layers, and then classifies the concatenated embeddings according to whether the two segments were spoken by the same speaker. We investigate gender-classification, contrastive-loss, and triplet-loss based pre-training of the embedding layers, as well as joint training of the embedding layers with the same/different classifier.
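A minimal sketch of the shared-embedding same/different classifier together with a standard contrastive loss for pre-training; the dimensions and layer choices are assumptions, not the paper's configuration:

```python
# Sketch: shared Siamese layers embed two segments; a classifier decides same vs.
# different speaker from the concatenated embeddings.
import torch
import torch.nn as nn

class SiameseChangeDetector(nn.Module):
    def __init__(self, feat_dim=40, embed_dim=128):
        super().__init__()
        # shared embedding network applied to both segments
        self.embed = nn.Sequential(
            nn.Linear(feat_dim, 256), nn.ReLU(), nn.Linear(256, embed_dim))
        self.classifier = nn.Sequential(
            nn.Linear(2 * embed_dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, seg_a, seg_b):
        # seg_*: (batch, feat_dim) pooled segment-level features
        ea, eb = self.embed(seg_a), self.embed(seg_b)
        return self.classifier(torch.cat([ea, eb], dim=-1))  # same/different logit

def contrastive_loss(ea, eb, same, margin=1.0):
    """Pre-training loss: pull same-speaker embeddings together, push others apart."""
    dist = torch.norm(ea - eb, dim=-1)
    return (same * dist.pow(2) +
            (1 - same) * torch.clamp(margin - dist, min=0).pow(2)).mean()

model = SiameseChangeDetector()
print(model(torch.randn(8, 40), torch.randn(8, 40)).shape)  # torch.Size([8, 1])
```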


In this work, we propose a new formant-based feature for the whispered speaker verification (SV) task, where neutral data is used for enrollment and whispered recordings are used for testing. Such a mismatch between enrollment and test conditions often degrades the performance of whispered SV systems due to the difference in acoustic characteristics between whispered and neutral speech. We hypothesize that the proposed formant and formant gap (FoG) features are more invariant to the mode of speech while capturing speaker-specific information.
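As a rough illustration, formants can be estimated from the pole angles of an LPC fit, with the gaps taken as differences between adjacent formant frequencies; the exact FoG definition in the paper may differ, so the sketch below is only an assumed implementation:

```python
# Sketch: LPC-based formant estimation for one frame, plus formant gaps.
import numpy as np
import librosa

def formants_and_gaps(frame, sr, lpc_order=12, num_formants=4):
    """Estimate formant frequencies for one frame and the gaps between adjacent formants."""
    a = librosa.lpc(frame.astype(float), order=lpc_order)
    roots = np.roots(a)
    roots = roots[np.imag(roots) > 0]                  # one root per complex-conjugate pair
    freqs = np.sort(np.angle(roots) * sr / (2 * np.pi))
    freqs = freqs[freqs > 90]                          # drop near-DC poles
    formants = freqs[:num_formants]                    # rough F1..F4 estimates
    gaps = np.diff(formants)                           # formant gaps (assumed FoG definition)
    return formants, gaps

# Example on a synthetic windowed frame at 16 kHz.
sr = 16000
frame = np.random.randn(400) * np.hanning(400)
print(formants_and_gaps(frame, sr))
```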


In this paper, we propose a fully supervised speaker diarization approach, named unbounded interleaved-state recurrent neural networks (UIS-RNN). Given extracted speaker-discriminative embeddings (a.k.a. d-vectors) from input utterances, each individual speaker is modeled by a parameter-sharing RNN, while the RNN states for different speakers interleave in the time domain. This RNN is naturally integrated with a distance-dependent Chinese restaurant process (ddCRP) to accommodate an unknown number of speakers.
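For illustration only, the sketch below samples speaker assignments from a plain Chinese restaurant process, the prior that the ddCRP generalizes; it shows how new speakers can be instantiated on the fly with an unknown total count, but omits the RNN likelihood and the full UIS-RNN inference:

```python
# Sketch: CRP-style online speaker assignment. A segment joins an existing speaker
# with probability proportional to that speaker's count, or starts a new speaker
# with probability proportional to alpha. Simplified; not the UIS-RNN algorithm.
import numpy as np

def crp_assign(speaker_counts, alpha, rng):
    """Sample a speaker index; returning len(speaker_counts) means 'new speaker'."""
    weights = np.append(np.array(speaker_counts, dtype=float), alpha)
    return rng.choice(len(weights), p=weights / weights.sum())

rng = np.random.default_rng(0)
counts, assignments = [], []
for _ in range(20):                      # 20 segments, speakers discovered online
    k = crp_assign(counts, alpha=1.0, rng=rng)
    if k == len(counts):
        counts.append(1)                 # a previously unseen speaker appears
    else:
        counts[k] += 1
    assignments.append(k)
print(assignments)
```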


Text-independent speaker recognition (TI-SR) traditionally requires a lengthy enrollment process that demands dedicated time from the user to build a reliable model of their voice. Seamless enrollment, in which enrollment happens in the background and requires no dedicated time from the user, is therefore a highly attractive feature. One of the key problems in a fully automated seamless enrollment process is determining whether a given collection of utterances is sufficient for TI-SR; no metric for quantifying this sufficiency is known in the literature.


The i-vector approach to speaker recognition achieves good performance when the domain of the evaluation dataset is similar to that of the training dataset. However, in real-world applications there is always a mismatch between the training and evaluation datasets, which leads to performance degradation. To address this problem, this paper proposes to learn domain-invariant and speaker-discriminative speech representations via domain adversarial training.
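A minimal sketch of the standard gradient-reversal formulation of domain adversarial training (not necessarily the paper's exact setup): a domain classifier's gradient is reversed before reaching the shared feature extractor, so the features become speaker-discriminative yet hard to classify by domain.

```python
# Sketch: gradient-reversal-based domain adversarial training. Input/feature
# dimensions and head sizes are assumptions.
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None  # reverse (and scale) the gradient

class DATSpeakerNet(nn.Module):
    def __init__(self, in_dim=600, feat_dim=256, num_speakers=1000, num_domains=2):
        super().__init__()
        self.extractor = nn.Sequential(nn.Linear(in_dim, feat_dim), nn.ReLU())
        self.speaker_head = nn.Linear(feat_dim, num_speakers)
        self.domain_head = nn.Linear(feat_dim, num_domains)

    def forward(self, x, lam=1.0):
        feat = self.extractor(x)
        spk_logits = self.speaker_head(feat)
        dom_logits = self.domain_head(GradReverse.apply(feat, lam))
        return spk_logits, dom_logits

# Joint loss: speaker classification plus (gradient-reversed) domain classification.
model = DATSpeakerNet()
spk, dom = model(torch.randn(8, 600))
loss = (nn.functional.cross_entropy(spk, torch.randint(0, 1000, (8,))) +
        nn.functional.cross_entropy(dom, torch.randint(0, 2, (8,))))
loss.backward()
```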

