Speaker Recognition and Characterization (SPE-SPKR)

GENERALIZED END-TO-END LOSS FOR SPEAKER VERIFICATION

In this paper, we propose a new loss function called generalized end-to-end (GE2E) loss, which makes the training of speaker verification models more efficient than our previous tuple-based end-to-end (TE2E) loss function. Unlike TE2E, the GE2E loss function updates the network in a way that emphasizes examples that are difficult to verify at each step of the training process. Additionally, the GE2E loss does not require an initial stage of example selection.
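
The abstract does not give the loss formula, so the following is only a minimal NumPy sketch of the softmax variant of GE2E as it is commonly described: each utterance embedding is compared by scaled cosine similarity with every speaker centroid, and the utterance's own centroid is computed without that utterance. The function name `ge2e_softmax_loss`, the batch layout (speakers x utterances x dimensions), and the values `w=10.0`, `b=-5.0` are illustrative assumptions, not details taken from this page.

```python
import numpy as np

def _unit(x):
    """L2-normalize along the last axis."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def ge2e_softmax_loss(emb, w=10.0, b=-5.0):
    """Sketch of a GE2E-style softmax loss.

    emb: array of shape (N, M, D) with N speakers, M >= 2 utterances
         per speaker, and D-dimensional embeddings.
    w, b: scale and offset of the similarity (assumed values; in a real
          model they would be learned parameters).
    """
    n, m, _ = emb.shape
    emb = _unit(emb)
    centroids = _unit(emb.mean(axis=1))                      # (N, D)
    # Leave-one-out centroid: the speaker mean without the utterance itself.
    loo = _unit((emb.sum(axis=1, keepdims=True) - emb) / (m - 1))  # (N, M, D)

    # Cosine similarity of every utterance with every speaker centroid.
    sim = np.einsum("jid,kd->jik", emb, centroids)           # (N, M, N)
    idx = np.arange(n)
    # For the true speaker, use the leave-one-out centroid instead.
    sim[idx, :, idx] = np.sum(emb * loo, axis=-1)
    sim = w * sim + b

    # Softmax loss per utterance: -S[j,i,j] + log sum_k exp(S[j,i,k]).
    mx = sim.max(axis=-1, keepdims=True)
    lse = mx[..., 0] + np.log(np.exp(sim - mx).sum(axis=-1))
    return float((lse - sim[idx, :, idx]).sum())
```

Because every utterance in the batch is scored against every centroid at once, no separate step for selecting tuples is needed in this sketch, and utterances that sit close to a wrong centroid contribute the largest loss terms.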

ICASSP 2018 GE2E.pptx

Categories: Speaker Recognition and Characterization (SPE-SPKR)
Neural network learning (MLR-NNLR)

A Novel Learnable Dictionary Encoding Layer for End-to-End Language Identification

This paper proposes a novel learnable dictionary encoding layer for end-to-end language identification. It is in line with the conventional GMM i-vector approach both theoretically and practically. We imitate the mechanism of traditional GMM training and the supervector encoding procedure on top of a CNN. The proposed layer can accumulate high-order statistics from a variable-length input sequence and generate a fixed-dimensional, utterance-level vector representation.

poster_weichcai_icassp2018_lde.pdf

Categories: Multilingual Recognition and Identification (SPE-MULT)
Speaker Recognition and Characterization (SPE-SPKR)

Insights into End-to-End Learning Scheme for Language Identification

A novel interpretable end-to-end learning scheme for language identification is proposed. It is in line with the classical GMM i-vector methods both theoretically and practically. In the end-to-end pipeline, a general encoding layer is employed on top of the front-end CNN so that it can automatically encode the variable-length input sequence into an utterance-level vector. After comparing the scheme with state-of-the-art GMM i-vector methods, we offer insights into the CNN and reveal its role and effect in the whole pipeline.

poster_weichcai_icassp2018_e2e.pdf

Categories: Multilingual Recognition and Identification (SPE-MULT)
Speaker Recognition and Characterization (SPE-SPKR)

DEREVERBERATION AND BEAMFORMING IN FAR-FIELD SPEAKER RECOGNITION

This paper deals with far-field speaker recognition. On a corpus of NIST SRE 2010 data retransmitted in a real room with multiple microphones, we first demonstrate how room acoustics cause significant degradation of a state-of-the-art i-vector based speaker recognition system. We then investigate several techniques to improve performance, ranging from probabilistic linear discriminant analysis (PLDA) re-training, through dereverberation, to beamforming.

icassp_poster_mosner.pdf

Categories: Speaker Recognition and Characterization (SPE-SPKR)

END-TO-END HIERARCHICAL LANGUAGE IDENTIFICATION SYSTEM

Recently, hierarchical language identification systems have shown significant improvement over single-level systems in both closed-set and open-set language identification tasks. However, developing such a system requires the features and the classifier at each node of the hierarchical structure to be selected by hand. Motivated by the superior ability of end-to-end deep neural network architectures to jointly optimize feature extraction and classification, we propose a novel approach to developing an end-to-end hierarchical language identification system.

ICASSP_Poster_v3_Final.pdf

Categories: Speaker Recognition and Characterization (SPE-SPKR)

MULTISTREAM DIARIZATION FUSION USING THE MINIMUM VARIANCE BAYESIAN INFORMATION CRITERION

Poster_ICASSP_2018.pdf

Categories: Speaker Recognition and Characterization (SPE-SPKR)

Making Likelihood Ratios Digestible for Cross-Application Performance Assessment

Performance estimation is crucial to the assessment of novel algorithms and systems. In detection error trade-off (DET) diagrams, discrimination performance is assessed with only one target application in mind, whereas cross-application performance assessment considers the risks resulting from decisions, which depend on application constraints. To make research results interchangeable across different application constraints, we propose to augment DET curves by depicting how well systems support given security and convenience levels.
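
For readers unfamiliar with DET diagrams, the points of such a curve can be sketched as follows: sweep a decision threshold over the observed scores and record the false-positive and false-negative rates at each threshold. This is a generic NumPy illustration, not the augmentation proposed in the paper; the function names `det_points` and `equal_error_rate` and the convention "accept when score >= threshold" are assumptions made for the example. A real DET plot would additionally draw both axes on a normal-deviate scale.

```python
import numpy as np

def det_points(target_scores, nontarget_scores):
    """Return (thresholds, fpr, fnr) for a threshold sweep.

    A trial is accepted when its score is >= the threshold (assumed
    convention). fpr is the fraction of non-target trials accepted,
    fnr the fraction of target trials rejected.
    """
    target_scores = np.asarray(target_scores, dtype=float)
    nontarget_scores = np.asarray(nontarget_scores, dtype=float)
    thresholds = np.sort(np.concatenate([target_scores, nontarget_scores]))
    fnr = np.array([(target_scores < t).mean() for t in thresholds])
    fpr = np.array([(nontarget_scores >= t).mean() for t in thresholds])
    return thresholds, fpr, fnr

def equal_error_rate(fpr, fnr):
    """Approximate the EER at the sweep point where fpr and fnr are closest."""
    i = int(np.argmin(np.abs(fpr - fnr)))
    return float((fpr[i] + fnr[i]) / 2)
```

Moving along the returned points trades one error type for the other: a higher threshold lowers the false-positive rate (more security) and raises the false-negative rate (less convenience).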

poster.pdf

Categories: Information Forensics and Security
Speaker Recognition and Characterization (SPE-SPKR)

Speaker Diarization with LSTM

For many years, i-vector based audio embedding techniques were the dominant approach for speaker verification and speaker diarization applications. However, mirroring the rise of deep learning in various domains, neural network based audio embeddings, also known as d-vectors, have consistently demonstrated superior speaker verification performance. In this paper, we build on the success of d-vector based speaker verification systems to develop a new d-vector based approach to speaker diarization.

icassp2018_diarization_poster.pdf

Categories: Speaker Recognition and Characterization (SPE-SPKR)
Neural network learning (MLR-NNLR)

ATTENTION-BASED MODELS FOR TEXT-DEPENDENT SPEAKER VERIFICATION

Attention-based models have recently shown great performance on a range of tasks, such as speech recognition, machine translation, and image captioning, owing to their ability to summarize relevant information that spans the entire length of an input sequence. In this paper, we analyze the use of attention mechanisms for sequence summarization in our end-to-end text-dependent speaker recognition system. We explore different topologies of the attention layer and their variants, and compare different pooling methods on the attention weights.
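
The idea of summarizing a sequence with attention can be sketched generically: score each frame, turn the scores into weights with a softmax, and take the weighted average of the frame vectors. The scoring function used below (a dot product with a single vector `v`) and the function name `attention_pool` are hypothetical choices for illustration; the topologies and pooling variants actually compared in the paper are not given in this abstract.

```python
import numpy as np

def attention_pool(h, v):
    """Summarize a sequence of frame vectors with softmax attention.

    h: array of shape (T, D), one D-dimensional output per frame.
    v: array of shape (D,), a hypothetical learnable scoring vector.
    Returns (summary, alpha): the (D,) weighted average and the (T,)
    attention weights, which are non-negative and sum to 1.
    """
    h = np.asarray(h, dtype=float)
    scores = h @ np.asarray(v, dtype=float)   # one scalar score per frame
    scores = scores - scores.max()            # stabilize the softmax
    alpha = np.exp(scores)
    alpha = alpha / alpha.sum()
    return alpha @ h, alpha
```

With `v` set to all zeros every frame receives the same weight, so the summary reduces to plain average pooling; a trained `v` would instead emphasize the frames that matter most for the decision.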

icassp2018_poster_Reza_5.pdf

Categories: Neural network learning (MLR-NNLR)
Speaker Recognition and Characterization (SPE-SPKR)

A new approach for robust replay spoof detection in ASV systems

The objective of this paper is to extract robust features for detecting replay spoof attacks on text-independent speaker verification systems. In a replay attack, a prerecorded utterance of the target speaker is played to the automatic speaker verification (ASV) system to gain unauthorized access. In such a scenario, the speech signal also carries the characteristics of the intermediate recording device. In the proposed approach, the characteristics of the intermediate de-

A new approach for robust replay spoof detection in ASV systems.pdf

Categories: Speaker Recognition and Characterization (SPE-SPKR)
