In this paper, we address the problem of speaker recognition in challenging acoustic conditions using a novel method to extract robust speaker-discriminative speech representations. We adopt a recently proposed unsupervised adversarial invariance architecture to train a network that maps speaker embeddings extracted using a pre-trained model onto two lower dimensional embedding spaces. The embedding spaces are learnt to disentangle speaker-discriminative information from all other information present in the audio recordings, without supervision about the acoustic conditions.
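
The exact unsupervised adversarial invariance objective used in the paper is more involved; as a rough, hypothetical illustration of the splitting idea only, a PyTorch sketch might look like the following, where all dimensions, layer choices and loss terms are assumptions rather than the authors' design:

```python
# Minimal sketch (not the paper's exact architecture): split a pre-trained
# speaker embedding into two lower-dimensional codes, h1 (speaker-related)
# and h2 (everything else), with an adversary discouraging speaker
# information from leaking into h2. Dimensions and losses are assumptions.
import torch
import torch.nn as nn

EMB_DIM, H1_DIM, H2_DIM, N_SPK = 512, 128, 128, 1000

encoder1 = nn.Linear(EMB_DIM, H1_DIM)      # speaker-discriminative code
encoder2 = nn.Linear(EMB_DIM, H2_DIM)      # residual / nuisance code
spk_head = nn.Linear(H1_DIM, N_SPK)        # speaker classifier on h1
adv_head = nn.Linear(H2_DIM, N_SPK)        # adversary: predicts speaker from h2

def losses(x_vec, spk_label):
    h1, h2 = encoder1(x_vec), encoder2(x_vec)
    ce = nn.functional.cross_entropy
    task_loss = ce(spk_head(h1), spk_label)   # keep speaker info in h1
    adv_loss = ce(adv_head(h2), spk_label)    # adversary trained to minimise this
    # the encoders would be trained to *maximise* adv_loss (e.g. via gradient
    # reversal or alternating updates), which this sketch omits
    return task_loss, adv_loss

x = torch.randn(8, EMB_DIM)                   # batch of pre-trained embeddings
y = torch.randint(0, N_SPK, (8,))
print([l.item() for l in losses(x, y)])
```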


This paper presents an improved deep embedding learning method based on a convolutional neural network (CNN) for text-independent speaker verification. Two improvements are proposed for x-vector embedding learning: (1) a multiscale convolution (MSCNN) is adopted in the frame-level layers to capture complementary speaker information in different receptive fields; (2) a Baum-Welch statistics attention (BWSA) mechanism is applied in the temporal pooling layer to integrate more useful long-term speaker characteristics into the pooled representation.
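
As a rough illustration of the first idea only, a multiscale frame-level block can be sketched as parallel 1-D convolutions with different kernel sizes whose outputs are concatenated; the kernel sizes and channel counts below are assumptions, and the BWSA pooling is not reproduced here:

```python
# Sketch of the multiscale frame-level idea: parallel 1-D convolutions with
# different kernel sizes (receptive fields) over the same frame sequence,
# concatenated along the channel axis.
import torch
import torch.nn as nn

class MultiScaleConv(nn.Module):
    def __init__(self, in_ch=40, out_ch=128, kernel_sizes=(1, 3, 5)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv1d(in_ch, out_ch, k, padding=k // 2) for k in kernel_sizes
        )

    def forward(self, x):                 # x: (batch, feat_dim, frames)
        return torch.cat([torch.relu(b(x)) for b in self.branches], dim=1)

feats = torch.randn(4, 40, 300)           # e.g. 40-dim features, 300 frames
out = MultiScaleConv()(feats)
print(out.shape)                          # (4, 384, 300): three scales concatenated
```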


In this contribution, we introduce convolutional neural network architectures aimed at end-to-end detection of attacks on voice biometrics systems, i.e. the model provides scores corresponding to the likelihood of attack given general-purpose time-frequency features obtained from speech. Microphone-level attacks based on speech synthesis and voice conversion techniques are considered, along with presentation replay attacks.
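
For illustration only, a minimal CNN detector of this kind might map a time-frequency input to a single attack-likelihood score; the layer sizes below are assumptions and not the architectures studied in the paper:

```python
# Illustrative sketch: a small CNN that maps a time-frequency input, e.g. a
# log-spectrogram, to a single logit interpreted as the likelihood that the
# utterance is an attack.
import torch
import torch.nn as nn

detector = nn.Sequential(
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),          # collapse time and frequency
    nn.Flatten(),
    nn.Linear(32, 1),                 # attack score (higher = more likely spoof)
)

spec = torch.randn(2, 1, 257, 400)    # (batch, channel, freq bins, frames)
score = torch.sigmoid(detector(spec))
print(score.squeeze(1))
```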


In this paper, the importance of the analytic phase of the speech signal for automatic speaker verification systems is demonstrated in the context of replay spoof attacks. To detect replay spoof attacks accurately, effective feature representations of speech signals are required that capture the distortion, convolutive in nature, introduced by the intermediate playback/recording devices.
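
For reference, the analytic phase itself can be obtained from the analytic signal given by the Hilbert transform; the short sketch below shows only this basic step, not the paper's full feature extraction pipeline:

```python
# Minimal sketch of extracting the analytic (instantaneous) phase of a signal
# via the Hilbert transform.
import numpy as np
from scipy.signal import hilbert

fs = 16000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 200 * t)         # stand-in for a speech waveform

analytic = hilbert(x)                   # x + j * Hilbert(x)
phase = np.unwrap(np.angle(analytic))   # analytic (instantaneous) phase
inst_freq = np.diff(phase) * fs / (2 * np.pi)
print(phase[:5], inst_freq[:3])
```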


The speech signal contains intrinsic and extrinsic variations such as accent, emotion, dialect, phoneme content, speaking manner, noise, music, and reverberation. Some of these variations are unnecessary for characterising the speaker and act as unspecified factors of variation, which increase the variability of speaker representations. In this paper, we assume that such unspecified factors of variation exist in speaker representations, and we attempt to minimize the resulting variability.


Speaker diarisation systems often cluster audio segments using speaker embeddings such as i-vectors and d-vectors. Since different types of embeddings are often complementary, this paper proposes a generic framework to improve performance by combining them into a single embedding, referred to as a c-vector. This combination uses a 2-dimensional (2D) self-attentive structure, which extends the standard self-attentive layer by averaging not only across time but also across different types of embeddings.
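
A rough sketch of the combination idea (not the paper's exact 2D self-attentive layer) is to score every (embedding type, time step) pair and take a softmax-weighted average over both axes; the dimensions and the scoring network below are assumptions:

```python
# Hypothetical illustration of a 2-D attentive combination: one scalar score
# per (embedding type, time step), normalised jointly, then a weighted
# average over both axes yields a single combined vector.
import torch
import torch.nn as nn

class TwoDAttentiveCombiner(nn.Module):
    def __init__(self, dim=128, hidden=64):
        super().__init__()
        self.score = nn.Sequential(nn.Linear(dim, hidden), nn.Tanh(), nn.Linear(hidden, 1))

    def forward(self, embs):                      # embs: (batch, types, time, dim)
        b, k, t, d = embs.shape
        w = self.score(embs).view(b, k * t)       # one score per (type, time)
        w = torch.softmax(w, dim=1).view(b, k, t, 1)
        return (w * embs).sum(dim=(1, 2))         # weighted average over both axes

combined = TwoDAttentiveCombiner()(torch.randn(2, 2, 50, 128))  # two embedding types
print(combined.shape)                             # (2, 128): one combined vector each
```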


An attacker may use a variety of techniques to fool an automatic speaker verification system into accepting them as a genuine user. Anti-spoofing methods meanwhile aim to make the system robust against such attacks. The ASVspoof 2017 Challenge focused specifically on replay attacks, with the intention of measuring the limits of replay attack detection as well as developing countermeasures against them.


We propose a Denoising Autoencoder (DAE) for speaker recognition, trained to map each individual i-vector to the mean of all i-vectors belonging to that particular speaker. The aim of this DAE is to compensate for inter-session variability and increase the discriminative power of the i-vectors prior to PLDA scoring. We test the proposed approach on the MCE 2018 1st Multi-target speaker detection and identification Challenge Evaluation. This evaluation presents a call-center fraud detection scenario: given a speech segment, detect whether it belongs to any of the speakers on a blacklist.
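
A minimal sketch of this training objective, with the i-vector dimensionality and network sizes assumed rather than taken from the paper, might look as follows:

```python
# Sketch of the denoising-autoencoder idea described above: train a small
# network to map each i-vector to the mean i-vector of its speaker, so that
# session variability is reduced before PLDA scoring.
import torch
import torch.nn as nn

IVEC_DIM = 400
dae = nn.Sequential(
    nn.Linear(IVEC_DIM, 256), nn.Tanh(),
    nn.Linear(256, IVEC_DIM),
)
opt = torch.optim.Adam(dae.parameters(), lr=1e-3)

def train_step(ivectors, speaker_mean_ivectors):
    # the target for every i-vector is the mean of all i-vectors of its speaker
    opt.zero_grad()
    loss = nn.functional.mse_loss(dae(ivectors), speaker_mean_ivectors)
    loss.backward()
    opt.step()
    return loss.item()

x = torch.randn(16, IVEC_DIM)
print(train_step(x, x.mean(dim=0, keepdim=True).expand_as(x)))  # toy targets
```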


This paper aims to improve the widely used deep speaker embedding x-vector model. We propose the following improvements: (1) a hybrid neural network structure using both a time delay neural network (TDNN) and a long short-term memory (LSTM) network to generate complementary speaker information at different levels; (2) a multi-level pooling strategy to collect speaker information from both the TDNN and LSTM layers; (3) a regularization scheme on the speaker embedding extraction layer to make the extracted embeddings suitable for the following fusion step.
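
A rough, hypothetical sketch of the hybrid TDNN/LSTM idea with multi-level statistics pooling is given below; the layer counts and dimensions are assumptions, and the regularization scheme is omitted:

```python
# Sketch: TDNN-style dilated 1-D convolutions followed by an LSTM, with
# mean/std statistics pooled from both levels and concatenated before the
# embedding layer.
import torch
import torch.nn as nn

class HybridXVector(nn.Module):
    def __init__(self, feat_dim=40, hid=256, emb_dim=192):
        super().__init__()
        self.tdnn = nn.Sequential(
            nn.Conv1d(feat_dim, hid, 5, dilation=1, padding=2), nn.ReLU(),
            nn.Conv1d(hid, hid, 3, dilation=2, padding=2), nn.ReLU(),
        )
        self.lstm = nn.LSTM(hid, hid, batch_first=True)
        self.embed = nn.Linear(4 * hid, emb_dim)   # mean+std from two levels

    @staticmethod
    def stats(x):                                  # x: (batch, time, hid)
        return torch.cat([x.mean(dim=1), x.std(dim=1)], dim=1)

    def forward(self, feats):                      # feats: (batch, feat_dim, frames)
        t = self.tdnn(feats).transpose(1, 2)       # (batch, frames, hid)
        l, _ = self.lstm(t)
        return self.embed(torch.cat([self.stats(t), self.stats(l)], dim=1))

emb = HybridXVector()(torch.randn(2, 40, 300))
print(emb.shape)                                   # (2, 192)
```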

