
- Text-dependent Speaker Verification and RSR2015 Speech Corpus
RSR2015 (Robust Speaker Recognition 2015) is the largest publicly available speech corpus for text-dependent robust speaker recognition. The current release includes 151 hours of short-duration utterances spoken by 300 speakers. RSR2015 was developed by the Human Language Technology (HLT) department at the Institute for Infocomm Research (I2R) in Singapore. This newsletter describes the RSR2015 corpus, which addresses the renewed interest in text-dependent speaker recognition.
RSR2015_v2.pdf
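Since the corpus targets text-dependent verification, a test utterance should be accepted only when both the speaker identity and the spoken pass-phrase match the enrollment. The sketch below illustrates that trial-labelling rule; the `Trial` fields are hypothetical and do not reflect the corpus's actual protocol files.

```python
from dataclasses import dataclass

@dataclass
class Trial:
    # Hypothetical trial record; field names are illustrative, not RSR2015's format.
    enrolled_speaker: str
    enrolled_phrase: str
    test_speaker: str
    test_phrase: str

def is_target(trial: Trial) -> bool:
    """In text-dependent verification a trial is a target only when both the
    speaker identity and the pass-phrase match enrollment; a correct phrase
    from an impostor, or a wrong phrase from the genuine speaker, must both
    be rejected."""
    return (trial.test_speaker == trial.enrolled_speaker
            and trial.test_phrase == trial.enrolled_phrase)
```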


- Supervised Hierarchical Clustering Using Graph Neural Networks For Speaker Diarization

- AASIST: AUDIO ANTI-SPOOFING USING INTEGRATED SPECTRO-TEMPORAL GRAPH ATTENTION NETWORKS

- MULTI-QUERY MULTI-HEAD ATTENTION POOLING AND INTER-TOPK PENALTY FOR SPEAKER VERIFICATION
This paper describes the multi-query multi-head attention (MQMHA) pooling and inter-topK penalty methods, which were first proposed in our submitted system description for the VoxCeleb Speaker Recognition Challenge (VoxSRC) 2021. Most multi-head attention pooling mechanisms either attend to the whole feature through multiple heads or attend to several split parts of the whole feature. Our proposed MQMHA combines both mechanisms to gain more diversified information. The margin-based softmax loss functions …
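As a rough illustration of how the two mechanisms can be combined, the sketch below splits the frame-level feature into per-head channel groups and learns several queries per group, each producing its own attention weights over time; weighted mean and standard-deviation statistics are then concatenated. The layer sizes and the 1x1-conv scorer are assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MQMHAPooling(nn.Module):
    """Minimal sketch of multi-query multi-head attention pooling: channels are
    split into head groups and each group gets several learned queries, each
    yielding its own attention weights over the time axis."""
    def __init__(self, feat_dim: int = 256, num_heads: int = 4, num_queries: int = 2):
        super().__init__()
        assert feat_dim % num_heads == 0
        self.h, self.q, self.d = num_heads, num_queries, feat_dim // num_heads
        # simple scorer producing num_heads * num_queries attention maps
        self.score = nn.Conv1d(feat_dim, num_heads * num_queries, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (batch, feat_dim, time)
        b, _, t = x.shape
        w = F.softmax(self.score(x), dim=-1).view(b, self.h, self.q, t)
        xs = x.view(b, self.h, self.d, t)
        mu = torch.einsum("bhqt,bhdt->bhqd", w, xs)               # weighted means
        var = torch.einsum("bhqt,bhdt->bhqd", w, xs ** 2) - mu ** 2
        std = var.clamp(min=1e-6).sqrt()                          # weighted std devs
        return torch.cat([mu, std], dim=-1).flatten(1)            # (batch, 2*q*feat_dim)

# usage: pooled = MQMHAPooling()(torch.randn(8, 256, 200))  # -> (8, 1024)
```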

- RawNeXt: Speaker verification system for variable-duration utterances with deep layer aggregation and extended dynamic scaling policies
Although deep neural networks achieve satisfactory performance in speaker verification, variable-duration utterances remain a challenge that threatens the robustness of systems. To deal with this issue, we propose a speaker verification system called RawNeXt that can handle input raw waveforms of arbitrary length by employing the following two components: (1) a deep layer aggregation strategy that enhances speaker information by iteratively and hierarchically aggregating features of various time scales and spectral channels output from blocks.
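The aggregation idea might look like the following sketch, in which block outputs at different depths are projected to a common width, fused iteratively, and finished with time-average pooling so any utterance length yields a fixed-size embedding. The channel widths and the fusion operator are assumptions, not RawNeXt's actual architecture, and the block outputs are assumed to share a time axis.

```python
import torch
import torch.nn as nn

class IterativeLayerAggregation(nn.Module):
    """Sketch of iterative deep layer aggregation: project each block's output
    to a common channel width, fuse it with the running aggregate, and finish
    with time-average pooling for duration-independent embeddings."""
    def __init__(self, block_widths=(64, 128, 256), agg_dim: int = 128):
        super().__init__()
        self.proj = nn.ModuleList(nn.Conv1d(w, agg_dim, 1) for w in block_widths)
        self.fuse = nn.ModuleList(nn.Conv1d(2 * agg_dim, agg_dim, 1)
                                  for _ in block_widths[1:])

    def forward(self, feats):  # feats: list of (batch, width_i, time), shared time axis
        agg = self.proj[0](feats[0])
        for proj, fuse, f in zip(self.proj[1:], self.fuse, feats[1:]):
            agg = torch.relu(fuse(torch.cat([agg, proj(f)], dim=1)))  # fuse deeper block
        return agg.mean(dim=-1)  # (batch, agg_dim), independent of utterance length

# usage (hypothetical block outputs over 300 frames):
# feats = [torch.randn(8, w, 300) for w in (64, 128, 256)]
# emb = IterativeLayerAggregation()(feats)   # -> (8, 128)
```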

- GRAPH ATTENTIVE FEATURE AGGREGATION FOR TEXT-INDEPENDENT SPEAKER VERIFICATION
The objective of this paper is to combine multiple frame-level features into a single utterance-level representation considering pairwise relationships. For this purpose, we propose a novel graph attentive feature aggregation module by interpreting each frame-level feature as a node of a graph. The inter-relationship between all possible pairs of features, typically exploited indirectly, can be directly modeled using a graph. The module comprises a graph attention layer and a graph pooling layer, followed by a readout operation.
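A stripped-down version of the idea, treating frames as nodes of a fully connected graph, applying a single graph-attention layer, and finishing with a mean readout (the pooling layer is omitted here for brevity), might look like the sketch below; the dimensions and the attention parameterization are assumptions rather than the paper's exact module.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GraphAttentiveAggregation(nn.Module):
    """Sketch: frame-level features are nodes of a fully connected graph; one
    graph-attention layer models all pairwise relationships, then a mean
    readout collapses the nodes into a single utterance-level vector."""
    def __init__(self, feat_dim: int = 256):
        super().__init__()
        self.transform = nn.Linear(feat_dim, feat_dim, bias=False)
        self.attn = nn.Linear(2 * feat_dim, 1, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (batch, time, feat_dim)
        h = self.transform(x)
        b, t, d = h.shape
        # attention logit for every ordered pair of frames (i, j)
        hi = h.unsqueeze(2).expand(b, t, t, d)
        hj = h.unsqueeze(1).expand(b, t, t, d)
        e = F.leaky_relu(self.attn(torch.cat([hi, hj], dim=-1))).squeeze(-1)
        a = F.softmax(e, dim=-1)          # normalise over neighbour frames j
        h = torch.matmul(a, h)            # attentive aggregation of neighbours
        return h.mean(dim=1)              # mean readout -> (batch, feat_dim)

# usage: emb = GraphAttentiveAggregation()(torch.randn(8, 200, 256))  # -> (8, 256)
```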

- Graph Convolutional Network Based Semi-Supervised Learning on Multi-Speaker Meeting Data