
This paper focuses on the transition of automatic speaker verification systems from time delay neural networks (TDNN) to ResNet-based networks. TDNN-based systems aggregate temporal information with a statistics pooling layer, which is designed for two-dimensional tensors. ResNet-based models produce three-dimensional tensors, yet they continue to incorporate the same statistics pooling layer.
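As a rough illustration of the shape mismatch described above, the sketch below pools a two-dimensional (channels, frames) tensor into per-channel mean and standard deviation, and then handles a three-dimensional (channels, frequency, frames) tensor by flattening the channel and frequency axes first. The flattening step is a commonly used workaround assumed here for illustration; it is not necessarily what the paper proposes, and the function names are hypothetical.

```python
import numpy as np


def stats_pool_2d(x):
    """Pool a (channels, frames) array over time.

    Returns a vector of length 2 * channels: per-channel means
    followed by per-channel standard deviations.
    """
    return np.concatenate([x.mean(axis=-1), x.std(axis=-1)])


def stats_pool_3d(x):
    """Pool a (channels, freq, frames) array over time.

    Assumption: channel and frequency axes are flattened into one
    axis so that the 2-D pooling above can be reused unchanged.
    """
    c, f, t = x.shape
    return stats_pool_2d(x.reshape(c * f, t))
```

With this arrangement, a TDNN-style output of shape (4, 10) yields a vector of length 8, while a ResNet-style output of shape (2, 3, 5) yields a vector of length 12.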


An automatic, text-independent speaker verification (SV) system is proposed using Line Spectral Frequency (LSF) features. The state-of-the-art Gaussian Mixture Model with Universal Background Model (GMM-UBM) framework is used for speaker modeling and verification. A score-level fusion technique is employed to extract complementary information from static and dynamic LSF features and to improve the noise robustness of the SV system. In addition, the speaker-discriminative power of different speech zones, such as vowels, non-vowels, and transitions, is investigated.
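The abstract does not state which fusion rule is used. As a minimal sketch, assuming a simple weighted sum of the scores produced by the static-feature and dynamic-feature systems, score-level fusion could look like the following. The function names, the default weight, and the threshold are hypothetical.

```python
def fuse_scores(static_score, dynamic_score, weight=0.5):
    """Combine two verification scores with a weighted sum.

    Assumption: `weight` is applied to the static-feature score and
    (1 - weight) to the dynamic-feature score.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return weight * static_score + (1.0 - weight) * dynamic_score


def verify(static_score, dynamic_score, threshold=0.0, weight=0.5):
    """Accept the claimed identity if the fused score reaches the threshold."""
    return fuse_scores(static_score, dynamic_score, weight) >= threshold
```

In practice the weight and threshold would be tuned on held-out data; the values above are placeholders.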
