ENHANCING LOW-LATENCY SPEAKER DIARIZATION WITH SPATIAL DICTIONARY LEARNING

This study proposes a low-latency online speaker diarization framework.
Specifically, we design a spatial dictionary learning module shared across different frequency bands, enabling spatial feature learning at each frequency bin.
This helps alleviate the latency constraints of the online diarization system.
Additionally, we devise a magnitude-weighted fusion to integrate spectral features. As a result, the system can extract discriminative speaker embeddings that jointly account for spectral and spatial features.
Experimental results on the AliMeeting dataset show a significant reduction in diarization error rate (DER) across various latencies, with a relative improvement of 45.80% over single-channel online diarization.
Moreover, our method outperforms offline direction-of-arrival (DOA)-based diarization and achieves performance comparable to the second-ranked offline system in the AliMeeting challenge.
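The abstract does not specify the architecture, so what follows is only a minimal NumPy sketch of the two ideas under loudly stated assumptions. Spatial features are taken to be inter-channel phase differences (IPDs) against a reference microphone. The "shared spatial dictionary" is modeled as one fixed matrix of K atoms applied identically at every frequency bin (in the proposed system it would be learned). The magnitude-weighted fusion is modeled as pooling the per-bin dictionary codes over frequency with weights proportional to the reference channel's magnitude, then concatenating log-magnitude spectral features. The functions `spatial_codes` and `magnitude_weighted_fusion` and the temperature `tau` are hypothetical names chosen for illustration.

```python
import numpy as np


def spatial_codes(stft: np.ndarray, dictionary: np.ndarray, tau: float = 0.1) -> np.ndarray:
    """Soft-assign each time-frequency bin's spatial feature to shared atoms.

    stft:       complex array, shape (channels, freqs, frames)
    dictionary: real array, shape (atoms, 2 * (channels - 1)); the SAME atoms
                are used at every frequency bin (hypothetical stand-in for a
                learned spatial dictionary)
    returns:    real array, shape (freqs, frames, atoms); sums to 1 over atoms
    """
    # Assumed spatial feature: IPD of each channel relative to channel 0.
    ipd = np.angle(stft[1:] * np.conj(stft[0:1]))            # (C-1, F, T)
    feats = np.concatenate([np.cos(ipd), np.sin(ipd)], 0)    # (2(C-1), F, T)
    feats = np.moveaxis(feats, 0, -1)                        # (F, T, 2(C-1))
    # Similarity to every atom, followed by a softmax over atoms.
    logits = feats @ dictionary.T / tau                      # (F, T, K)
    logits -= logits.max(axis=-1, keepdims=True)             # numerical stability
    codes = np.exp(logits)
    return codes / codes.sum(axis=-1, keepdims=True)


def magnitude_weighted_fusion(stft: np.ndarray, codes: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Pool spatial codes over frequency, weighted by spectral magnitude,
    and append log-magnitude spectral features (assumed fusion scheme).

    returns: real array, shape (frames, atoms + freqs)
    """
    mag = np.abs(stft[0])                                     # (F, T), reference channel
    weights = mag / (mag.sum(axis=0, keepdims=True) + eps)    # per-frame weights over freq
    spatial = np.einsum("ft,ftk->tk", weights, codes)         # (T, K)
    spectral = np.log(mag + eps).T                            # (T, F)
    return np.concatenate([spatial, spectral], axis=1)        # (T, K + F)


# Usage with random data in place of a real multichannel recording.
rng = np.random.default_rng(0)
C, F, T, K = 4, 257, 50, 8
stft = rng.standard_normal((C, F, T)) + 1j * rng.standard_normal((C, F, T))
dictionary = rng.standard_normal((K, 2 * (C - 1)))

codes = spatial_codes(stft, dictionary)
emb = magnitude_weighted_fusion(stft, codes)
print(emb.shape)  # (50, 265)

# Each frame depends only on its own STFT column, so processing a single
# frame on arrival gives the same features as processing the whole batch.
one = magnitude_weighted_fusion(stft[:, :, :1], spatial_codes(stft[:, :, :1], dictionary))
print(np.allclose(one[0], emb[0]))  # True
```

In this sketch, sharing one dictionary across all frequency bins keeps the parameter count independent of the number of bins, and because every operation is frame-wise, the features add no lookahead beyond the STFT window. Whether the actual system behaves this way is not stated in the abstract.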
