Turn-to-Diarize: Online Speaker Diarization Constrained by Transformer Transducer Speaker Turn Detection

Citation Author(s):: Wei Xia, Han Lu, Quan Wang, Anshuman Tripathi, Yiling Huang, Ignacio Lopez Moreno, Hasim Sak
Submitted by:: Quan Wang
Last updated:: 5 May 2022 - 10:58am
Document Type:: Presentation Slides
Document Year:: 2022
Event:: ICASSP 2022
Presenters:: Quan Wang
Paper Code:: SPE-72.1

Categories:: Speaker Recognition and Characterization (SPE-SPKR)

In this paper, we present a novel speaker diarization system for streaming on-device applications. The system uses a transformer transducer to detect speaker turns, represents each speaker turn with a speaker embedding, and then clusters these embeddings under constraints derived from the detected speaker turns. Compared with conventional clustering-based diarization systems, our system greatly reduces the computational cost of clustering: speaker turns are sparse, so only one embedding per turn needs to be clustered. Unlike other supervised speaker diarization systems that require training annotations with time-stamped speaker labels, our system only requires speaker turn tokens to be included during transcription, which greatly reduces the human effort involved in data collection.
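The abstract describes the clustering stage only at a high level. The sketch below illustrates the general idea of clustering one embedding per speaker turn under turn-derived constraints. It is not the authors' implementation. The embeddings are toy 2-D vectors standing in for real speaker embeddings, `turn_cannot_links` and `constrained_agglomerative` are hypothetical names, and a simple greedy centroid-linkage agglomerative clustering with hard cannot-link constraints is swapped in for the clustering method actually used in the paper. It also assumes that each segment covers exactly one detected turn, so adjacent segments are treated as different speakers.

```python
import math
from typing import List, Optional, Sequence, Set, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def centroid(vectors: List[Vector]) -> List[float]:
    """Element-wise mean of a non-empty list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def turn_cannot_links(num_segments: int) -> Set[Tuple[int, int]]:
    """Hypothetical constraint builder. ASSUMPTION: each segment spans exactly one
    detected speaker turn, so neighbouring segments belong to different speakers."""
    return {(i, i + 1) for i in range(num_segments - 1)}


def constrained_agglomerative(
    embeddings: List[Vector],
    cannot_link: Set[Tuple[int, int]],
    threshold: float,
) -> List[int]:
    """Greedy centroid-linkage clustering with hard cannot-link constraints.

    Repeatedly merges the two clusters whose centroids have the highest cosine
    similarity, skipping any merge that would put a cannot-link pair in the same
    cluster. Stops when no allowed merge has similarity above `threshold`.
    """
    clusters: List[List[int]] = [[i] for i in range(len(embeddings))]

    def violates(c1: List[int], c2: List[int]) -> bool:
        # True if any pair across the two clusters is forbidden from sharing a speaker.
        return any((i, j) in cannot_link or (j, i) in cannot_link for i in c1 for j in c2)

    while True:
        best: Optional[Tuple[int, int]] = None
        best_sim = threshold
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                if violates(clusters[a], clusters[b]):
                    continue
                sim = cosine(
                    centroid([embeddings[i] for i in clusters[a]]),
                    centroid([embeddings[i] for i in clusters[b]]),
                )
                if sim > best_sim:
                    best, best_sim = (a, b), sim
        if best is None:
            break
        a, b = best
        clusters[a].extend(clusters[b])  # b > a, so index a stays valid after deleting b
        del clusters[b]

    # Number speakers in order of their first appearance.
    labels = [0] * len(embeddings)
    for label, members in enumerate(sorted(clusters, key=min)):
        for i in members:
            labels[i] = label
    return labels


if __name__ == "__main__":
    # Toy example: four turns alternating between two speakers.
    turns = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]]
    print(constrained_agglomerative(turns, turn_cannot_links(len(turns)), threshold=0.5))
    # prints [0, 1, 0, 1]
```

Because there is one embedding per turn rather than one per fixed-length window, the pairwise loop runs over far fewer items, which is where the saving in clustering cost comes from. Hard constraints also propagate turn-detection errors (a false turn forbids a merge that should happen), so a practical system may prefer to treat turn constraints as soft.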
