A Lightweight Hybrid Multi-Channel Speech Extraction System with Directional Voice Activity Detection

Although deep learning (DL) based end-to-end models have shown outstanding performance in multi-channel speech extraction, their high computational complexity restricts their practical use on edge devices. In this paper, we propose a hybrid system that integrates the generalized sidelobe canceller (GSC) and a lightweight post-filtering model more effectively, with the assistance of spatial speaker activity information provided by a directional voice activity detection (DVAD) module. In addition to guiding the update of the adaptive blocking matrix (ABM) and the adaptive interference canceller (AIC) used in the GSC to alleviate distortion of the desired speech, the DVAD output is also used as an auxiliary input to the post-filtering model to enhance its interference suppression capability. The experimental results demonstrate that, at a much lower computational cost, our method achieves performance comparable to that of a current state-of-the-art end-to-end model on simulated data and generalizes even better to real-world data.
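The abstract does not give the update rules, so the following is only a minimal sketch of how a DVAD decision could gate the two adaptive stages of a GSC. It assumes a single STFT bin, one complex coefficient per microphone, microphone signals already phase-aligned toward the target, a delay-and-sum fixed beamformer, and plain NLMS updates. The function name `gsc_step` and the parameters `mu` and `eps` are hypothetical and are not taken from the paper.

```python
import numpy as np

def gsc_step(x, h, w, target_active, mu=0.1, eps=1e-8):
    """One DVAD-gated GSC update for a single STFT bin (illustrative only).

    x: complex array (M,), microphone signals, assumed pre-aligned to the target.
    h: complex array (M,), ABM coefficients (one tap per microphone).
    w: complex array (M,), AIC coefficients (one tap per noise reference).
    target_active: bool, the DVAD decision for this frame.
    """
    d = x.mean()               # fixed beamformer output (delay-and-sum, an assumption)
    u = x - h * d              # ABM: subtract target leakage to form noise references
    y = d - np.vdot(w, u)      # AIC: subtract interference estimate (vdot conjugates w)

    if target_active:
        # Target present: adapt only the ABM so the references stay free of target speech.
        h = h + mu * u * np.conj(d) / (abs(d) ** 2 + eps)
    else:
        # Target absent: adapt only the AIC so it never learns to cancel the target.
        w = w + mu * u * np.conj(y) / (np.vdot(u, u).real + eps)
    return y, h, w
```

Gating the two updates on opposite DVAD states is the design point: an AIC that adapts while the target is active tends to cancel part of the desired speech, which is the distortion the abstract says the DVAD guidance is meant to alleviate.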
