
Multi-Task Joint-Learning for Robust Voice Activity Detection

Citation Author(s):
Yanmin Qian, Kai Yu
Submitted by:
Yimeng Zhuang
Last updated:
15 October 2016 - 3:51am
Document Type:
Presentation Slides
Document Year:
2016
Event:
Presenters:
Yimeng Zhuang
Paper Code:
35
 

Model-based voice activity detection (VAD) approaches have been widely used and have achieved success in practice. These approaches usually cast VAD as a frame-level classification problem and employ statistical classifiers, such as a Gaussian Mixture Model (GMM) or a Deep Neural Network (DNN), to assign a speech/silence label to each frame. Because each frame is classified independently of its neighbours, the VAD results tend to be fragile. To address this problem, this paper proposes a new structured multi-frame prediction DNN approach to improve segment-level VAD performance. During DNN training, the VAD labels of multiple consecutive frames are concatenated together as targets, and the network is jointly trained with a speech enhancement task to achieve robustness under noisy conditions. During testing, the VAD label for each frame is obtained by merging the prediction results from neighbouring frames. Experiments on the Aurora 4 dataset showed that conventional DNN-based VAD has poor and unstable prediction performance, while the proposed multi-task-trained VAD is much more robust.
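
The structured multi-frame idea can be illustrated with a minimal sketch, assuming a centred window of 2k+1 frames. The function names `make_multiframe_targets` and `merge_predictions`, the window half-width `k`, the edge padding, the averaging rule and the 0.5 `threshold` are all hypothetical choices for illustration; the abstract only says that labels of consecutive frames are concatenated as targets and that neighbouring predictions are merged at test time.

```python
from typing import List, Sequence


def make_multiframe_targets(labels: Sequence[int], k: int) -> List[List[int]]:
    """Concatenate the labels of frames t-k .. t+k into one target vector per frame t.

    Assumption: frames past the utterance edges reuse the nearest boundary
    label; the abstract does not say how edges are handled.
    """
    n = len(labels)
    targets = []
    for t in range(n):
        # Clamp each neighbour index into [0, n-1] to pad at the edges.
        window = [labels[min(max(t + o, 0), n - 1)] for o in range(-k, k + 1)]
        targets.append(window)
    return targets


def merge_predictions(window_probs: Sequence[Sequence[float]], k: int,
                      threshold: float = 0.5) -> List[int]:
    """Turn overlapping multi-frame outputs into one speech/silence label per frame.

    window_probs[t][j] is the speech probability that the output for frame t
    assigns to frame t + j - k. All probabilities landing on the same frame
    are averaged (an assumed merging rule), then thresholded.
    """
    n = len(window_probs)
    sums = [0.0] * n
    counts = [0] * n
    for t, probs in enumerate(window_probs):
        for j, p in enumerate(probs):
            s = t + j - k
            if 0 <= s < n:  # drop predictions that fall outside the utterance
                sums[s] += p
                counts[s] += 1
    # counts[s] >= 1 because the window centred on s always covers s itself.
    return [1 if sums[s] / counts[s] >= threshold else 0 for s in range(n)]


# Toy example: 6 frames, windows of 2k+1 = 3 frames.
K = 1
labels = [0, 0, 1, 1, 1, 0]
targets = make_multiframe_targets(labels, K)

# Pretend the network reproduces the targets exactly, except that the output
# for frame 3 wrongly calls its own centre frame silence.
window_probs = [[float(y) for y in row] for row in targets]
window_probs[3][K] = 0.0

frame_only = [1 if row[K] >= 0.5 else 0 for row in window_probs]
merged = merge_predictions(window_probs, K)
print(frame_only)  # [0, 0, 1, 0, 1, 0]
print(merged)      # [0, 0, 1, 1, 1, 0]
```

In this toy run, reading only the centre prediction flips frame 3 to silence, while averaging the three predictions that cover frame 3 (1, 0 and 1) recovers the speech label. This is the kind of single-frame fragility the multi-frame approach is meant to smooth out, though the real merging rule may differ from this simple average.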

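The multi-task training objective can be sketched as a weighted sum of two losses, assuming a shared DNN with two output heads: one predicting the multi-frame speech probabilities and one regressing clean speech features from noisy input (assuming parallel clean features exist for each noisy frame). The choice of binary cross-entropy for VAD, mean squared error for enhancement, the weight `alpha` and the helper names `vad_bce`, `enhancement_mse` and `joint_loss` are hypothetical; the abstract does not specify the loss functions or their weighting. Plain Python is used only to show how the objectives combine; a real system would compute them in a DNN toolkit with automatic differentiation.

```python
import math
from typing import Sequence


def vad_bce(probs: Sequence[float], targets: Sequence[int], eps: float = 1e-7) -> float:
    """Mean binary cross-entropy over every entry of the multi-frame VAD target."""
    total = 0.0
    for p, y in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)  # keep log() finite
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(probs)


def enhancement_mse(pred: Sequence[float], clean: Sequence[float]) -> float:
    """Mean squared error between enhanced and clean feature values."""
    return sum((a - b) ** 2 for a, b in zip(pred, clean)) / len(pred)


def joint_loss(vad_probs: Sequence[float], vad_targets: Sequence[int],
               enh_pred: Sequence[float], clean_feats: Sequence[float],
               alpha: float = 1.0) -> float:
    """Multi-task objective: VAD loss plus alpha times the enhancement loss."""
    return vad_bce(vad_probs, vad_targets) + alpha * enhancement_mse(enh_pred, clean_feats)


# One training frame with a 3-frame VAD target and a 2-dim feature vector.
loss = joint_loss(
    vad_probs=[0.9, 0.2, 0.7],
    vad_targets=[1, 0, 1],
    enh_pred=[1.0, 2.0],
    clean_feats=[1.5, 2.0],
    alpha=0.5,
)
print(round(loss, 4))
```

Because both heads share the hidden layers, gradients from the enhancement term push the shared representation toward features that stay informative under noise; `alpha` trades off how strongly that auxiliary task shapes the network relative to the VAD objective.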