Speech Activity Detection in Online Broadcast Transcription Using Deep Neural Networks and Weighted Finite State Transducers

Citation Author(s):
Lukas Mateju, Petr Cerva, Jindrich Zdansky, Jiri Malek
Submitted by:
Lukas Mateju
Last updated:
28 February 2017 - 5:04am
Document Type:
Poster
Document Year:
2017
Event:
Presenters:
Lukas Mateju
Paper Code:
ICASSP1701
 

A new approach to online Speech Activity Detection (SAD) is proposed. This approach is designed for use in a system that carries out 24/7 transcription of radio/TV broadcasts containing a large amount of non-speech segments. To improve the robustness of detection, we adopt Deep Neural Networks (DNNs) trained on artificially created mixtures of speech and non-speech signals at desired levels of Signal-to-Noise Ratio (SNR). An integral part of our approach is an online decoder based on Weighted Finite State Transducers (WFSTs); this decoder smooths the output of the DNN. The employed transduction model is context-based, i.e., both speech and non-speech events are modeled using sequences of states. The presented experimental results show that our approach yields state-of-the-art results on the standardized QUT-NOISE-TIMIT data set and, at the same time, is capable of a) operating with low latency and b) reducing both the computational demands and the error rate of the target transcription system.
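The abstract mentions training DNNs on artificially created speech/non-speech mixtures at desired SNR levels. The paper does not give its mixing procedure, but a minimal sketch of the standard approach (scale the noise so the speech-to-noise power ratio hits a target, then add) might look like the following; `mix_at_snr` and its parameters are illustrative names, not the authors' code:

```python
import numpy as np

def mix_at_snr(speech, noise, snr_db):
    """Add `noise` to `speech` at a target SNR (in dB).

    A rough sketch of artificial mixture creation, assuming both inputs
    are 1-D float arrays at the same sampling rate.
    """
    # Match lengths by tiling/truncating the noise signal.
    reps = int(np.ceil(len(speech) / len(noise)))
    noise = np.tile(noise, reps)[: len(speech)]
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    # Gain g solving 10*log10(p_speech / (g^2 * p_noise)) = snr_db.
    gain = np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return speech + gain * noise
```

Sweeping `snr_db` over a range of values yields training material at the desired noise levels.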

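The WFST-based decoder described above smooths frame-wise DNN posteriors by modeling speech and non-speech events as state sequences. The paper's online WFST decoder is not reproduced here; as a simplified offline stand-in, a two-state Viterbi decode with sticky self-transitions illustrates the smoothing effect (the `stay` probability is an assumed illustrative parameter):

```python
import numpy as np

def viterbi_smooth(posteriors, stay=0.99):
    """Smooth frame-wise [non-speech, speech] posteriors with a
    two-state Viterbi decode. A simplified, offline stand-in for the
    paper's online WFST decoder; returns the best state path (0/1).
    """
    T = len(posteriors)
    log_post = np.log(posteriors + 1e-10)
    # High self-loop probability penalizes rapid state switching.
    log_trans = np.log(np.array([[stay, 1 - stay],
                                 [1 - stay, stay]]))
    delta = np.zeros((T, 2))          # best log-score per state
    psi = np.zeros((T, 2), dtype=int) # backpointers
    delta[0] = log_post[0]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + log_trans  # (prev, cur)
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + log_post[t]
    # Backtrack the best state sequence.
    path = np.zeros(T, dtype=int)
    path[-1] = delta[-1].argmax()
    for t in range(T - 2, -1, -1):
        path[t] = psi[t + 1][path[t + 1]]
    return path
```

With a sticky transition model, isolated single-frame posterior flips are absorbed into the surrounding segment, which is the smoothing behavior the decoder provides on top of the raw DNN output.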