Frame-based Overlapping Speech Detection using Convolutional Neural Networks

Citation Author(s): Midia Yousefi, John H.L. Hansen
Submitted by: Midia Yousefi
Last updated: 15 May 2020 - 6:57pm
Document Type: Presentation Slides
Document Year: 2020
Event: ICASSP 2020
Presenters: Midia Yousefi
Paper Code: 4733

Categories: Speech Coding (SPE-CODI)

Naturalistic speech recordings usually contain speech signals from multiple speakers. This overlap can degrade the performance of speech technologies because it complicates tracing and recognizing individual speakers. In this study, we investigate the detection of overlapping speech on segments as short as 25 ms using Convolutional Neural Networks. We evaluate the detection performance using different spectral features, and show that pyknogram features outperform other commonly used speech features. The proposed system can predict overlapping speech with an accuracy of 84% and an F-score of 88% on a dataset of mixed speech generated from the GRID dataset.
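
As a rough illustration of what frame-level evaluation on 25 ms segments involves, the sketch below splits a signal into 25 ms frames and computes accuracy and F-score from binary per-frame labels (1 = overlapping speech, 0 = otherwise). It is not the authors' implementation: the 16 kHz sample rate, the non-overlapping framing, and the function names `frame_signal` and `accuracy_and_fscore` are assumptions made for this example only.

```python
def frame_signal(samples, sample_rate=16000, frame_ms=25):
    """Split samples into non-overlapping frames of frame_ms milliseconds.

    The 16 kHz default and the absence of frame overlap are assumptions
    for illustration; trailing samples that do not fill a frame are dropped.
    """
    frame_len = int(sample_rate * frame_ms / 1000)
    n_frames = len(samples) // frame_len
    return [samples[i * frame_len:(i + 1) * frame_len] for i in range(n_frames)]


def accuracy_and_fscore(y_true, y_pred):
    """Return (accuracy, F-score) for binary frame labels, 1 = overlap."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F-score is the harmonic mean of precision and recall.
    denom = precision + recall
    fscore = 2 * precision * recall / denom if denom else 0.0
    return accuracy, fscore


if __name__ == "__main__":
    one_second = [0.0] * 16000            # placeholder audio, 1 s at 16 kHz
    frames = frame_signal(one_second)
    print(len(frames), len(frames[0]))    # number of frames, samples per frame

    # Hypothetical labels for four frames, only to exercise the metric code.
    acc, f1 = accuracy_and_fscore([1, 1, 0, 0], [1, 0, 0, 0])
    print(acc, f1)
```

In a real system the per-frame predictions would come from the classifier (here, a CNN over spectral features), and the labels from the known mixing pattern of the generated dataset.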

ICASSP2020-overlap-detection_MY-JH-Mar30-2020.pdf
