On reducing the effect of speaker overlap for CHiME-5
- Submitted by: Tudor-Catalin Zorila
- Last updated: 11 May 2019 - 8:58pm
- Catalin Zorila
The CHiME-5 speech separation and recognition challenge was recently shown to pose a difficult task for current automatic speech recognition (ASR) systems.
Speaker overlap was one of the main difficulties of the challenge. The presence of noise, reverberation, and moving speakers has made traditional source separation methods ineffective at improving recognition accuracy.
In this paper, we explore several enhancement strategies that aim to reduce the effect of speaker overlap for CHiME-5 without performing source separation.
The first discards the overlapped segments using the speaker diarisation information provided by the challenge; the second is a neural-network-driven automatic gain control enhancement that aims to improve that diarisation information; and the third is based on optimal multi-array data selection. State-of-the-art acoustic models were used for the ASR experiments.
Results show that the proposed automatic gain control method yields word error rate (WER) reductions of between 2% and 3% absolute on the development set of CHiME-5.
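As an illustration of the simplest of these strategies, discarding overlapped speech using diarisation information, the following is a minimal sketch and not the authors' implementation. It assumes that diarisation is available as lists of `(start, end)` times in seconds; the function name `subtract_overlaps` and the segment format are hypothetical and are not taken from the paper or from the CHiME-5 annotation files.

```python
from typing import List, Tuple

# Hypothetical segment format: (start_seconds, end_seconds).
Segment = Tuple[float, float]


def subtract_overlaps(target: List[Segment], others: List[Segment]) -> List[Segment]:
    """Return the parts of the target speaker's segments during which
    no other speaker is active."""
    kept: List[Segment] = []
    others_sorted = sorted(others)  # sort interfering segments by start time
    for start, end in target:
        cursor = start  # earliest time not yet classified as kept or discarded
        for o_start, o_end in others_sorted:
            if o_end <= cursor:
                continue  # interfering segment ends before the current position
            if o_start >= end:
                break  # remaining interfering segments start after this target segment
            if o_start > cursor:
                kept.append((cursor, o_start))  # clean stretch before the overlap
            cursor = max(cursor, o_end)  # skip past the overlapped region
            if cursor >= end:
                break
        if cursor < end:
            kept.append((cursor, end))  # clean tail of the target segment
    return kept


# Example: the target speaker talks in two segments; another speaker
# overlaps the end of the first and the start of the second.
target_segments = [(0.0, 5.0), (7.0, 9.0)]
other_segments = [(4.0, 7.5)]
clean_segments = subtract_overlaps(target_segments, other_segments)
print(clean_segments)
```

In this sketch only the overlap-free portions would be passed on to the recogniser. Whether overlapped portions are dropped entirely or handled in some other way in the actual system is not specified in the abstract.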