SPEECH COLLAGE: CODE-SWITCHED AUDIO GENERATION BY COLLAGING MONOLINGUAL CORPORA

DOI:
10.60864/cwhd-2g81
Citation Author(s):
Amir Hussein, Dorsa Zeinali, Ondrej Klejch, Matthew Wiesner, Brian Yan, Shammur Chowdhury, Ahmed Ali, Shinji Watanabe, Sanjeev Khudanpur
Submitted by:
Dorsa Zeinali
Last updated:
6 June 2024 - 10:33am
Document Type:
Poster
Document Year:
2024
Presenters:
Dorsa Zeinali
Paper Code:
SLP-P21.9
 

Designing effective automatic speech recognition (ASR) systems for Code-Switching (CS) often depends on the availability of transcribed CS resources. To address data scarcity, this paper introduces Speech Collage, a method that synthesizes CS data from monolingual corpora by splicing audio segments. We further improve the smoothness of the generated audio using an overlap-add approach. We investigate the impact of generated data on speech recognition in two scenarios: using in-domain CS text and a zero-shot approach with synthesized CS text. Empirical results highlight up to 34.4% and 16.2% relative reductions in Mixed-Error Rate and Word-Error Rate for in-domain and zero-shot scenarios, respectively. Lastly, we demonstrate that CS augmentation bolsters the model’s code-switching inclination and reduces its monolingual bias.
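
The splicing step with overlap-add smoothing mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `overlap_add_splice`, the linear cross-fade window, and the fixed overlap length are all assumptions made for illustration, and the paper's actual windowing and segment selection may differ.

```python
import numpy as np

def overlap_add_splice(segments, overlap):
    """Join 1-D audio segments, cross-fading `overlap` samples at each boundary.

    Hypothetical helper: a linear cross-fade is assumed here for simplicity.
    """
    if overlap < 0:
        raise ValueError("overlap must be non-negative")
    arrays = [np.asarray(s, dtype=np.float64) for s in segments]
    if not arrays:
        return np.zeros(0)
    if overlap == 0:
        # No smoothing requested: plain concatenation.
        return np.concatenate(arrays)

    # Complementary ramps that sum to 1 at every sample of the overlap region.
    fade_in = np.linspace(0.0, 1.0, overlap)
    fade_out = 1.0 - fade_in

    out = arrays[0].copy()
    for seg in arrays[1:]:
        if len(out) < overlap or len(seg) < overlap:
            raise ValueError("every segment must be at least `overlap` samples long")
        # Blend the tail of the running output with the head of the next segment.
        out[-overlap:] = out[-overlap:] * fade_out + seg[:overlap] * fade_in
        out = np.concatenate([out, seg[overlap:]])
    return out
```

Because the two ramps sum to one, a constant signal passes through the joins unchanged, and each join shortens the output by `overlap` samples relative to plain concatenation.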
