SPEECH COLLAGE: CODE-SWITCHED AUDIO GENERATION BY COLLAGING MONOLINGUAL CORPORA
- DOI:
- 10.60864/cwhd-2g81
- Citation Author(s):
- Submitted by:
- Dorsa Zeinali
- Last updated:
- 6 June 2024 - 10:33am
- Document Type:
- Poster
- Document Year:
- 2024
- Presenters:
- Dorsa Zeinali
- Paper Code:
- SLP-P21.9
- Categories:
Designing effective automatic speech recognition (ASR) systems for Code-Switching (CS) often depends on the availability of transcribed CS resources. To address data scarcity, this paper introduces Speech Collage, a method that synthesizes CS data from monolingual corpora by splicing audio segments. We further improve the smoothness of the generated audio using an overlap-add approach. We investigate the impact of the generated data on speech recognition in two scenarios: using in-domain CS text and a zero-shot approach with synthesized CS text. Empirical results highlight up to 34.4% and 16.2% relative reductions in Mixed-Error Rate and Word-Error Rate for in-domain and zero-shot scenarios, respectively. Lastly, we demonstrate that CS augmentation bolsters the model’s code-switching inclination and reduces its monolingual bias.
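The splicing and overlap-add steps mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state the window shape, the overlap length, or how segments are selected, so everything here is an assumption for illustration: the raised-cosine crossfade, the function names `hann_fades`, `overlap_add`, and `collage`, and the representation of each audio segment as a plain list of samples.

```python
import math

def hann_fades(n):
    """Return (fade_out, fade_in) ramps of length n.

    Assumption: a raised-cosine (Hann-shaped) crossfade. The two ramps
    sum to 1 at every sample, so a constant signal stays constant.
    """
    fade_in = [0.5 * (1.0 - math.cos(math.pi * (i + 0.5) / n)) for i in range(n)]
    fade_out = [1.0 - w for w in fade_in]
    return fade_out, fade_in

def overlap_add(left, right, overlap):
    """Join two sample sequences, blending the last `overlap` samples of
    `left` with the first `overlap` samples of `right`."""
    if overlap <= 0:
        # No overlap requested: plain concatenation.
        return list(left) + list(right)
    if overlap > min(len(left), len(right)):
        raise ValueError("overlap is longer than one of the segments")
    fade_out, fade_in = hann_fades(overlap)
    blend = [
        l * fo + r * fi
        for l, r, fo, fi in zip(left[-overlap:], right[:overlap], fade_out, fade_in)
    ]
    return list(left[:-overlap]) + blend + list(right[overlap:])

def collage(segments, overlap):
    """Splice a list of segments left to right with overlap-add at each joint."""
    if not segments:
        return []
    out = list(segments[0])
    for seg in segments[1:]:
        out = overlap_add(out, seg, overlap)
    return out
```

Each joint shortens the output by `overlap` samples, so splicing k segments of total length N yields N - (k - 1) * overlap samples. Compared with hard concatenation, the crossfade avoids an abrupt amplitude jump at the boundary between segments taken from different recordings.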