LEVERAGING EFFECTIVE LANGUAGE AND SPEAKER CONDITIONING IN INDIC TTS FOR LIMMITS 2024 CHALLENGE
- DOI: 10.60864/2f3q-rs08
- Citation Author(s):
- Submitted by: Yejin Jeon
- Last updated: 6 June 2024 - 10:23am
- Document Type: Presentation Slides
- Document Year: 2024
- Event:
- Presenters: Yejin Jeon
- Paper Code: GC-L4.2
- Categories:
This paper describes the model developed by the NLP_POSTECH team for the LIMMITS 2024 Grand Challenge. Of the three tracks, we focus on Track 1, which requires a few-shot text-to-speech (TTS) system that generates natural speech across diverse languages. To provide multilingual capability, we incorporate a learnable language embedding. To imitate target speaker voices precisely, we use an inductive speaker bias conditioning method. Despite the simplicity of this strategy, our model is effective at generating natural speech and preserving high speaker fidelity in both monolingual and cross-lingual settings.
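As a rough illustration of what a learnable language embedding can look like, the sketch below keeps one trainable vector per language and adds it, together with a speaker vector, to every frame of a text encoder's output. It is a minimal sketch under stated assumptions, not the NLP_POSTECH implementation: the abstract does not say how the embedding is injected, so the additive conditioning, the `EmbeddingTable` class, the `condition` function, the language list, and the dimensions are all hypothetical. The speaker vector here is a plain placeholder and does not reproduce the inductive speaker bias conditioning method.

```python
import random


class EmbeddingTable:
    """Lookup table of trainable vectors, one row per ID (hypothetical helper)."""

    def __init__(self, num_entries, dim, seed=0):
        rng = random.Random(seed)
        self.dim = dim
        # Small random initial values; training would adjust these rows.
        self.weights = [
            [rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(num_entries)
        ]

    def lookup(self, index):
        """Return the vector stored for the given ID."""
        return self.weights[index]

    def sgd_step(self, index, grad, lr=0.1):
        """Update only the row that was used, which is what makes it learnable."""
        row = self.weights[index]
        for i in range(self.dim):
            row[i] -= lr * grad[i]


def condition(encoder_states, lang_vec, spk_vec):
    """Add the language and speaker vectors to every encoder frame.

    Additive conditioning is an assumption made for this sketch.
    """
    return [
        [h + l + s for h, l, s in zip(frame, lang_vec, spk_vec)]
        for frame in encoder_states
    ]


# Illustrative language inventory; not taken from the source.
LANGUAGES = ["hindi", "telugu", "marathi"]
LANG_TO_ID = {name: i for i, name in enumerate(LANGUAGES)}

HIDDEN_DIM = 4
language_table = EmbeddingTable(len(LANGUAGES), HIDDEN_DIM)

# Stand-ins for real model outputs: 3 encoder frames and one speaker vector.
encoder_states = [[0.0] * HIDDEN_DIM for _ in range(3)]
speaker_vector = [1.0] * HIDDEN_DIM

conditioned = condition(
    encoder_states,
    language_table.lookup(LANG_TO_ID["telugu"]),
    speaker_vector,
)
```

In a cross-lingual setting, the same speaker vector would be paired with a different language ID, so the two sources of conditioning stay independent of each other in this sketch.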