
LEVERAGING EFFECTIVE LANGUAGE AND SPEAKER CONDITIONING IN INDIC TTS FOR LIMMITS 2024 CHALLENGE

DOI: 10.60864/2f3q-rs08
Citation Author(s): Youngjae Kim, Gary Geunbae Lee
Submitted by: Yejin Jeon
Last updated: 6 June 2024 - 10:23am
Document Type: Presentation Slides
Document Year: 2024
Event:
Presenters: Yejin Jeon
Paper Code: GC-L4.2

In this paper, we describe the model developed by the NLP_POSTECH team for the LIMMITS 2024 Grand Challenge. Of the three tracks, we focus on Track 1, which requires a few-shot text-to-speech (TTS) system that generates natural speech across diverse languages. To provide multilingual capability, we incorporate a learnable language embedding. For precise imitation of target speaker voices, we leverage an inductive speaker bias conditioning methodology. Despite the simplicity of this strategy, our model generates natural speech and preserves high speaker fidelity in both mono- and cross-lingual settings.
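
As a rough illustration of what a learnable language embedding can look like, the sketch below keeps one vector per language ID and adds it to every frame of the text-encoder output. This is not the authors' implementation: the names `EmbeddingTable` and `condition`, the language IDs, and the dimensions are hypothetical, and the additive speaker vector is only a stand-in, since the abstract does not specify how the inductive speaker bias conditioning is computed. In a real system the table entries would be trainable parameters updated by backpropagation.

```python
import random


class EmbeddingTable:
    """Lookup table with one vector per ID (hypothetical helper).

    In a trained TTS model these vectors would be learnable parameters;
    here they are only randomly initialised to show the data flow.
    """

    def __init__(self, num_entries, dim, seed=0):
        rng = random.Random(seed)
        self.weight = [
            [rng.gauss(0.0, 0.1) for _ in range(dim)]
            for _ in range(num_entries)
        ]

    def __call__(self, index):
        return self.weight[index]


def condition(hidden, lang_vec, spk_vec):
    """Add the language and speaker vectors to every frame of `hidden`.

    `hidden` is a list of frames, each a list of floats with the same
    length as `lang_vec` and `spk_vec`. Additive conditioning is an
    assumption made for this sketch.
    """
    return [
        [h + l + s for h, l, s in zip(frame, lang_vec, spk_vec)]
        for frame in hidden
    ]


# Hypothetical language IDs; the actual challenge languages are not listed here.
LANGUAGE_IDS = {"lang_a": 0, "lang_b": 1, "lang_c": 2}

if __name__ == "__main__":
    dim = 4
    lang_table = EmbeddingTable(len(LANGUAGE_IDS), dim, seed=0)
    spk_table = EmbeddingTable(2, dim, seed=1)  # two hypothetical speakers

    # Stand-in for text-encoder output: 5 frames of zeros.
    hidden = [[0.0] * dim for _ in range(5)]

    # Cross-lingual case: speaker 0's voice with the embedding of "lang_b".
    out = condition(hidden, lang_table(LANGUAGE_IDS["lang_b"]), spk_table(0))
    print(len(out), len(out[0]))  # prints: 5 4
```

Because the language and speaker vectors are looked up independently, any speaker ID can be paired with any language ID at inference time, which is the property a cross-lingual setting relies on.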
