Voice Conversion

Seen and Unseen Emotional Style Transfer for Voice Conversion with a New Emotional Speech Dataset

Emotional voice conversion aims to transform the emotional prosody of speech while preserving its linguistic content and speaker identity. Prior studies show that emotional prosody can be disentangled using an encoder-decoder network conditioned on a discrete representation, such as one-hot emotion labels. Such networks learn to remember a fixed set of emotional styles.
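
To illustrate why one-hot conditioning ties a model to a fixed set of styles, here is a minimal sketch. The label set `EMOTIONS` and the helpers `one_hot` and `condition_frame` are hypothetical names chosen for this example, and appending the label to each feature frame is only one common way to condition a network, not necessarily what the prior studies do.

```python
from typing import List, Sequence

# Hypothetical label set; the abstract does not list the emotions used.
EMOTIONS = ["neutral", "happy", "sad", "angry"]


def one_hot(label: str, labels: Sequence[str] = EMOTIONS) -> List[float]:
    """Return a one-hot vector for `label`; unknown labels cannot be encoded."""
    if label not in labels:
        raise ValueError(f"unseen emotion label: {label!r}")
    return [1.0 if name == label else 0.0 for name in labels]


def condition_frame(features: Sequence[float], label: str) -> List[float]:
    """Append the one-hot emotion code to one acoustic feature frame (assumed scheme)."""
    return list(features) + one_hot(label)


frame = [0.12, -0.40, 0.88]          # toy acoustic features for one frame
conditioned = condition_frame(frame, "sad")
```

An emotion outside `EMOTIONS` has no code at all, which is the limitation the abstract points to when it says such networks remember a fixed set of styles.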

icassp_poster.pdf

Poster (375)

icassp_slides.pdf

Slides (374)

Categories: Audio Analysis and Synthesis
Speech Synthesis and Generation, including TTS (SPE-SYNT)

AttS2S-VC: Sequence-to-Sequence Voice Conversion with Attention and Context Preservation Mechanisms

This paper describes a voice conversion (VC) method based on sequence-to-sequence (Seq2Seq) learning with attention and context preservation mechanisms. Seq2Seq models have performed strongly on numerous sequence modeling tasks, such as speech synthesis and recognition, machine translation, and image captioning.

2019_05_ICASSP_KouTanaka.pdf

Categories: Speech Synthesis and Generation, including TTS (SPE-SYNT)
Spoken Language Processing

Cross-Lingual Voice Conversion with Bilingual Phonetic PosteriorGram and Average Modeling

This paper presents a cross-lingual voice conversion approach using bilingual Phonetic PosteriorGram (PPG) and average modeling. The proposed approach makes use of bilingual PPGs to represent speaker-independent features of speech signals from different languages in the same feature space. In particular, a bilingual PPG is formed by stacking two monolingual PPG vectors, which are extracted from two monolingual speech recognition systems. The conversion model is trained to learn the relationship between bilingual PPGs and the corresponding acoustic features.
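
The stacking step described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function names, the toy class counts (3 and 2), and the requirement that both recognizers produce frame-aligned outputs are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence

Frame = Sequence[float]


def stack_bilingual_frame(ppg_a: Frame, ppg_b: Frame) -> List[float]:
    """Concatenate two monolingual PPG vectors into one bilingual PPG vector."""
    return list(ppg_a) + list(ppg_b)


def stack_bilingual_ppg(ppgs_a: Sequence[Frame], ppgs_b: Sequence[Frame]) -> List[List[float]]:
    """Stack two frame-aligned monolingual PPG sequences frame by frame."""
    if len(ppgs_a) != len(ppgs_b):
        raise ValueError("both PPG sequences must have the same number of frames")
    return [stack_bilingual_frame(a, b) for a, b in zip(ppgs_a, ppgs_b)]


# Toy posteriors for two frames: 3 phonetic classes from the first
# recognizer and 2 from the second (hypothetical sizes).
ppgs_lang_a = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]
ppgs_lang_b = [[0.6, 0.4], [0.3, 0.7]]

bilingual = stack_bilingual_ppg(ppgs_lang_a, ppgs_lang_b)
```

Each stacked frame has as many dimensions as the two monolingual PPGs combined, so speech from either language is described in the same feature space.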

Poster_ICASSP2019.pdf

cross lingual voice conversion with bilingual PPG (478)

Categories: Speech Synthesis and Generation, including TTS (SPE-SYNT)
