
Speech Synthesis and Generation, including TTS (SPE-SYNT)

AttS2S-VC: Sequence-to-Sequence Voice Conversion with Attention and Context Preservation Mechanisms


This paper describes a method for voice conversion (VC) based on sequence-to-sequence (Seq2Seq) learning with attention and context preservation mechanisms. Seq2Seq models have excelled at numerous sequence-modeling tasks, such as speech synthesis and recognition, machine translation, and image captioning.
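As a rough, hedged illustration of the alignment step at the heart of attention-based Seq2Seq models, the NumPy sketch below computes a soft attention over source frames; the function and variable names are illustrative and this is not the paper's actual architecture.

```python
import numpy as np

def dot_product_attention(decoder_state, encoder_states):
    """Soft alignment of one decoder step over all encoder frames.

    decoder_state:  (d,)   current decoder hidden state
    encoder_states: (T, d) encoder hidden states for T source frames
    Returns the context vector and the attention weights.
    """
    scores = encoder_states @ decoder_state      # (T,) similarity scores
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                     # softmax over source frames
    context = weights @ encoder_states           # (d,) weighted sum of frames
    return context, weights

# Toy example: 5 source frames, hidden size 4.
rng = np.random.default_rng(0)
encoder_states = rng.standard_normal((5, 4))
decoder_state = rng.standard_normal(4)
context, weights = dot_product_attention(decoder_state, encoder_states)
print(weights.round(3), context.shape)
```

In a VC setting, such attention weights implicitly perform the time alignment between source and target feature sequences that conventional methods handle with explicit dynamic time warping.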

Paper Details

Authors:
Kou Tanaka, Hirokazu Kameoka, Takuhiro Kaneko, Nobukatsu Hojo
Submitted On:
15 May 2019 - 7:03am

Document Files

2019_05_ICASSP_KouTanaka.pdf


[1] Kou Tanaka, Hirokazu Kameoka, Takuhiro Kaneko, Nobukatsu Hojo, "AttS2S-VC: Sequence-to-Sequence Voice Conversion with Attention and Context Preservation Mechanisms", IEEE SigPort, 2019. [Online]. Available: http://sigport.org/4522. Accessed: Sep. 20, 2019.

An End-to-End Network to Synthesize Intonation using a Generalized Command Response Model - Poster


The generalized command response (GCR) model represents intonation as a superposition of muscle responses to spike command signals. We have previously shown that the spikes can be predicted by a two-stage system, consisting of a recurrent neural network and a post-processing procedure, but the responses themselves were fixed dictionary atoms. We propose an end-to-end neural architecture that replaces the dictionary atoms with trainable second-order recurrent elements analogous to recursive filters. We demonstrate
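To make the superposition idea concrete, here is a minimal NumPy sketch in which a spike command train is filtered by a critically damped second-order system, the kind of fixed response atom that the proposed trainable recurrent elements generalize; the value of alpha, the frame rate, and the discretization are illustrative assumptions rather than the paper's exact formulation.

```python
import numpy as np

def gcr_atom_response(spikes, alpha, fs=100.0):
    """Superpose muscle-like responses: filter a spike command train with a
    critically damped second-order system whose continuous impulse response
    is h(t) = alpha^2 * t * exp(-alpha * t), discretised (up to scale) as a
    two-pole IIR filter with a repeated pole. This is the kind of fixed
    dictionary atom the paper replaces with a trainable recurrent element."""
    p = np.exp(-alpha / fs)             # repeated pole of the discrete filter
    y = np.zeros_like(spikes, dtype=float)
    for n in range(len(spikes)):
        y[n] = spikes[n]
        if n >= 1:
            y[n] += 2.0 * p * y[n - 1]
        if n >= 2:
            y[n] -= p * p * y[n - 2]
    return y

# Two spike commands at frames 20 and 90 superpose into a smooth contour.
spikes = np.zeros(200)
spikes[20], spikes[90] = 1.0, 0.6
contour = gcr_atom_response(spikes, alpha=20.0)
print(contour.argmax())   # the response peaks shortly after the first spike
```

Making such a filter's coefficients trainable, as the paper proposes, lets the response shapes be learned jointly with the spike predictor instead of being fixed in advance.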

Paper Details

Authors:
François Marelli, Bastian Schnell, Hervé Bourlard, Thierry Dutoit, Philip N. Garner
Submitted On:
10 May 2019 - 11:54am

Document Files

Presentation Poster


[1] François Marelli, Bastian Schnell, Hervé Bourlard, Thierry Dutoit, Philip N. Garner, "An End-to-End Network to Synthesize Intonation using a Generalized Command Response Model - Poster", IEEE SigPort, 2019. [Online]. Available: http://sigport.org/4367. Accessed: Sep. 20, 2019.

Investigations of real-time Gaussian FFTNet and parallel WaveNet neural vocoders with simple acoustic features


This paper examines four approaches to improving real-time neural vocoders with simple acoustic features (SAF) constructed from fundamental frequency and mel-cepstra rather than mel-spectrograms.
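As a hedged sketch of what such simple acoustic features could look like, the snippet below assembles per-frame vectors from continuous log-F0, a voiced/unvoiced flag, and mel-cepstra; the exact layout and dimensions are assumptions, not the paper's specification.

```python
import numpy as np

def simple_acoustic_features(f0, mcep):
    """Assemble per-frame 'simple acoustic features': continuous log-F0,
    a voiced/unvoiced flag, and mel-cepstral coefficients, in place of a
    mel-spectrogram. The feature layout here is illustrative.

    f0:   (T,)   fundamental frequency in Hz, 0 for unvoiced frames
    mcep: (T, D) mel-cepstral coefficients
    """
    voiced = (f0 > 0).astype(np.float32)
    log_f0 = np.where(f0 > 0, np.log(np.maximum(f0, 1e-6)), 0.0)
    return np.column_stack([log_f0, voiced, mcep]).astype(np.float32)

# Toy example: 4 frames, 25 mel-cepstral coefficients per frame.
f0 = np.array([0.0, 120.0, 125.0, 0.0])
mcep = np.zeros((4, 25))
feats = simple_acoustic_features(f0, mcep)
print(feats.shape)   # (4, 27)
```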

Paper Details

Authors:
Takuma Okamoto, Tomoki Toda, Yoshinori Shiga, Hisashi Kawai
Submitted On:
10 May 2019 - 9:39pm

Document Files

icassp_2019_okamoto_1.pdf


[1] Takuma Okamoto, Tomoki Toda, Yoshinori Shiga, Hisashi Kawai, "Investigations of real-time Gaussian FFTNet and parallel WaveNet neural vocoders with simple acoustic features", IEEE SigPort, 2019. [Online]. Available: http://sigport.org/4280. Accessed: Sep. 20, 2019.

CycleGAN-VC2: Improved CycleGAN-based Non-parallel Voice Conversion


Non-parallel voice conversion (VC) is a technique for learning the mapping from source to target speech without relying on parallel data. This is an important task, but it has been challenging due to its disadvantageous training conditions, namely the lack of parallel source-target utterances. Recently, CycleGAN-VC provided a breakthrough, performing comparably to a parallel VC method without relying on any extra data, modules, or time-alignment procedures. However, there is still a large gap between the real target and converted speech, and bridging this gap remains a challenge.
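A minimal PyTorch sketch of the cycle-consistency loss that lets CycleGAN-VC train without parallel data follows; the toy convolutional generators and feature dimensions are placeholders, not CycleGAN-VC2's actual 2-1-2D generator architecture.

```python
import torch
import torch.nn as nn

# Toy 1-D convolutional mappers standing in for the generators; the real
# CycleGAN-VC2 generators are far more elaborate.
G_xy = nn.Conv1d(24, 24, kernel_size=5, padding=2)   # source -> target features
G_yx = nn.Conv1d(24, 24, kernel_size=5, padding=2)   # target -> source features

def cycle_loss(x, y):
    """L1 cycle-consistency: x -> G_xy -> G_yx should reconstruct x, and
    symmetrically for y. This constraint replaces the supervision that
    parallel, time-aligned utterance pairs would otherwise provide."""
    l1 = nn.L1Loss()
    return l1(G_yx(G_xy(x)), x) + l1(G_xy(G_yx(y)), y)

x = torch.randn(8, 24, 128)   # batch of source-speaker feature sequences
y = torch.randn(8, 24, 128)   # batch of target-speaker feature sequences
loss = cycle_loss(x, y)
loss.backward()
```

In the full model this term is combined with adversarial (and identity-mapping) losses; the sketch isolates only the cycle constraint.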

Paper Details

Authors:
Takuhiro Kaneko, Hirokazu Kameoka, Kou Tanaka, Nobukatsu Hojo
Submitted On:
10 May 2019 - 2:59am

Document Files

Kaneko_CycleGAN-VC2_ICASSP_2019_poster.pdf


[1] Takuhiro Kaneko, Hirokazu Kameoka, Kou Tanaka, Nobukatsu Hojo, "CycleGAN-VC2: Improved CycleGAN-based Non-parallel Voice Conversion", IEEE SigPort, 2019. [Online]. Available: http://sigport.org/4278. Accessed: Sep. 20, 2019.

CROSS-LINGUAL VOICE CONVERSION WITH BILINGUAL PHONETIC POSTERIORGRAM AND AVERAGE MODELING


This paper presents a cross-lingual voice conversion approach using bilingual Phonetic PosteriorGram (PPG) and average modeling. The proposed approach makes use of bilingual PPGs to represent speaker-independent features of speech signals from different languages in the same feature space. In particular, a bilingual PPG is formed by stacking two monolingual PPG vectors, which are extracted from two monolingual speech recognition systems. The conversion model is trained to learn the relationship between bilingual PPGs and the corresponding acoustic features.
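The stacking step is simple enough to show directly; the NumPy sketch below concatenates two monolingual PPG matrices for the same utterance frame by frame. The class counts are invented for illustration.

```python
import numpy as np

def bilingual_ppg(ppg_lang1, ppg_lang2):
    """Stack two monolingual phonetic posteriorgrams frame-by-frame into a
    bilingual PPG. Each input row is a posterior distribution over that
    language's phonetic classes for one frame; both PPGs must be extracted
    from the same utterance so the frame counts match."""
    assert ppg_lang1.shape[0] == ppg_lang2.shape[0], "frame counts must match"
    return np.concatenate([ppg_lang1, ppg_lang2], axis=1)

# Toy example: 100 frames, 40 classes in one language, 60 in the other.
rng = np.random.default_rng(0)
en = rng.random((100, 40)); en /= en.sum(axis=1, keepdims=True)
zh = rng.random((100, 60)); zh /= zh.sum(axis=1, keepdims=True)
print(bilingual_ppg(en, zh).shape)   # (100, 100)
```

Because each PPG row is a speaker-independent posterior over phonetic classes, the stacked representation places speech from both languages in the same feature space, which is what allows one conversion model to serve either language.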

Paper Details

Authors:
Yi Zhou, Xiaohai Tian, Haihua Xu, Rohan Kumar Das and Haizhou Li
Submitted On:
9 May 2019 - 3:46am

Document Files

cross lingual voice conversion with bilingual PPG


[1] Yi Zhou, Xiaohai Tian, Haihua Xu, Rohan Kumar Das and Haizhou Li, "CROSS-LINGUAL VOICE CONVERSION WITH BILINGUAL PHONETIC POSTERIORGRAM AND AVERAGE MODELING", IEEE SigPort, 2019. [Online]. Available: http://sigport.org/4165. Accessed: Sep. 20, 2019.

POSTER OF PAPER 3809 (SLP-P20)


Poster presented in the "Speech Synthesis II" poster session at ICASSP 2019, accompanying the paper "ENHANCED VIRTUAL SINGERS GENERATION BY INCORPORATING SINGING DYNAMICS TO PERSONALIZED TEXT-to-SPEECH-to-SINGING".

Paper Details

Authors:
Kantapon Kaewtip, Fang-Yu Kuo, Mark Harvilla, Iris Ouyan, Pierre Lanchantin
Submitted On:
8 May 2019 - 5:21pm

Document Files

POSTER_PAPER_3809.pdf


[1] Kantapon Kaewtip, Fang-Yu Kuo, Mark Harvilla, Iris Ouyan, Pierre Lanchantin, "POSTER OF PAPER 3809 (SLP-P20)", IEEE SigPort, 2019. [Online]. Available: http://sigport.org/4135. Accessed: Sep. 20, 2019.

Cycle-consistent adversarial networks for non-parallel vocal effort based speaking style conversion


Speaking style conversion (SSC) is the technology of converting natural speech signals from one style to another. In this study, we propose the use of cycle-consistent adversarial networks (CycleGANs) for converting styles with varying vocal effort, focusing on conversion between normal and Lombard styles as a case study. We propose a parametric approach that uses the Pulse Model in Log-domain (PML) vocoder to extract speech features. The CycleGAN maps these features from utterances in the source style to the corresponding features of the target speech.
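A sketch of the overall analysis-mapping-synthesis pipeline is below; the analyze, mapper, and synthesize callables are stand-ins for the PML vocoder and a trained CycleGAN generator, which are external models, so the code only shows the data flow under those assumptions.

```python
import numpy as np

def convert_style(wave, analyze, mapper, synthesize):
    """Parametric speaking-style conversion: vocoder analysis, feature
    mapping, vocoder synthesis. The three callables are supplied by the
    caller because the real components (a PML-style vocoder and a trained
    CycleGAN generator) are external models."""
    feats = analyze(wave)          # (T, D) per-frame vocoder features
    converted = mapper(feats)      # e.g. normal -> Lombard style
    return synthesize(converted)

# Placeholder components, just to exercise the pipeline shape.
analyze = lambda w: w.reshape(-1, 10)       # a real vocoder goes here
mapper = lambda f: f * 1.1                  # a trained CycleGAN goes here
synthesize = lambda f: f.reshape(-1)        # and a real synthesizer here
out = convert_style(np.zeros(1000), analyze, mapper, synthesize)
print(out.shape)   # (1000,)
```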

Paper Details

Authors:
Shreyas Seshadri, Lauri Juvela, Junichi Yamagishi, Okko Räsänen, Paavo Alku
Submitted On:
8 May 2019 - 4:29am

Document Files

Seshadri_ICASSP2019_final.pdf


[1] Shreyas Seshadri, Lauri Juvela, Junichi Yamagishi, Okko Räsänen, Paavo Alku, "Cycle-consistent adversarial networks for non-parallel vocal effort based speaking style conversion", IEEE SigPort, 2019. [Online]. Available: http://sigport.org/4047. Accessed: Sep. 20, 2019.

DNN-BASED SPEAKER-ADAPTIVE POSTFILTERING WITH LIMITED ADAPTATION DATA FOR STATISTICAL SPEECH SYNTHESIS SYSTEMS


Deep neural networks (DNNs) have been successfully deployed for acoustic modelling in statistical parametric speech synthesis (SPSS) systems. Moreover, DNN-based postfilters (PF) have also been shown to outperform conventional postfilters that are widely used in SPSS systems for increasing the quality of synthesized speech. However, existing DNN-based postfilters are trained with speaker-dependent databases. Given that SPSS systems can rapidly adapt to new speakers from generic models, there is a need for DNN-based postfilters that can adapt to new speakers with minimal adaptation data.
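One common way to adapt a neural postfilter with minimal data is to freeze the layers trained on the generic multi-speaker corpus and fine-tune only the output layer; the PyTorch sketch below illustrates that scheme with assumed layer sizes, and is not necessarily the adaptation method used in the paper.

```python
import torch
import torch.nn as nn

# A small feed-forward postfilter mapping synthetic mel-cepstra toward
# natural ones; the layer sizes here are illustrative.
postfilter = nn.Sequential(
    nn.Linear(25, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, 25),
)
# ... assume `postfilter` was pretrained on a large multi-speaker corpus ...

# Adaptation: freeze the shared layers, fine-tune only the output layer.
for param in postfilter.parameters():
    param.requires_grad = False
for param in postfilter[4].parameters():    # the final Linear layer
    param.requires_grad = True

optimizer = torch.optim.Adam(postfilter[4].parameters(), lr=1e-4)
synthetic = torch.randn(64, 25)   # stand-ins for limited adaptation data
natural = torch.randn(64, 25)
for _ in range(10):
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(postfilter(synthetic), natural)
    loss.backward()
    optimizer.step()
```

Restricting the trainable parameters this way reduces the risk of overfitting when only a handful of adaptation utterances are available.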

Paper Details

Authors:
Miraç Göksu Öztürk, Okan Ulusoy, Cenk Demiroglu
Submitted On:
15 February 2019 - 5:17am

Document Files

icassp19_poster.pdf


[1] Miraç Göksu Öztürk, Okan Ulusoy, Cenk Demiroglu, "DNN-BASED SPEAKER-ADAPTIVE POSTFILTERING WITH LIMITED ADAPTATION DATA FOR STATISTICAL SPEECH SYNTHESIS SYSTEMS", IEEE SigPort, 2019. [Online]. Available: http://sigport.org/3851. Accessed: Sep. 20, 2019.

HIGH-QUALITY NONPARALLEL VOICE CONVERSION BASED ON CYCLE-CONSISTENT ADVERSARIAL NETWORK

Paper Details

Authors:
Fuming Fang, Junichi Yamagishi, Isao Echizen, Jaime Lorenzo-Trueba
Submitted On:
17 June 2018 - 4:42am

Document Files

poster.pdf


[1] Fuming Fang, Junichi Yamagishi, Isao Echizen, Jaime Lorenzo-Trueba, "HIGH-QUALITY NONPARALLEL VOICE CONVERSION BASED ON CYCLE-CONSISTENT ADVERSARIAL NETWORK", IEEE SigPort, 2018. [Online]. Available: http://sigport.org/3233. Accessed: Sep. 20, 2019.

Cyborg Speech: Deep Multilingual Speech Synthesis for Generating Segmental Foreign Accent with Natural Prosody


We describe a new application of deep-learning-based speech synthesis, namely multilingual speech synthesis for generating controllable foreign accent. Specifically, we train a DBLSTM-based acoustic model on non-accented multilingual speech recordings from a speaker native in several languages. By copying durations and pitch contours from a pre-recorded utterance of the desired prompt, natural prosody is achieved. We call this paradigm "cyborg speech" as it combines human and machine speech parameters.
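As a minimal sketch of the copy step for pitch, the snippet below resamples a natural reference F0 contour onto the synthesizer's frame grid so it can overwrite the model-predicted contour; the frame counts and linear resampling are illustrative assumptions.

```python
import numpy as np

def copy_pitch_contour(reference_f0, target_len):
    """Resample a pitch contour measured from a natural recording onto the
    synthesizer's frame grid, so it can replace the model-predicted contour
    and only the segmental identity comes from the machine."""
    src = np.linspace(0.0, 1.0, num=len(reference_f0))
    dst = np.linspace(0.0, 1.0, num=target_len)
    return np.interp(dst, src, reference_f0)

# Natural contour measured over 95 frames; the synthesizer renders 120.
reference_f0 = np.linspace(90.0, 140.0, num=95)
copied = copy_pitch_contour(reference_f0, target_len=120)
print(copied.shape)   # (120,)
```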

Paper Details

Authors:
Jaime Lorenzo-Trueba, Mariko Kondo, Junichi Yamagishi
Submitted On:
29 April 2018 - 1:59pm

Document Files

Cyborg Speech presentation slides


[1] Jaime Lorenzo-Trueba, Mariko Kondo, Junichi Yamagishi, "Cyborg Speech: Deep Multilingual Speech Synthesis for Generating Segmental Foreign Accent with Natural Prosody", IEEE SigPort, 2018. [Online]. Available: http://sigport.org/3187. Accessed: Sep. 20, 2019.
