

Cyborg Speech: Deep Multilingual Speech Synthesis for Generating Segmental Foreign Accent with Natural Prosody

Abstract: 

We describe a new application of deep-learning-based speech synthesis: multilingual speech synthesis for generating controllable foreign accent. Specifically, we train a deep bidirectional LSTM (DBLSTM) acoustic model on non-accented multilingual speech recordings from a speaker who is native in several languages. Natural prosody is achieved by copying durations and pitch contours from a pre-recorded utterance of the desired prompt. We call this paradigm "cyborg speech", as it combines human and machine speech parameters. Segmentally accented speech is produced by interpolating specific quinphone linguistic features towards phones from the other language that represent non-native mispronunciations. Experiments on synthetic American-English-accented Japanese speech show that subjective synthesis quality matches that of monolingual synthesis, that natural pitch is maintained, and that naturalistic phone substitutions generate output that is perceived as having an American foreign accent, even though only non-accented training data was used.
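The abstract does not spell out the feature layout, so the Python sketch below is only an illustration under stated assumptions: the linguistic input is reduced to one-hot phone identities over a quinphone window (two phones on each side of the centre phone), the phone inventory is a tiny hypothetical one (`r_ja` for the Japanese flap, `r_en` for an English /r/ used as the "mispronunciation"), and the accent strength is a hypothetical linear-interpolation weight `alpha`. It also sketches the "cyborg" step of upsampling phone inputs with durations copied from a recording and pairing each frame with the recorded F0. All names here are illustrative assumptions, not the authors' code.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical combined inventory (assumption): a few Japanese phones plus
# English phones used as "mispronunciation" targets. A real system would use
# the full phone sets and many more linguistic features than phone identity.
PHONES: List[str] = ["sil", "a", "i", "u", "e", "o", "k", "s", "t",
                     "r_ja", "r_en", "l_en"]
CONTEXT = (-2, -1, 0, 1, 2)  # quinphone: two phones on each side of the centre


def one_hot(phone: str) -> List[float]:
    """One-hot encoding of a phone identity over PHONES."""
    vec = [0.0] * len(PHONES)
    vec[PHONES.index(phone)] = 1.0
    return vec


def quinphone_features(phones: Sequence[str], i: int) -> List[float]:
    """Concatenate one-hot identities of phones i-2 .. i+2 (padded with silence)."""
    feats: List[float] = []
    for offset in CONTEXT:
        j = i + offset
        feats.extend(one_hot(phones[j] if 0 <= j < len(phones) else "sil"))
    return feats


def interpolate(native: List[float], accented: List[float], alpha: float) -> List[float]:
    """Linear interpolation: alpha=0 gives the native input, alpha=1 the substitute."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return [(1.0 - alpha) * n + alpha * a for n, a in zip(native, accented)]


def accented_features(phones: Sequence[str], substitutions: Dict[str, str],
                      alpha: float) -> List[List[float]]:
    """Pull every quinphone slot holding a substituted phone towards its replacement."""
    substituted = [substitutions.get(p, p) for p in phones]
    return [interpolate(quinphone_features(phones, i),
                        quinphone_features(substituted, i), alpha)
            for i in range(len(phones))]


def cyborg_frames(phone_feats: List[List[float]], natural_durations: List[int],
                  natural_f0: List[float]) -> List[Tuple[List[float], float]]:
    """Repeat each phone's features for its recorded duration (in frames) and pair
    every frame with the recorded F0, so prosody comes from the human recording.
    How the pairs are consumed downstream (model input or vocoder) is not shown."""
    frames: List[List[float]] = []
    for feats, n_frames in zip(phone_feats, natural_durations):
        frames.extend([feats] * n_frames)
    if len(frames) != len(natural_f0):
        raise ValueError("durations and F0 contour must cover the same number of frames")
    return list(zip(frames, natural_f0))


# Usage: Japanese "kore" with the flap pulled halfway towards an English /r/.
phones = ["sil", "k", "o", "r_ja", "e", "sil"]
feats = accented_features(phones, {"r_ja": "r_en"}, alpha=0.5)
frames = cyborg_frames(feats, [1, 2, 1, 1, 2, 1], [0.0] * 8)  # placeholder F0
```

In this sketch, sweeping `alpha` from 0 to 1 moves the input gradually from the native Japanese phone towards the English substitute in every context slot where it occurs; which quinphone slots the authors actually interpolate, and with what weights, is not stated in the abstract.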


Paper Details

Authors:
Jaime Lorenzo-Trueba, Mariko Kondo, Junichi Yamagishi
Submitted On:
29 April 2018 - 1:59pm
Type:
Presentation Slides
Presenter's Name:
Gustav Eje Henter
Paper Code:
SP-L2.5
Document Year:
2018

Document Files

Cyborg Speech presentation slides



[1] Jaime Lorenzo-Trueba, Mariko Kondo, Junichi Yamagishi, "Cyborg Speech: Deep Multilingual Speech Synthesis for Generating Segmental Foreign Accent with Natural Prosody", IEEE SigPort, 2018. [Online]. Available: http://sigport.org/3187. Accessed: Jul. 23, 2018.
@article{3187-18,
url = {http://sigport.org/3187},
author = {Jaime Lorenzo-Trueba and Mariko Kondo and Junichi Yamagishi},
publisher = {IEEE SigPort},
title = {Cyborg Speech: Deep Multilingual Speech Synthesis for Generating Segmental Foreign Accent with Natural Prosody},
year = {2018} }