Speech Synthesis and Generation, including TTS (SPE-SYNT)

Fast and High-Quality Singing Voice Synthesis System based on Convolutional Neural Networks


The present paper describes singing voice synthesis based on convolutional neural networks (CNNs). Singing voice synthesis systems based on deep neural networks (DNNs) have recently been proposed and have improved the naturalness of synthesized singing voices. Because singing voices are a rich form of expression, a powerful technique is required to model them accurately. In the proposed technique, the long-term dependencies of singing voices are modeled by CNNs.
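
As a concrete illustration of the modeling idea, below is a minimal sketch (assuming PyTorch) of a stack of 1-D convolutions with doubling dilation rates, one standard way a CNN can cover long-term dependencies across frames. The layer count and feature dimensions are illustrative assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class DilatedConvStack(nn.Module):
    """Maps frame-level score features to acoustic features."""
    def __init__(self, in_dim=86, hidden=256, out_dim=187, n_layers=6):
        super().__init__()
        layers, dim = [], in_dim
        for i in range(n_layers):
            layers += [
                # dilation doubles each layer, widening the receptive field
                nn.Conv1d(dim, hidden, kernel_size=3,
                          dilation=2 ** i, padding=2 ** i),
                nn.ReLU(),
            ]
            dim = hidden
        self.body = nn.Sequential(*layers)
        self.head = nn.Conv1d(hidden, out_dim, kernel_size=1)

    def forward(self, score_feats):            # (batch, in_dim, frames)
        return self.head(self.body(score_feats))

x = torch.randn(1, 86, 400)                    # 400 frames of score features
y = DilatedConvStack()(x)
print(y.shape)                                 # torch.Size([1, 187, 400])
```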

Paper Details

Authors:
Kazuhiro Nakamura, Shinji Takaki, Kei Hashimoto, Keiichiro Oura, Yoshihiko Nankaku, Keiichi Tokuda
Submitted On:
20 May 2020 - 8:26pm

Document Files

ICASSP2020_slide_20200417b.pdf


[1] Kazuhiro Nakamura, Shinji Takaki, Kei Hashimoto, Keiichiro Oura, Yoshihiko Nankaku, Keiichi Tokuda, "Fast and High-Quality Singing Voice Synthesis System based on Convolutional Neural Networks", IEEE SigPort, 2020. [Online]. Available: http://sigport.org/5421. Accessed: Jun. 07, 2020.

EPOCH EXTRACTION FROM A SPEECH SIGNAL USING GAMMATONE WAVELETS IN A SCATTERING NETWORK


In speech production, epochs are glottal closure instants at which significant energy is released from the lungs. Extracting epochs accurately is important in speech synthesis, speech analysis, and pitch-oriented studies. The time-varying characteristics of the source and the system, together with the attenuation of low-frequency components by telephone channels, make epoch estimation from a speech signal a challenging task.
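
Since the technique in the title rests on gammatone wavelets, here is a minimal sketch (assuming NumPy) of the textbook gammatone impulse response, g(t) = t^(n-1) e^(-2*pi*b*t) cos(2*pi*fc*t), used to filter a signal much as a first scattering layer would. The filter order, bandwidth rule, and center frequencies are common choices, not necessarily the paper's.

```python
import numpy as np

def gammatone(fc, fs, duration=0.025, order=4):
    """Gammatone impulse response at center frequency fc (Hz)."""
    t = np.arange(0, duration, 1.0 / fs)
    erb = 24.7 + 0.108 * fc                  # equivalent rectangular bandwidth
    b = 1.019 * erb                          # common bandwidth scaling
    g = (t ** (order - 1) * np.exp(-2 * np.pi * b * t)
         * np.cos(2 * np.pi * fc * t))
    return g / np.abs(g).max()

fs = 8000                                    # telephone-band sampling rate
signal = np.random.randn(fs)                 # stand-in for a speech segment
centers = [100, 200, 400, 800]               # low-frequency channels (Hz)
# rectified filter-bank outputs, as fed to the next scattering layer
envelopes = [np.abs(np.convolve(signal, gammatone(fc, fs), mode="same"))
             for fc in centers]
```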

Paper Details

Authors:
Pavan Kulkarni, Jishnu Sadasivan, Aniruddha Adiga, Chandra Sekhar Seelamantula
Submitted On:
16 May 2020 - 2:22pm

Document Files

Epoch Extraction using gammatone wavelets


[1] Pavan Kulkarni, Jishnu Sadasivan, Aniruddha Adiga, Chandra Sekhar Seelamantula, "EPOCH EXTRACTION FROM A SPEECH SIGNAL USING GAMMATONE WAVELETS IN A SCATTERING NETWORK", IEEE SigPort, 2020. [Online]. Available: http://sigport.org/5378. Accessed: Jun. 07, 2020.

ICASSP 2020 Presentation Poster Slides


ONE-SHOT VOICE CONVERSION USING STAR-GAN

Paper Details

Submitted On:
15 May 2020 - 9:05am

Document Files

A0_poster_new.pptx


[1] "ICASSP 2020 Presentation Poster Slides", IEEE SigPort, 2020. [Online]. Available: http://sigport.org/5347. Accessed: Jun. 07, 2020.

Location-Relative Attention Mechanisms For Robust Long-Form Speech Synthesis


Despite the ability to produce human-level speech for in-domain text, attention-based end-to-end text-to-speech (TTS) systems suffer from text alignment failures that increase in frequency for out-of-domain text. We show that these failures can be addressed using simple location-relative attention mechanisms that do away with content-based query/key comparisons. We compare two families of attention mechanisms: location-relative GMM-based mechanisms and additive energy-based mechanisms.
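
For concreteness, here is a minimal sketch (assuming PyTorch) of the GMM-based family: each decoder step predicts mixture weights, positive mean shifts, and widths, and the means only move forward, so the alignment cannot jump backward. The softplus parameterization and shapes are illustrative assumptions, not the exact variants compared in the paper.

```python
import torch
import torch.nn.functional as F

def gmm_attention(params, prev_mu, memory_len):
    """params: (batch, 3*K) raw decoder outputs; prev_mu: (batch, K)."""
    w_hat, delta_hat, sigma_hat = params.chunk(3, dim=-1)
    w = torch.softmax(w_hat, dim=-1)                 # mixture weights
    mu = prev_mu + F.softplus(delta_hat)             # monotonic mean update
    sigma = F.softplus(sigma_hat) + 1e-4             # positive widths
    j = torch.arange(memory_len, dtype=torch.float32).view(1, 1, -1)
    # Gaussian score of every memory position under each mixture component
    z = torch.exp(-0.5 * ((j - mu.unsqueeze(-1)) / sigma.unsqueeze(-1)) ** 2)
    alpha = (w.unsqueeze(-1) * z).sum(dim=1)         # (batch, memory_len)
    return alpha / alpha.sum(dim=-1, keepdim=True), mu

params = torch.randn(2, 3 * 5)                       # batch of 2, K=5 mixtures
alpha, mu = gmm_attention(params, torch.zeros(2, 5), memory_len=120)
print(alpha.shape)                                   # torch.Size([2, 120])
```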

Paper Details

Authors:
Eric Battenberg, RJ Skerry-Ryan, Soroosh Mariooryad, Daisy Stanton, David Kao, Matt Shannon, Tom Bagby
Submitted On:
14 May 2020 - 6:30pm

Document Files

Location-Relative Attention (slides).pdf


[1] Eric Battenberg, RJ Skerry-Ryan, Soroosh Mariooryad, Daisy Stanton, David Kao, Matt Shannon, Tom Bagby, "Location-Relative Attention Mechanisms For Robust Long-Form Speech Synthesis", IEEE SigPort, 2020. [Online]. Available: http://sigport.org/5324. Accessed: Jun. 07, 2020.

Improving LPCNet-based Text-to-Speech with Linear Prediction-structured Mixture Density Network


In this paper, we propose an improved LPCNet vocoder using a linear prediction (LP)-structured mixture density network (MDN). The recently proposed LPCNet vocoder achieves high-quality yet lightweight speech synthesis by combining a vocal tract LP filter with a WaveRNN-based vocal source (i.e., excitation) generator.
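
To make the LP structure concrete, here is a minimal sketch (assuming NumPy) of the synthesis recursion s_t = sum_k a_k s_{t-k} + e_t, where the excitation e_t is drawn from a predicted density; a single Gaussian stands in for the full mixture density network, and the coefficients are toy values.

```python
import numpy as np

def lp_synthesize(lpc, exc_mu, exc_sigma, n_samples, rng):
    """LP synthesis filter driven by a sampled excitation signal."""
    order = len(lpc)
    s = np.zeros(n_samples + order)                    # zero initial history
    for t in range(n_samples):
        pred = np.dot(lpc, s[t:t + order][::-1])       # sum_k a_k * s_{t-k}
        e_t = rng.normal(exc_mu, exc_sigma)            # sampled excitation
        s[t + order] = pred + e_t                      # s_t = prediction + e_t
    return s[order:]

rng = np.random.default_rng(0)
lpc = np.array([1.3, -0.6, 0.1])     # toy, stable LP coefficients
audio = lp_synthesize(lpc, 0.0, 0.01, n_samples=160, rng=rng)
```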

Paper Details

Authors:
Min-Jae Hwang, Eunwoo Song, Ryuichi Yamamoto, Frank Soong, and Hong-Goo Kang
Submitted On:
14 May 2020 - 2:40am

Document Files

20200507_minjae.pdf


[1] Min-Jae Hwang, Eunwoo Song, Ryuichi Yamamoto, Frank Soong, and Hong-Goo Kang, "Improving LPCNet-based Text-to-Speech with Linear Prediction-structured Mixture Density Network", IEEE SigPort, 2020. [Online]. Available: http://sigport.org/5237. Accessed: Jun. 07, 2020.

EMOTIONAL VOICE CONVERSION USING MULTITASK LEARNING WITH TEXT-TO-SPEECH


Voice conversion (VC) is a task that alters a person's voice to suit different styles while preserving the linguistic content. Previous state-of-the-art VC technology was based on the sequence-to-sequence (seq2seq) model, which can lose linguistic information. One attempt to overcome this problem used textual supervision; however, it required explicit alignment, which forfeited the benefit of the seq2seq model. In this study, a voice converter that utilizes multitask learning with text-to-speech (TTS) is presented.
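
A minimal sketch (assuming PyTorch) of the multitask idea: a decoder shared between a VC path (speech in) and a TTS path (text in) is trained with both losses, tying the latent space to linguistic content without explicit alignment. All modules, shapes, and the equal loss weighting below are placeholders, not the paper's architecture.

```python
import torch
import torch.nn as nn

speech_enc = nn.GRU(80, 128, batch_first=True)   # mel frames -> latents
text_enc = nn.Embedding(40, 128)                 # phoneme ids -> latents
decoder = nn.GRU(128, 80, batch_first=True)      # shared latent -> mel

mel_src = torch.randn(4, 100, 80)                # source-speaker mels
phonemes = torch.randint(0, 40, (4, 100))        # toy text, pre-aligned length
mel_tgt = torch.randn(4, 100, 80)                # target-style mels

h_vc, _ = speech_enc(mel_src)                    # VC branch latents
h_tts = text_enc(phonemes)                       # TTS branch latents
loss = (nn.functional.l1_loss(decoder(h_vc)[0], mel_tgt)     # VC task
        + nn.functional.l1_loss(decoder(h_tts)[0], mel_tgt)) # TTS task
loss.backward()                                  # one step over both tasks
```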

Paper Details

Authors:
Tae-Ho Kim, Sungjae Cho, Shinkook Choi, Sejik Park, Soo-Young Lee
Submitted On:
14 May 2020 - 1:47am

Document Files

Slides


[1] Tae-Ho Kim, Sungjae Cho, Shinkook Choi, Sejik Park, Soo-Young Lee, "EMOTIONAL VOICE CONVERSION USING MULTITASK LEARNING WITH TEXT-TO-SPEECH", IEEE SigPort, 2020. [Online]. Available: http://sigport.org/5227. Accessed: Jun. 07, 2020.

PARALLEL WAVEGAN: A FAST WAVEFORM GENERATION MODEL BASED ON GENERATIVE ADVERSARIAL NETWORKS WITH MULTI-RESOLUTION SPECTROGRAM


We propose Parallel WaveGAN, a distillation-free, fast, and small-footprint waveform generation method using a generative adversarial network. In the proposed method, a non-autoregressive WaveNet is trained by jointly optimizing multi-resolution spectrogram and adversarial loss functions, which can effectively capture the time-frequency distribution of the realistic speech waveform. As our method does not require density distillation used in the conventional teacher-student framework, the entire model can be easily trained.
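
A minimal sketch (assuming PyTorch) of a multi-resolution spectrogram loss of this kind: a spectral-convergence term plus a log-magnitude term, summed over several STFT settings. The three FFT/hop/window triples are common choices for this setup and may differ from the paper's exact configuration.

```python
import torch

def stft_mag(x, fft, hop, win):
    """Magnitude spectrogram at one STFT resolution."""
    spec = torch.stft(x, fft, hop, win,
                      window=torch.hann_window(win),
                      return_complex=True)
    return spec.abs().clamp(min=1e-7)

def multires_stft_loss(fake, real):
    loss = 0.0
    for fft, hop, win in [(1024, 120, 600), (2048, 240, 1200), (512, 50, 240)]:
        f, r = stft_mag(fake, fft, hop, win), stft_mag(real, fft, hop, win)
        sc = torch.norm(r - f, p="fro") / torch.norm(r, p="fro")  # convergence
        mag = (r.log() - f.log()).abs().mean()                    # log magnitude
        loss = loss + sc + mag
    return loss

loss = multires_stft_loss(torch.randn(22050), torch.randn(22050))
```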

Paper Details

Submitted On:
13 May 2020 - 10:56pm

Document Files

Final presentation slides


[1] "PARALLEL WAVEGAN: A FAST WAVEFORM GENERATION MODEL BASED ON GENERATIVE ADVERSARIAL NETWORKS WITH MULTI-RESOLUTION SPECTROGRAM", IEEE SigPort, 2020. [Online]. Available: http://sigport.org/5208. Accessed: Jun. 07, 2020.

IMPROVING PROSODY WITH LINGUISTIC AND BERT DERIVED FEATURES IN MULTI-SPEAKER BASED MANDARIN CHINESE NEURAL TTS


Recent advances in neural TTS have made “human parity” synthesized speech possible when a large amount of studio-quality training data from a voice talent is available. However, with only limited, casual recordings from an ordinary speaker, human-like TTS remains a big challenge, in addition to artifacts in such recordings, e.g., incomplete sentences and repeated words.
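
One simple way to realize BERT-derived features (a sketch, assuming PyTorch): word-level BERT embeddings are repeated to phone level and concatenated with the conventional linguistic features before the TTS encoder. The random tensor stands in for real BERT output, and all dimensions are illustrative assumptions.

```python
import torch

bert_dim, ling_dim = 768, 130
words_to_phones = [3, 2, 4]                 # phones per word (toy utterance)
bert_emb = torch.randn(len(words_to_phones), bert_dim)    # (words, 768)
ling_feats = torch.randn(sum(words_to_phones), ling_dim)  # (phones, 130)

# upsample word embeddings so each phone carries its word's BERT vector
upsampled = torch.repeat_interleave(
    bert_emb, torch.tensor(words_to_phones), dim=0)       # (phones, 768)
encoder_in = torch.cat([ling_feats, upsampled], dim=-1)   # (phones, 898)
print(encoder_in.shape)                                   # torch.Size([9, 898])
```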

Paper Details

Authors:
Yujia Xiao, Lei He, Huaiping Ming, Frank K. Soong
Submitted On:
13 May 2020 - 10:39pm

Document Files

Slides_icassp2020_upload.pptx


[1] Yujia Xiao, Lei He, Huaiping Ming, Frank K. Soong, "IMPROVING PROSODY WITH LINGUISTIC AND BERT DERIVED FEATURES IN MULTI-SPEAKER BASED MANDARIN CHINESE NEURAL TTS", IEEE SigPort, 2020. [Online]. Available: http://sigport.org/5202. Accessed: Jun. 07, 2020.

A HYBRID TEXT NORMALIZATION SYSTEM USING MULTI-HEAD SELF-ATTENTION FOR MANDARIN


In this paper, we propose a hybrid text normalization system using multi-head self-attention. The system combines the advantages of a rule-based model and a neural model for text preprocessing tasks. Previous studies of Mandarin text normalization usually use a set of hand-written rules, which are hard to generalize. Our proposed system is motivated by the neural models of recent studies and achieves better performance on our internal news corpus. This paper also describes several attempts to deal with the imbalanced pattern distribution of the dataset.
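
A minimal sketch (in Python) of the hybrid dispatch: deterministic rules handle the patterns they match with high precision, and the remaining, ambiguous non-standard words fall through to a neural classifier, which is stubbed out here. The rules and labels are toy examples, not the paper's rule set.

```python
import re

RULES = [
    # high-precision, hand-written patterns are tried first
    (re.compile(r"^\d{4}年$"), lambda s: "year: read digit by digit"),
    (re.compile(r"^\d+%$"), lambda s: "percentage: read as 'percent'"),
]

def neural_classify(token):
    # stand-in for the multi-head self-attention classifier
    return "cardinal: read as a number"

def normalize(token):
    for pattern, rule in RULES:
        if pattern.match(token):
            return rule(token)          # rule path
    return neural_classify(token)       # ambiguous cases go to the model

for tok in ["2020年", "75%", "110"]:
    print(tok, "->", normalize(tok))
```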

Paper Details

Authors:
Junhui Zhang, Junjie Pan, Xiang Yin, Chen Li, Shichao Liu, Yang Zhang, Yuxuan Wang, Zejun Ma
Submitted On:
13 May 2020 - 10:27pm

Document Files

slides for the presentation


[1] Junhui Zhang, Junjie Pan, Xiang Yin, Chen Li, Shichao Liu, Yang Zhang, Yuxuan Wang, Zejun Ma, "A HYBRID TEXT NORMALIZATION SYSTEM USING MULTI-HEAD SELF-ATTENTION FOR MANDARIN", IEEE SigPort, 2020. [Online]. Available: http://sigport.org/5199. Accessed: Jun. 07, 2020.

A UNIFIED SEQUENCE-TO-SEQUENCE FRONT-END MODEL FOR MANDARIN TEXT-TO-SPEECH SYNTHESIS

Paper Details

Authors:
Junjie Pan, Xiang Yin, Zhiling Zhang, Shichao Liu, Yang Zhang, Zejun Ma, Yuxuan Wang
Submitted On:
13 May 2020 - 10:24pm

Document Files

Unified_FrontEnd.pptx


[1] Junjie Pan, Xiang Yin, Zhiling Zhang, Shichao Liu, Yang Zhang, Zejun Ma, Yuxuan Wang, "A UNIFIED SEQUENCE-TO-SEQUENCE FRONT-END MODEL FOR MANDARIN TEXT-TO-SPEECH SYNTHESIS", IEEE SigPort, 2020. [Online]. Available: http://sigport.org/5197. Accessed: Jun. 07, 2020.
