
Speech Synthesis and Generation, including TTS (SPE-SYNT)

QUALITY ASSESSMENT OF VOICE CONVERTED SPEECH USING ARTICULATORY FEATURES


We propose a novel application of acoustic-to-articulatory inversion (AAI) to the quality assessment of voice-converted speech. The ability of humans to speak effortlessly requires the coordinated movement of various articulators, muscles, etc. This effortless movement contributes to naturalness, intelligibility, and the speaker's identity (which is partially present in voice-converted speech). Hence, during voice conversion (VC), information related to speech production is lost.

Paper Details

Authors:
Avni Rajpal, Nirmesh J. Shah, Mohammadi Zaki and Hemant A. Patil
Submitted On:
28 February 2017 - 5:15am

Document Files

quality_poster.pdf

(39 downloads)


[1] Avni Rajpal, Nirmesh J. Shah, Mohammadi Zaki and Hemant A. Patil, "QUALITY ASSESSMENT OF VOICE CONVERTED SPEECH USING ARTICULATORY FEATURES", IEEE SigPort, 2017. [Online]. Available: http://sigport.org/1496. Accessed: Jun. 22, 2017.

NOVEL AMPLITUDE SCALING METHOD FOR BILINEAR FREQUENCY WARPING-BASED VOICE CONVERSION


In frequency warping (FW)-based voice conversion (VC), the source spectrum is warped to match the frequency axis of the target spectrum, and amplitude scaling (AS) is then applied to compensate for the amplitude differences between the warped spectrum and the actual target spectrum. In this paper, we propose a novel AS technique that linearly transfers the amplitude of the frequency-warped spectrum using knowledge of a Gaussian mixture model (GMM)-based converted spectrum, without adding any spurious peaks.
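
As a rough illustration of the two stages named above, the following Python sketch warps a log-magnitude spectrum along the frequency axis with the standard first-order all-pass (bilinear) warping function and then adds a log-domain gain. The function names, the warping factor alpha, and the log_gain vector are assumptions made for illustration. The AS technique proposed in the paper, which draws on a GMM-based converted spectrum, is not reproduced here.

```python
import numpy as np

def bilinear_warp(omega, alpha):
    """Warped frequency of a first-order all-pass transform (|alpha| < 1)."""
    return omega + 2.0 * np.arctan(
        alpha * np.sin(omega) / (1.0 - alpha * np.cos(omega))
    )

def warp_and_scale(log_spec, alpha, log_gain):
    """Warp a log-magnitude spectrum, then apply additive log-domain scaling.

    log_spec: values on a uniform grid over [0, pi].
    log_gain: hypothetical per-bin correction (same length as log_spec).
    """
    n = len(log_spec)
    omega = np.linspace(0.0, np.pi, n)
    # A feature at source frequency w ends up at bilinear_warp(w, alpha).
    warped_positions = bilinear_warp(omega, alpha)
    # Resample onto the uniform grid; warped_positions is increasing.
    warped = np.interp(omega, warped_positions, log_spec)
    return warped + log_gain
```

With alpha = 0 the warp is the identity, and a positive alpha moves spectral peaks toward higher frequencies. In a real system both alpha and the gain would be estimated from training data.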

Paper Details

Authors:
Nirmesh J. Shah and Hemant A. Patil
Submitted On:
28 February 2017 - 4:49am

Document Files

ICASSP_2017_NH.pdf

(39 downloads)


[1] Nirmesh J. Shah and Hemant A. Patil, "NOVEL AMPLITUDE SCALING METHOD FOR BILINEAR FREQUENCY WARPING-BASED VOICE CONVERSION", IEEE SigPort, 2017. [Online]. Available: http://sigport.org/1493. Accessed: Jun. 22, 2017.

VID2SPEECH: SPEECH RECONSTRUCTION FROM SILENT VIDEO


Speechreading is a notoriously difficult task for humans to perform. In this paper we present an end-to-end model based on a convolutional neural network (CNN) for generating an intelligible acoustic speech signal from silent video frames of a speaking person. The proposed CNN generates sound features for each frame based on its neighboring frames. Waveforms are then synthesized from the learned speech features to produce intelligible speech.
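
The idea of predicting features for each frame from its neighbors can be pictured as a sliding context window over the video. The Python sketch below only builds such windows. The window half-width k and the edge padding by repetition are assumptions, and the CNN and waveform synthesis described in the abstract are not shown.

```python
import numpy as np

def stack_context(frames, k):
    """Return, for each frame t, the window of frames t-k .. t+k.

    frames: array of shape (T, H, W).
    Result: array of shape (T, 2k+1, H, W); the first and last frames
    are repeated at the edges (an assumed padding choice).
    """
    head = np.repeat(frames[:1], k, axis=0)
    tail = np.repeat(frames[-1:], k, axis=0)
    padded = np.concatenate([head, frames, tail], axis=0)
    windows = [padded[t:t + 2 * k + 1] for t in range(len(frames))]
    return np.stack(windows, axis=0)
```

Each window would then be the input from which a model predicts the sound features of its center frame.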

Paper Details

Authors:
Ariel Ephrat, Shmuel Peleg
Submitted On:
27 February 2017 - 3:05pm

Document Files

vid2speech_poster

(50 downloads)


[1] Ariel Ephrat, Shmuel Peleg, "VID2SPEECH: SPEECH RECONSTRUCTION FROM SILENT VIDEO", IEEE SigPort, 2017. [Online]. Available: http://sigport.org/1448. Accessed: Jun. 22, 2017.

Improvements on Punctuation Generation Inspired Linguistic Features for Mandarin Prosody Generation


This paper proposes two types of machine-extracted linguistic features from unlimited text input for Mandarin prosody generation. One is the improved punctuation confidence (iPC), a modified version of the previously proposed punctuation confidence, which represents the likelihood of inserting major punctuation marks (PMs) at word boundaries. The other is the quotation confidence (QC), which measures the likelihood that a word string is quoted as a meaningful or emphasized unit.

Paper Details

Authors:
Submitted On:
15 October 2016 - 4:07am

Document Files

ISCSLP-PCQC.pdf

(65 downloads)


[1] "Improvements on Punctuation Generation Inspired Linguistic Features for Mandarin Prosody Generation", IEEE SigPort, 2016. [Online]. Available: http://sigport.org/1221. Accessed: Jun. 22, 2017.

DNN-Based Unit Selection Using Frame-Sized Speech Segments


This paper presents a deep neural network (DNN)-based unit selection method for waveform concatenation speech synthesis using frame-sized speech segments. In this method, three DNNs are adopted to calculate the target costs and concatenation costs used to select frame-sized candidate units. The first DNN is built in the same way as in DNN-based statistical parametric speech synthesis and predicts target acoustic features given linguistic context inputs.
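
Once target and concatenation costs are available, unit selection is commonly cast as a minimum-cost path search over the candidate units. The Python sketch below shows that search with dynamic programming. The cost arrays are plain inputs here (in the paper they come from DNNs), and both the function name and the use of a Viterbi-style search are assumptions rather than details confirmed by the abstract.

```python
import numpy as np

def select_units(target_cost, concat_cost):
    """Pick one candidate per frame minimizing total cost.

    target_cost: (T, K) cost of candidate k at frame t.
    concat_cost: (T-1, K, K); concat_cost[t, i, j] is the cost of
                 joining candidate i at frame t to candidate j at t+1.
    Returns (path, total_cost).
    """
    T, K = target_cost.shape
    best = target_cost[0].astype(float)
    back = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        # total[i, j]: best cost ending in i at t-1, then moving to j.
        total = best[:, None] + concat_cost[t - 1]
        back[t] = np.argmin(total, axis=0)
        best = total[back[t], np.arange(K)] + target_cost[t]
    # Trace the cheapest path backwards from the last frame.
    path = [int(np.argmin(best))]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1], float(np.min(best))
```

With zero concatenation cost the search simply picks the cheapest candidate at each frame, while a large switching cost favors staying on the same candidate sequence.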

Paper Details

Authors:
Zhen-Hua Ling
Submitted On:
14 October 2016 - 9:24am

Document Files

ISCLSP2016_zpzhou_presentation.pdf

(87 downloads)


[1] Zhen-Hua Ling, "DNN-Based Unit Selection Using Frame-Sized Speech Segments", IEEE SigPort, 2016. [Online]. Available: http://sigport.org/1197. Accessed: Jun. 22, 2017.

Dictionary Update for NMF-based Voice Conversion Using an Encoder-Decoder Network


In this paper, we propose a dictionary update method for nonnegative matrix factorization (NMF) with high-dimensional data in a spectral conversion (SC) task. Voice conversion has been widely studied due to its potential applications, such as personalized speech synthesis and speech enhancement. Exemplar-based NMF (ENMF) emerges as an effective and probably the simplest choice among all techniques for SC, as long as a source-target parallel speech corpus is given. ENMF-based SC systems usually need a large number of bases (exemplars) to ensure the quality of the converted speech.
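
For readers unfamiliar with exemplar-based conversion, the basic step can be sketched as follows: the source spectrogram is approximated as a nonnegative combination of source exemplars, and the same activations are applied to the paired target exemplars. The Python code below uses the well-known multiplicative update for the Euclidean objective with a fixed dictionary. That objective, the function name, and the iteration count are assumptions, and the encoder-decoder dictionary update proposed in the paper is not shown.

```python
import numpy as np

def enmf_convert(X, A_src, A_tgt, n_iter=200, eps=1e-12):
    """Convert spectrogram X using paired exemplar dictionaries.

    X:     (F, T) nonnegative source spectrogram.
    A_src: (F, K) source exemplars; A_tgt: (F, K) paired target exemplars.
    Returns (Y, H): converted spectrogram and (K, T) activations.
    """
    rng = np.random.default_rng(0)
    H = rng.random((A_src.shape[1], X.shape[1])) + eps
    for _ in range(n_iter):
        # Multiplicative update keeps H nonnegative (Euclidean objective).
        H *= (A_src.T @ X) / (A_src.T @ A_src @ H + eps)
    # Reuse the activations with the target dictionary.
    return A_tgt @ H, H
```

Because every frame is explained by a combination of stored exemplars, quality depends on how many exemplars the dictionary holds, which is the cost the abstract points to.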

Paper Details

Authors:
Chin-Cheng Hsu, Hsin-Te Hwang, Yi-Chiao Wu, Yu Tsao, and Hsin-Min Wang
Submitted On:
13 October 2016 - 4:15am

Document Files

2016-10-20-ISCSLP-v1.0-SigPort.pptx

(60 downloads)


[1] Chin-Cheng Hsu, Hsin-Te Hwang, Yi-Chiao Wu, Yu Tsao, and Hsin-Min Wang, "Dictionary Update for NMF-based Voice Conversion Using an Encoder-Decoder Network", IEEE SigPort, 2016. [Online]. Available: http://sigport.org/1168. Accessed: Jun. 22, 2017.

Tongue Shape Variation Model for Simulating Mandarin Chinese Articulation


We studied tongue shapes extracted from X-ray films taken during the articulation of Mandarin Chinese. Through factor analysis, we built a tongue articulation model driven by eight parameters. This study reveals that the front of the tongue has large horizontal movement and the blade of the tongue has large vertical movement, whereas the back and the root of the tongue have small movement both horizontally and vertically. This model can be used to drive a 3D tongue model to control its articulatory behavior.
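
A parameter-driven shape model of this kind can be pictured as a mean contour plus a weighted sum of a few basis shapes. The Python sketch below fits such a linear model with principal component analysis via SVD, used here as a simple stand-in for the factor analysis named in the abstract. The function names and the flattened-contour data layout are assumptions.

```python
import numpy as np

def fit_linear_shape_model(shapes, n_params=8):
    """Fit mean + basis from shapes of size (N, D), one contour per row.

    PCA via SVD is a stand-in for factor analysis (an assumption).
    Returns (mean, basis) with basis of shape (n_params, D).
    """
    mean = shapes.mean(axis=0)
    _, _, vt = np.linalg.svd(shapes - mean, full_matrices=False)
    return mean, vt[:n_params]

def shape_from_params(mean, basis, params):
    """Generate a contour (or a batch of contours) from parameter values."""
    return mean + params @ basis

def params_from_shape(mean, basis, shape):
    """Project a contour onto the model's parameters."""
    return (shape - mean) @ basis.T
```

Varying one parameter while holding the others at zero shows the movement pattern that parameter controls.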

Paper Details

Authors:
Jinguang Zhang, Xiyu Wu, Jiangping Kong
Submitted On:
12 October 2016 - 10:59am

Document Files

Tongue Shape Variation Model.pdf

(67 downloads)


[1] Jinguang Zhang, Xiyu Wu, Jiangping Kong, "Tongue Shape Variation Model for Simulating Mandarin Chinese Articulation", IEEE SigPort, 2016. [Online]. Available: http://sigport.org/1163. Accessed: Jun. 22, 2017.

A SPEAKER ADAPTATION TECHNIQUE FOR GAUSSIAN PROCESS REGRESSION BASED SPEECH SYNTHESIS USING FEATURE SPACE TRANSFORM

Paper Details

Authors:
Tomoki Koriyama, Syohei Oshio, Takao Kobayashi
Submitted On:
20 March 2016 - 8:10pm

Document Files

1603ICASSP4.pdf

(140 downloads)


[1] Tomoki Koriyama, Syohei Oshio, Takao Kobayashi, "A SPEAKER ADAPTATION TECHNIQUE FOR GAUSSIAN PROCESS REGRESSION BASED SPEECH SYNTHESIS USING FEATURE SPACE TRANSFORM", IEEE SigPort, 2016. [Online]. Available: http://sigport.org/894. Accessed: Jun. 22, 2017.