Study on the Relation of Fundamental and Formant Frequencies for Affective Speech Synthesis

Bogu Li, Zhilei Liu, Jianwu Dang
11 October 2016 - 12:11am
The Directions into Velocities of Articulators (DIVA) model is a self-adaptive neural network model that controls the movements of a simulated vocal tract to produce words, syllables, or phonemes. However, the DIVA model lacks emotion functions. To add an emotion function to the DIVA model, we investigate how affective speech is produced through the combination of fundamental frequency (F0) and formant frequencies, as well as how F0 and formants relate in emotional speech. These relations are examined for speech with different emotions by fitting logistic regression (LR) models on emotional speech databases. Given an emotion-related F0, the LR models predict the formants correctly. An affective speech synthesizer was then built by implementing the relation between F0 and formants in an improved formant synthesis method. Experiments on affective speech synthesis were conducted on three different emotional speech datasets, and the results show that speech with negative or positive emotion can be effectively synthesized from neutral speech.
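The abstract does not specify the synthesizer's structure or the form of the fitted LR models. The following is only a minimal sketch of the general idea, under stated assumptions: a cascade formant synthesizer (Klatt-style second-order digital resonators driven by an impulse train at F0) whose formant targets are derived from F0. The function `predict_formants` is a hypothetical linear stand-in, not the paper's LR model. The sampling rate, bandwidths, coefficient values, and all function names are assumptions chosen for illustration.

```python
import numpy as np

FS = 16000  # sampling rate in Hz (assumed)

def resonator_coeffs(freq_hz, bw_hz, fs=FS):
    """Klatt-style second-order resonator for one formant."""
    r = np.exp(-np.pi * bw_hz / fs)          # pole radius set by bandwidth
    theta = 2.0 * np.pi * freq_hz / fs       # pole angle set by centre frequency
    a1 = -2.0 * r * np.cos(theta)
    a2 = r * r
    gain = 1.0 + a1 + a2                     # normalises the DC gain to 1
    return gain, a1, a2

def apply_resonator(x, gain, a1, a2):
    """Difference equation y[n] = gain*x[n] - a1*y[n-1] - a2*y[n-2]."""
    y = np.zeros_like(x)
    y1 = y2 = 0.0
    for n, xn in enumerate(x):
        yn = gain * xn - a1 * y1 - a2 * y2
        y[n] = yn
        y2, y1 = y1, yn
    return y

def impulse_train(f0_hz, duration_s, fs=FS):
    """Crude glottal source (assumed): one unit impulse per pitch period."""
    n = int(duration_s * fs)
    x = np.zeros(n)
    x[np.arange(0, n, fs / f0_hz).astype(int)] = 1.0
    return x

def predict_formants(f0_hz, intercepts, slopes):
    """HYPOTHETICAL F0 -> formant map: F_i = intercept_i + slope_i * F0.
    Stand-in only; not the LR models described in the paper."""
    return [b + m * f0_hz for b, m in zip(intercepts, slopes)]

def synthesize(f0_hz, formants_hz, bandwidths_hz, duration_s=0.3, fs=FS):
    """Pass the source through the formant resonators connected in cascade."""
    y = impulse_train(f0_hz, duration_s, fs)
    for f, bw in zip(formants_hz, bandwidths_hz):
        y = apply_resonator(y, *resonator_coeffs(f, bw, fs))
    return y / np.max(np.abs(y))  # normalise peak amplitude to 1

# Placeholder coefficients for illustration only, not values from the paper.
intercepts = [700.0, 1100.0, 2450.0]
slopes = [0.2, 0.1, 0.0]
bandwidths = [80.0, 90.0, 120.0]
for f0 in (120.0, 180.0):  # e.g. a neutral F0 versus a raised, emotion-related F0
    formants = predict_formants(f0, intercepts, slopes)
    wave = synthesize(f0, formants, bandwidths)
    print(f0, [round(f) for f in formants], wave.shape)
```

A cascade connection is used here because it is the simplest way to shape a voiced spectrum with a few resonators. In a real system, `predict_formants` would be replaced by whatever fitted emotion-specific model is available.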

