Feature Based Adaptation For Speaking Style Synthesis

Citation Author(s):
Lifa Sun, Shiyin Kang, Songxiang Liu, Zhiyong Wu, Xunying Liu, Helen Meng
Submitted by:
Xixin Wu
Last updated:
12 April 2018 - 10:14pm
Document Type:
Poster
Document Year:
2018
Event:
Presenters:
Xixin Wu
Paper Code:
SP-P6
 

Speaking style plays an important role in the expressivity of speech, and is therefore important for synthetic speech as well. Speaking style adaptation is difficult because data for a specific style may be limited and hard to obtain in large amounts. A possible solution is to train the speech synthesizer on data from speaking styles that are more readily available, and then adapt it to the target style for which data is scarce. Conventional DNN adaptation approaches directly update the top layers of a well-trained, style-dependent model towards the target style. These approaches do not consider the detailed, local context-level mismatch between the original and target styles. To address this issue, this paper investigates two frame-level, input feature-based style adaptation techniques. We use style features extracted from (1) a bottleneck DNN trained on target-style data, and (2) a novel cross-style residual feature regression DNN. These features are used for top-layer adaptation of a well-trained, style-dependent synthesis network. Experimental results on adapting the declarative style to the interrogative style demonstrate that the proposed style features improve the expressiveness of synthesized speech for the interrogative style while maintaining speech quality.
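
The idea of feeding frame-level style features into top-layer adaptation can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. Every size, the random stand-in weights, the synthetic data, and the helper names `hidden_activations` and `fit_top_layer` are hypothetical. The sketch assumes that the style features are concatenated with the frozen network's last hidden activations and that only a linear output layer is re-estimated, here by least squares instead of gradient-based fine-tuning. Only the bottleneck variant is shown.

```python
import numpy as np

rng = np.random.default_rng(0)

def hidden_activations(x, layers):
    """Pass frames through tanh layers and return the last layer's activations."""
    h = x
    for w, b in layers:
        h = np.tanh(h @ w + b)
    return h

def fit_top_layer(features, targets):
    """Re-estimate a linear output layer (with bias) by least squares."""
    design = np.hstack([features, np.ones((features.shape[0], 1))])
    weights, *_ = np.linalg.lstsq(design, targets, rcond=None)
    return weights, design @ weights

# Hypothetical sizes: frames, linguistic input, hidden, bottleneck, acoustic output.
T, D_IN, D_HID, D_BN, D_OUT = 200, 30, 64, 8, 5

# Stand-in for the frozen lower layers of a source-style synthesis network
# (random weights here; in practice these would be trained on source-style data).
source_lower = [(rng.normal(0, 0.3, (D_IN, D_HID)), np.zeros(D_HID)),
                (rng.normal(0, 0.3, (D_HID, D_HID)), np.zeros(D_HID))]

# Stand-in for a bottleneck DNN trained on target-style data, cut at its
# narrow layer so that its output is a frame-level style feature.
bottleneck_net = [(rng.normal(0, 0.3, (D_IN, D_HID)), np.zeros(D_HID)),
                  (rng.normal(0, 0.3, (D_HID, D_BN)), np.zeros(D_BN))]

# Synthetic target-style adaptation data (linguistic inputs, acoustic targets).
x_adapt = rng.normal(size=(T, D_IN))
y_adapt = rng.normal(size=(T, D_OUT))

hidden = hidden_activations(x_adapt, source_lower)    # shape (T, D_HID)
style = hidden_activations(x_adapt, bottleneck_net)   # shape (T, D_BN)

# Baseline: adapt the top layer using the frozen hidden activations only.
_, pred_base = fit_top_layer(hidden, y_adapt)
# Feature-based: adapt the top layer on hidden activations plus style features.
w_style, pred_style = fit_top_layer(np.hstack([hidden, style]), y_adapt)

mse_base = float(np.mean((pred_base - y_adapt) ** 2))
mse_style = float(np.mean((pred_style - y_adapt) ** 2))
```

Because the second fit uses a superset of the baseline's input columns, its error on the adaptation data cannot exceed the baseline's. With random stand-in data this says nothing about expressiveness or speech quality, which the poster evaluates experimentally.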
