An End-to-End Network to Synthesize Intonation using a Generalized Command Response Model - Poster
- Submitted by:
- Francois Marelli
- Last updated:
- 10 May 2019 - 11:54am
- Document Type:
- Poster
- Document Year:
- 2019
- Presenters:
- Francois Marelli, Bastian Schnell, Philip N. Garner
- Paper Code:
- 3481
The generalized command response (GCR) model represents intonation as a
superposition of muscle responses to spike command signals. We have previously
shown that the spikes can be predicted by a two-stage system, consisting of a
recurrent neural network and a post-processing procedure, but the responses
themselves were fixed dictionary atoms. We propose an end-to-end
neural architecture that replaces the dictionary atoms with trainable
second-order recurrent elements analogous to recursive filters. We demonstrate
gradient stability under modest conditions, and show that the system can be
trained by imposing temporal sparsity constraints. Subjective listening tests
demonstrate that the system can synthesize intonation with high naturalness,
comparable to state-of-the-art acoustic models, and retains the physiological
plausibility of the GCR model.
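
As a rough illustration of the idea of a second-order recurrent element driven by sparse spike commands, the sketch below filters spike trains with a recursive filter and sums the responses into a contour. It is not the authors' implementation: the critically damped (double real pole) filter form, the function names `second_order_response` and `synthesize_contour`, and the `pole` parameter are all assumptions made for this example. In a trainable version, `pole` would be a learned parameter.

```python
import numpy as np


def second_order_response(spikes, pole):
    """Filter a spike train with an assumed critically damped second-order
    recursive filter:

        y[t] = x[t] + 2*pole*y[t-1] - pole**2 * y[t-2]

    The filter has a double real pole at `pole` and is stable for |pole| < 1.
    Its impulse response is (n + 1) * pole**n.
    """
    spikes = np.asarray(spikes, dtype=float)
    y = np.zeros(len(spikes))
    for t in range(len(spikes)):
        y1 = y[t - 1] if t >= 1 else 0.0  # previous output
        y2 = y[t - 2] if t >= 2 else 0.0  # output two steps back
        y[t] = spikes[t] + 2.0 * pole * y1 - pole ** 2 * y2
    return y


def synthesize_contour(commands, poles):
    """Superpose the responses of several filters, each driven by its own
    sparse spike train (one train and one pole per element)."""
    return sum(second_order_response(c, p) for c, p in zip(commands, poles))


if __name__ == "__main__":
    # Two hypothetical elements: a slow one and a fast one.
    n = 50
    slow = np.zeros(n)
    slow[5] = 1.0        # one positive spike
    fast = np.zeros(n)
    fast[20] = -0.5      # one negative spike
    contour = synthesize_contour([slow, fast], poles=[0.9, 0.6])
    print(contour.shape)
```

Because each element is linear in its input, the contour is the sum of the scaled and shifted impulse responses, which is what makes a small number of spikes sufficient to describe it.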