Differential DSP Vocoder - ICASSP 2024
- DOI: 10.60864/a0ff-ke67
- Submitted by: Prabhav Agrawal
- Last updated: 6 June 2024 - 10:23am
- Document Type: Presentation Slides
- Document Year: 2024
- Presenters: Thilo Koehler, Prabhav Agrawal
- Paper Code: SLP-L18.1
Neural vocoders model the raw audio waveform and synthesize high-quality audio, but even highly efficient ones, such as MB-MelGAN and LPCNet, fail to run in real time on a low-end device like smart glasses. A pure digital signal processing (DSP) vocoder can be implemented with lightweight fast Fourier transforms (FFTs) and is therefore an order of magnitude faster than any neural vocoder. However, a DSP vocoder often yields lower audio quality because it consumes over-smoothed acoustic-model predictions of approximate vocal-tract representations. In this paper, we propose an ultra-lightweight differential DSP (DDSP) vocoder that jointly optimizes an acoustic model with a DSP vocoder and learns without an extracted spectral feature for the vocal tract. The model achieves audio quality comparable to neural vocoders, with a high average MOS of 4.36, while being as efficient as a DSP vocoder. Our C++ implementation, without any hardware-specific optimization, runs at 15 MFLOPS, surpasses MB-MelGAN by 340 times in terms of FLOPS, and achieves a vocoder-only RTF of 0.003 and an overall RTF of 0.044 while running single-threaded on a 2 GHz Intel Xeon CPU.
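To give a concrete sense of the FFT-based source-filter synthesis the abstract refers to, here is a minimal sketch of one voiced frame of a classic DSP vocoder: a periodic impulse-train excitation shaped by a vocal-tract spectral envelope via rFFT/irFFT. This is a generic illustration of the technique, not the paper's DDSP vocoder; the function name, frame length, and flat-envelope usage are illustrative assumptions.

```python
import numpy as np

def synth_frame(envelope_mag, f0, sr=16000, frame_len=512):
    """Synthesize one voiced frame with a textbook source-filter DSP vocoder.

    envelope_mag: magnitude spectral envelope, shape (frame_len // 2 + 1,)
    f0: fundamental frequency in Hz (voiced excitation as an impulse train)
    """
    # Source: periodic impulse train at f0 models voiced excitation.
    period = int(round(sr / f0))
    excitation = np.zeros(frame_len)
    excitation[::period] = 1.0
    # Filter: shape the excitation spectrum by the vocal-tract envelope,
    # then return to the time domain -- two lightweight FFTs per frame.
    spec = np.fft.rfft(excitation) * envelope_mag
    return np.fft.irfft(spec, n=frame_len)

# Illustrative call: a flat (all-ones) envelope leaves the excitation unchanged.
frame = synth_frame(np.ones(512 // 2 + 1), f0=100.0)
```

In a full pipeline, consecutive frames would be windowed and overlap-added, and the envelope would come from the acoustic model; the paper's contribution is making that envelope learnable end to end rather than an extracted spectral feature.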