High-quality speech coding with SampleRNN
Janusz Klejsa, Per Hedelin, Cong Zhou, Roy Fejgin and Lars Villemoes

SampleRNN at 8.0 kbps

Reference | Vocoder 8.0 kbps | AMR-WB 23.05 kbps | SILK 16 | SampleRNN 8.0 kbps
002o0o01
00ac031o
00ac0529
00ba0110
00bc0515
442p070j
442r080v
443a010d
446c0212
446r0809

SampleRNN at 6.4 kbps

Reference | Vocoder 6.4 kbps | AMR-WB 23.05 kbps | SILK 16 | SampleRNN 6.4 kbps
002o0o01
00ac031o
00ac0529
00ba0110
00bc0515
442p070j
442r080v
443a010d
446c0212
446r0809

Rate-distortion test

Reference | SampleRNN 5.6 kbps (emb) | SampleRNN 6.4 kbps (emb) | SampleRNN 6.4 kbps | SampleRNN 8.0 kbps
002o0o01
00ac031o
00ac0529
00ba0110
00bc0515
442p070j
442r080v
443a010d
446c0212
446r0809

Robustness

Reference | AMR-WB 23.05 kbps | SILK 16 | SampleRNN 8.0 kbps (WSJ) | SampleRNN 8.0 kbps (WSJ + VCTK)
002o0o01
00ac031o
00bc0515
443a010d
w31p003a004
w34p009a010
w54p042a043
w58p136a137

SampleRNN at 6.4 kbps (multiframe conditioning [post-submission])

Reference | AMR-WB 23.05 kbps | SILK 16 | SampleRNN 6.4 kbps | SampleRNN 6.4 kbps (multiframe)
002o0o01
00ac031o
00ac0529
00ba0110
00bc0515
442p070j
442r080v
443a010d
446c0212
446r0809