Low-bitrate redundancy coding of speech for packet loss concealment in teleconferencing

DOI:
10.60864/fpmd-wn89
Citation Author(s):
Marcin Ciołek, Michał Sulewski, Mihailo Kolundzija, Rafał Pilarczk, Raul Casas, Samer Hijazi
Submitted by:
Marcin Ciolek
Last updated:
6 June 2024 - 10:54am
Document Type:
Poster
Document Year:
2024
Event:
Presenters:
Marcin Ciolek
Paper Code:
DEMO-1A.2
Categories:
 

conferencing applications. We introduced a novel neural codec for low-bitrate speech coding at 6 kbit/s, with long 1 kbit/s redundancy, that also enhances speech by suppressing noise and reverberation. Transmitting large amounts of redundant information allows for speech reconstruction on the receiver side during severe packet loss – see ICASSP paper ID 7175: “Ultra low bitrate loss resilient neural speech enhancing codec”.
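The redundancy idea above can be illustrated with a minimal sketch. It assumes a generic scheme in which every packet carries its own frame plus copies of the previous `depth` frames. The packet layout, the function names, and the `depth` parameter are hypothetical and are not taken from the paper. In a real codec the redundant copies would be a coarser, lower-bitrate encoding rather than the identical payload used here.

```python
def packetize(frames, depth):
    """Build packets that carry one primary frame plus up to `depth` earlier frames.

    Hypothetical layout for illustration only; the redundant copies here are
    identical to the primary payload, which a real codec would not do.
    """
    packets = []
    for i, frame in enumerate(frames):
        redundancy = {j: frames[j] for j in range(max(0, i - depth), i)}
        packets.append({"seq": i, "primary": frame, "redundancy": redundancy})
    return packets


def receive(packets, lost, n_frames):
    """Rebuild the frame sequence from the packets that arrived.

    `lost` is the set of sequence numbers dropped by the channel.
    Frames that no surviving packet covers stay as None.
    """
    out = [None] * n_frames
    for pkt in packets:
        if pkt["seq"] in lost:
            continue  # this packet never arrived
        out[pkt["seq"]] = pkt["primary"]
        for j, frame in pkt["redundancy"].items():
            if out[j] is None:
                out[j] = frame  # fill an earlier gap from the redundant copy
    return out


frames = [f"f{i}" for i in range(10)]
packets = packetize(frames, depth=4)

# A burst of 4 lost packets is fully covered by the next packet that arrives.
recovered = receive(packets, lost={3, 4, 5, 6}, n_frames=len(frames))

# A burst of 5 exceeds the redundancy depth, so the oldest lost frame stays missing.
partial = receive(packets, lost={3, 4, 5, 6, 7}, n_frames=len(frames))
```

In this sketch the longest burst that can be repaired equals `depth`, which is why the bitrate spent on each redundant frame matters: the cheaper each copy is, the more history a packet can carry.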

The novelty of the proposed demo is combining the neural codec with the Viterbi algorithm and entropy coding to compress the redundant information by 45%, from 1 kbit/s down to ~0.55 kbit/s, with minimal loss in audio quality. The codec comprises three neural components: encoder, vector quantizer, and decoder. The vector quantizer outputs a sequence of symbols. High compression is achieved by applying entropy coding to the sequence of symbols after it has been modified by the Viterbi algorithm. The efficiency of the proposed scheme comes from incorporating transition probabilities between symbols. Objective and subjective metrics confirmed only a minor difference in audio quality between the 1 kbit/s and 0.55 kbit/s schemes.
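One way to read the combination of the Viterbi algorithm with transition probabilities is as a rate-distortion search, sketched below. This is an assumption made for illustration, not the authors' confirmed method. The per-frame `distortion` table, the trade-off weight `lam`, and all function names are hypothetical. The sketch picks one symbol per frame so that quantization error plus the ideal entropy-coded length, `-log2` of the transition probability, is minimized over the whole sequence.

```python
import math


def transition_bits(trans_prob, prev, cur):
    """Ideal code length in bits for emitting `cur` right after `prev`."""
    return -math.log2(trans_prob[prev][cur])


def viterbi_rate_distortion(distortion, trans_prob, init_prob, lam):
    """Choose one symbol per frame minimizing distortion + lam * bits.

    distortion[t][s]: hypothetical quantization error of symbol s at frame t.
    trans_prob[p][s]: assumed probability of symbol s following symbol p (> 0).
    init_prob[s]:     assumed probability of symbol s at the first frame (> 0).
    """
    n_sym = len(init_prob)
    cost = [distortion[0][s] - lam * math.log2(init_prob[s]) for s in range(n_sym)]
    back = []  # back[t-1][s] = best predecessor of symbol s at frame t
    for t in range(1, len(distortion)):
        new_cost, pointers = [], []
        for s in range(n_sym):
            best_prev = min(
                range(n_sym),
                key=lambda p: cost[p] + lam * transition_bits(trans_prob, p, s),
            )
            rate = lam * transition_bits(trans_prob, best_prev, s)
            new_cost.append(cost[best_prev] + rate + distortion[t][s])
            pointers.append(best_prev)
        cost = new_cost
        back.append(pointers)
    # Trace the cheapest path backwards from the best final symbol.
    s = min(range(n_sym), key=lambda k: cost[k])
    path = [s]
    for pointers in reversed(back):
        s = pointers[s]
        path.append(s)
    path.reverse()
    return path


def sequence_bits(path, trans_prob, init_prob):
    """Ideal entropy-coded length of a symbol path under the transition model."""
    bits = -math.log2(init_prob[path[0]])
    for prev, cur in zip(path, path[1:]):
        bits += transition_bits(trans_prob, prev, cur)
    return bits


# Toy two-symbol example: staying on a symbol is likely, switching is rare.
trans_prob = [[0.9, 0.1], [0.1, 0.9]]
init_prob = [0.5, 0.5]
distortion = [[0.0, 1.0], [0.4, 0.3], [0.0, 1.0]]

greedy = viterbi_rate_distortion(distortion, trans_prob, init_prob, lam=0.0)
smooth = viterbi_rate_distortion(distortion, trans_prob, init_prob, lam=0.1)
```

With `lam=0.0` the search reduces to picking the lowest-distortion symbol in each frame. With a positive `lam`, it accepts a slightly worse symbol in the middle frame to avoid two unlikely transitions, so `sequence_bits(smooth, ...)` comes out lower than `sequence_bits(greedy, ...)`. This mirrors the trade described above: a small quality loss in exchange for a cheaper symbol sequence.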

The demo is related to the ICASSP Audio Deep Packet Loss Concealment Grand Challenge and fits the theme of Signal Processing - the Foundation of True Intelligence. The interaction with the ICASSP audience will be based on playing a live scene with packet loss simulated on the captured audio, demonstrating the capability of the proposed codec and compression scheme to mend speech during long packet losses.

The demo software will process noisy audio recorded on the spot. Participants can then listen to their enhanced voice after it has been transmitted through a lossy network channel and compare it to the input. Our demo will be supported by a poster and a videocast. We want to emphasize that the demo prepared by our R&D team isn’t a commercial product.
