Audio Coding

ADAPTIVE CODING OF NON-NEGATIVE FACTORIZATION PARAMETERS WITH APPLICATION TO INFORMED SOURCE SEPARATION

Informed source separation (ISS) extracts audio objects from their downmix by source separation guided by pre-computed parameters. In recent years, non-negative tensor factorization (NTF) has proven to be a good choice for compressing audio objects at the encoding stage. At the decoding stage, the NTF parameters are used to separate the downmix by Wiener filtering. The quantized NTF parameters must be encoded into a bitstream prior to transmission.
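The decoder-side step described above can be illustrated with a minimal sketch. It assumes the commonly used three-factor NTF model, in which the power spectrogram of source j is approximated as the sum over k of Q[j][k] * W[f][k] * H[t][k]; the names W, H, Q and the toy values are assumptions for illustration, not taken from the poster, and quantization and bitstream coding of the parameters are left out entirely.

```python
def ntf_source_power(W, H, Q, j):
    """Model power of source j: V_j[f][t] = sum_k Q[j][k] * W[f][k] * H[t][k]."""
    n_freq, n_frames, n_comp = len(W), len(H), len(W[0])
    return [[sum(Q[j][k] * W[f][k] * H[t][k] for k in range(n_comp))
             for t in range(n_frames)]
            for f in range(n_freq)]


def wiener_separate(X, W, H, Q, eps=1e-12):
    """Apply Wiener gains V_j / sum_i V_i to the mixture spectrogram X."""
    n_src = len(Q)
    n_freq, n_frames = len(X), len(X[0])
    V = [ntf_source_power(W, H, Q, j) for j in range(n_src)]
    estimates = []
    for j in range(n_src):
        S_j = [[X[f][t] * V[j][f][t]
                / (sum(V[i][f][t] for i in range(n_src)) + eps)
                for t in range(n_frames)]
               for f in range(n_freq)]
        estimates.append(S_j)
    return estimates


# Toy example (hypothetical values): 2 frequency bins, 3 frames, 2 components, 2 sources.
W = [[0.9, 0.1], [0.2, 0.8]]              # spectral templates, shape F x K
H = [[1.0, 0.5], [0.3, 2.0], [1.5, 1.0]]  # temporal activations, shape T x K
Q = [[1.0, 0.0], [0.0, 1.0]]              # source-to-component assignment, shape J x K
X = [[1 + 1j, 2 + 0j, 0.5 + 0j],          # mixture (downmix) spectrogram, shape F x T
     [1 + 0j, 0 - 1j, 3 + 0j]]
S = wiener_separate(X, W, H, Q)
```

Because the gains of all sources sum to one in every time-frequency bin, the source estimates add back up to the downmix, which is a quick sanity check for any such implementation.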

BlRo2018_Poster_print.pdf (656)

Categories: Audio Coding
Source Separation and Signal Enhancement

14 Views

Decorrelation for Audio Object Coding

Object-based representations of audio content are increasingly used in entertainment systems to deliver immersive and personalized experiences. Efficient storage and transmission of such content can be achieved by joint object coding algorithms that convey a reduced number of downmix signals together with parametric side information that enables object reconstruction in the decoder. This paper presents an approach to improve the performance of joint object coding by adding one or more decorrelators to the decoding process.

ICASSP2017_FINAL.pdf (714)

Categories: Audio Coding

41 Views

Pre-Echo Noise Reduction in Frequency-Domain Audio Codecs (Poster)

One of the most common yet detrimental compression artifacts in frequency-domain audio codecs is pre-echo, which is perceived as a brief noise preceding transient signals and is discernible even without direct comparison to the original signal. Because of its substantial negative impact on audio quality, many techniques have been proposed to alleviate it, but not without an effect on coding efficiency.

jim-poster.pdf

Poster (268)

Categories: Audio Coding

52 Views

Adaptive selection of lag-window shape for linear predictive analysis in the 3GPP EVS codec

kmhGlobalSIP2015ALW_1212a.pdf (494)

Categories: Speech Coding (SPE-CODI)
Audio Coding

16 Views

A PACKET LOSS RECOVERY TECHNIQUE WITH LINE SPECTRAL FREQUENCY MODIFICATION IN 3GPP EVS CODEC

This paper examines a method for controlling the energy of the decoded signal at the recovery frame after a packet loss. We observed that a packet loss around a speech onset causes a sudden increase in the amplitude of the decoded signal at the recovery frame. To mitigate the artifact caused by this overshoot, a speech-overshoot detector is proposed, together with a method that controls the amplitude of the decoded signal by adjusting the distances between adjacent line spectral frequencies.

20151215_GlobalSIP2015_presentation.pdf (968)

Categories: Audio Coding

14 Views

DELAY-LESS FREQUENCY DOMAIN PACKET-LOSS CONCEALMENT FOR TONAL AUDIO SIGNALS

FD_Tonal_PLC_GlobalSIP2015.pdf (962)

Categories: Audio Coding

24 Views

Enhanced AMR-WB Bandwidth Extension in 3GPP EVS Codec

This presentation describes the bandwidth extension (BWE) method developed for the AMR-WB interoperable (AMR-WB IO) modes of the 3GPP EVS codec. The low-band signal (0-6.4 kHz) is coded using an enhanced version of ACELP, as in AMR-WB, and post-processed; the high band (above 6.4 kHz), in contrast to AMR-WB, is represented with a new BWE method. The decoded low-band excitation is adaptively extended to high frequencies and filtered in the DCT domain. The extended excitation is then scaled by subframe gains and shaped by a weighted LPC synthesis filter.
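The last step of this chain, scaling by subframe gains followed by a weighted LPC synthesis filter, can be sketched as follows. This is only an illustration under stated assumptions: the function names, the weighting factor gamma, the subframe layout and the toy coefficients are hypothetical and not taken from the EVS specification, and the adaptive extension and DCT-domain filtering are not modeled.

```python
def weighted_lpc(a, gamma):
    """Bandwidth-expand LPC coefficients: a_k -> a_k * gamma**k (a[0] is 1)."""
    return [c * gamma ** k for k, c in enumerate(a)]


def synthesize_high_band(excitation, subframe_gains, a, gamma=0.9):
    """Scale the excitation per subframe, then filter it through 1 / A(z/gamma).

    Assumes len(excitation) >= len(subframe_gains); any remainder samples
    are assigned to the last subframe.
    """
    n_sub = len(subframe_gains)
    sub_len = len(excitation) // n_sub
    scaled = [x * subframe_gains[min(n // sub_len, n_sub - 1)]
              for n, x in enumerate(excitation)]
    aw = weighted_lpc(a, gamma)
    out = []
    for n, x in enumerate(scaled):
        # All-pole recursion: y[n] = x[n] - sum_{k>=1} aw[k] * y[n-k]
        y = x - sum(aw[k] * out[n - k] for k in range(1, len(aw)) if n - k >= 0)
        out.append(y)
    return out


# Toy example (hypothetical values): a unit impulse, two subframes, first-order LPC.
y = synthesize_high_band([1.0, 0.0, 0.0, 0.0], [2.0, 1.0], [1.0, -0.5], gamma=0.8)
```

With these toy values the weighted filter has a single pole at 0.4, so the output is the scaled impulse followed by a geometric decay.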

EVS_AMR_WB_IO_BWE_GlobalSIP2015.ppt (577)

Categories: Audio Coding
Speech Coding (SPE-CODI)

200 Views

A Novel Frequency Domain BWE with Relaxed Synchronization and Associated BWE Switching

This presentation describes a novel frequency domain bandwidth extension (BWE) scheme with relaxed synchronization, optimized for coding inactive and music/mixed content signals. The algorithm achieves high subjective quality at low and medium bitrates and has a low algorithmic delay. The algorithm is part of the 3GPP Enhanced Voice Services (EVS) codec. In addition to the presented algorithm, the EVS codec also employs a time domain BWE scheme optimized for active speech coding.

EVS_FD_BWE_GlobalSIP2015.pptx (1134)

Categories: Audio Coding
Speech Coding (SPE-CODI)

16 Views

Audio Bandwidth Detection in the EVS Codec

Speech and audio codecs are usually designed such that they encode all the frequency bands of the input signal spectrum. If the higher bands do not contain any perceptually meaningful content, these codecs often do not work optimally as they assign part of the available bit budget to encode these bands. In this paper we describe a bandwidth detection algorithm that determines the effective audio bandwidth of the input signal.
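The general idea of such a detector can be illustrated with a minimal sketch that picks the highest band whose energy is still significant relative to the strongest band. The band labels follow the usual NB/WB/SWB/FB naming; the function name, the relative threshold and the decision rule are assumptions for illustration and do not reproduce the detector of the EVS codec.

```python
import math

# Band labels ordered from low to high; nominal upper edges are roughly
# 4, 8, 16 and 20 kHz for NB, WB, SWB and FB respectively.
BANDS = ["NB", "WB", "SWB", "FB"]


def detect_bandwidth(band_energies, rel_threshold_db=-50.0):
    """Return the label of the highest band with significant energy.

    band_energies: linear energies, one per entry in BANDS (low to high).
    rel_threshold_db: hypothetical threshold relative to the strongest band.
    """
    ref = max(band_energies)
    if ref <= 0.0:
        return BANDS[0]  # silent input: fall back to the narrowest bandwidth
    detected = BANDS[0]
    for name, energy in zip(BANDS, band_energies):
        level_db = 10.0 * math.log10(energy / ref) if energy > 0.0 else float("-inf")
        if level_db >= rel_threshold_db:
            detected = name
    return detected


print(detect_bandwidth([1.0, 0.5, 1e-9, 1e-10]))  # prints "WB"
```

A practical detector would typically smooth the decision over several frames to avoid rapid switching; that is omitted here to keep the sketch short.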

GlobalSIP15_BWD_presentation.ppt (1394)

Categories: Audio Coding
Speech Coding (SPE-CODI)

73 Views

Linear estimation based primary-ambient extraction for stereo audio signals (slides)

Audio signals for moving pictures and video games are often linear combinations of primary and ambient components. In spatial audio analysis-synthesis, these mixed signals are usually decomposed into primary and ambient components to facilitate flexible spatial rendering and enhancement. Existing approaches such as principal component analysis (PCA) and least squares (LS) are widely used to perform this decomposition from stereo signals.
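As an illustration of the PCA approach mentioned above, the following minimal sketch projects each stereo sample onto the principal direction of the 2x2 channel correlation matrix and treats the projection as the primary component and the residual as the ambient component. The function name and toy signals are assumptions for illustration; the signals are assumed zero-mean and processed as one full-band block, whereas practical systems usually work per frame and per subband.

```python
import math


def pca_primary_ambient(left, right):
    """Split a stereo signal into primary and ambient parts using PCA."""
    # Entries of the 2x2 correlation matrix (signals assumed zero-mean).
    r_ll = sum(x * x for x in left)
    r_rr = sum(y * y for y in right)
    r_lr = sum(x * y for x, y in zip(left, right))
    # Angle of the principal eigenvector of a symmetric 2x2 matrix.
    theta = 0.5 * math.atan2(2.0 * r_lr, r_ll - r_rr)
    u_l, u_r = math.cos(theta), math.sin(theta)
    primary_l, primary_r, ambient_l, ambient_r = [], [], [], []
    for x, y in zip(left, right):
        p = u_l * x + u_r * y          # projection onto the principal direction
        primary_l.append(u_l * p)
        primary_r.append(u_r * p)
        ambient_l.append(x - u_l * p)  # residual is orthogonal to the primary part
        ambient_r.append(y - u_r * p)
    return (primary_l, primary_r), (ambient_l, ambient_r)


# Toy example (hypothetical values): a purely primary, amplitude-panned signal.
left = [1.0, -2.0, 3.0, -1.0]
right = [0.5 * x for x in left]
(primary_l, primary_r), (ambient_l, ambient_r) = pca_primary_ambient(left, right)
```

For this toy input the two channels are perfectly correlated, so the ambient estimate is zero up to rounding and the primary estimate reproduces the input.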