Speech Coding (SPE-CODI)

Influence of Audio Bandwidth Reduction on Speech Emotion Recognition by Human Subjects

Audio bandwidth reduction is the very first processing step in any speech and audio coder. Its negative influence on perceived speech quality and intelligibility has been thoroughly studied and is well documented. This paper examines whether it also affects speech emotion recognition by human subjects. Several standard telephony bandwidths, from fullband down to narrowband, are considered. Listening test results show that recognition accuracy decreases as the audio bandwidth is reduced.
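
As an illustration of the bandwidth reduction step itself, here is a minimal sketch that band-limits a 48 kHz signal with a Hamming-windowed sinc FIR lowpass filter. The cutoffs in `BANDWIDTHS_HZ` are an assumption (the Nyquist limits of the common 8, 16, 32 and 48 kHz sampling rates). Real narrowband telephony also restricts the passband to roughly 300-3400 Hz, and the filters used in the listening test are not described here. All function names are hypothetical.

```python
import math

# Assumed cutoffs (Hz): Nyquist limits of the usual 8/16/32/48 kHz sampling rates.
BANDWIDTHS_HZ = {"NB": 4000, "WB": 8000, "SWB": 16000, "FB": 24000}


def lowpass_taps(cutoff_hz, fs_hz, num_taps=101):
    """Hamming-windowed sinc lowpass FIR, normalized to unit gain at DC."""
    fc = cutoff_hz / fs_hz  # normalized cutoff in cycles per sample
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        k = n - mid
        ideal = 2 * fc if k == 0 else math.sin(2 * math.pi * fc * k) / (math.pi * k)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]


def band_limit(signal, cutoff_hz, fs_hz, num_taps=101):
    """Zero-phase ('same'-length) convolution of `signal` with the lowpass taps."""
    taps = lowpass_taps(cutoff_hz, fs_hz, num_taps)
    half = num_taps // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, h in enumerate(taps):
            idx = i + half - j  # centered index compensates the filter delay
            if 0 <= idx < len(signal):
                acc += h * signal[idx]
        out.append(acc)
    return out


def rms(x):
    return math.sqrt(sum(v * v for v in x) / len(x))


if __name__ == "__main__":
    fs = 48000
    tone = [math.sin(2 * math.pi * 10000 * n / fs) for n in range(2400)]
    for name, cutoff in BANDWIDTHS_HZ.items():
        # A 10 kHz tone survives SWB/FB band-limiting but not NB/WB.
        print(name, round(rms(band_limit(tone, cutoff, fs)[200:-200]), 3))
```

The edges are trimmed before measuring because the zero-padded convolution is not representative there.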

GlobalSIP-2017_Lahaie_Presentation.pdf

Categories:: Speech Coding (SPE-CODI)

NON-NEGATIVE TEMPORAL DECOMPOSITION REGULARIZATION WITH AN AUGMENTED LAGRANGIAN

Nonnegative matrix factorization (NMF) has recently been applied to temporal decomposition (TD) of speech spectral envelopes represented by line spectral frequencies (LSFs). Two inherent TD constraints that are otherwise handled as ad hoc exceptions, namely LSF ordering and monotonic event functions, have also been incorporated using NMF. Here, these constraints are analyzed and a third inherent constraint is incorporated into the NMF analysis.
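
Temporal decomposition approximates a sequence of LSF vectors as event targets weighted by event functions. Stacking the LSF vectors as columns of Y gives a nonnegative product Y ≈ A Φ. A minimal sketch of plain NMF for this model follows, using the classic Lee-Seung multiplicative updates for the squared Euclidean error. It is an assumption for illustration only: it does not enforce LSF ordering, monotonic event functions, or the augmented-Lagrangian regularization named in the title. `nmf_td` and the toy data are hypothetical.

```python
import random


def matmul(X, Y):
    """Plain-list matrix product X @ Y."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def transpose(X):
    return [list(col) for col in zip(*X)]


def frob_error(Y, A, Phi):
    """Squared Frobenius norm of Y - A @ Phi."""
    R = matmul(A, Phi)
    return sum((y - r) ** 2 for y_row, r_row in zip(Y, R) for y, r in zip(y_row, r_row))


def nmf_td(Y, k, iters=200, seed=0, eps=1e-12):
    """Approximate nonnegative Y (P x T) by targets A (P x k) times event functions Phi (k x T).

    Lee-Seung multiplicative updates; entries stay nonnegative because every
    update multiplies by a nonnegative ratio. No TD-specific constraints.
    """
    rng = random.Random(seed)
    p, t = len(Y), len(Y[0])
    A = [[rng.random() + 0.1 for _ in range(k)] for _ in range(p)]
    Phi = [[rng.random() + 0.1 for _ in range(t)] for _ in range(k)]
    for _ in range(iters):
        At = transpose(A)
        num, den = matmul(At, Y), matmul(matmul(At, A), Phi)
        Phi = [[v * n / (d + eps) for v, n, d in zip(pr, nr, dr)]
               for pr, nr, dr in zip(Phi, num, den)]
        PhiT = transpose(Phi)
        num, den = matmul(Y, PhiT), matmul(A, matmul(Phi, PhiT))
        A = [[v * n / (d + eps) for v, n, d in zip(ar, nr, dr)]
             for ar, nr, dr in zip(A, num, den)]
    return A, Phi


if __name__ == "__main__":
    # Toy data: 3 LSF-like tracks (radians) generated by 2 overlapping events.
    A0 = [[0.3, 0.5], [0.9, 1.2], [1.8, 2.1]]
    Phi0 = [[1.0, 0.8, 0.5, 0.2, 0.0, 0.0], [0.0, 0.2, 0.5, 0.8, 1.0, 1.0]]
    Y = matmul(A0, Phi0)
    A, Phi = nmf_td(Y, k=2, iters=300)
    print("reconstruction error:", frob_error(Y, A, Phi))
```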

ica2017ntdalmposter.pdf

ICASSP2017marjonaramirezSP-P1.10

Categories:: Speech Analysis (SPE-ANLS)
Speech Coding (SPE-CODI)

System-Compatible Robustness Improvement for New Generation DECT Decoders by G.722 Soft-Decision Decoding

ITU-T Recommendation G.722, which specifies subband adaptive differential pulse code modulation (SB-ADPCM), is the mandatory wideband speech codec in new generation digital enhanced cordless telecommunications (NG-DECT). Although ADPCM quantizes the difference signal rather than the original signal and employs adaptive prediction, some redundancy still remains in the quantized samples. In this paper, we apply to NG-DECT a soft-decision speech decoding technique that exploits this redundancy as a priori knowledge, together with channel reliability information.
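
Soft-decision decoding replaces the table look-up of a received index with an estimate that weights every possible index by its a posteriori probability. A minimal sketch, assuming a memoryless binary symmetric channel whose bit error probability `p` stands in for the channel reliability information, plus a hypothetical 2-bit codebook and hypothetical a priori index probabilities. G.722's actual quantizer tables, prediction loop, and a priori models are not reproduced.

```python
def bit_flip_prob(sent, received, nbits, p):
    """P(received | sent) on a binary symmetric channel with bit error probability p."""
    errors = bin(sent ^ received).count("1")
    return (p ** errors) * ((1 - p) ** (nbits - errors))


def soft_decode(received, codebook, prior, p):
    """MMSE estimate sum_i c_i P(i | r), with P(i | r) proportional to P(r | i) P(i)."""
    nbits = (len(codebook) - 1).bit_length()
    post = [bit_flip_prob(i, received, nbits, p) * prior[i] for i in range(len(codebook))]
    norm = sum(post)
    return sum(c * w for c, w in zip(codebook, post)) / norm


def hard_decode(received, codebook):
    """Conventional decoding: plain table look-up of the received index."""
    return codebook[received]


if __name__ == "__main__":
    codebook = [-1.5, -0.5, 0.5, 1.5]  # hypothetical 2-bit reconstruction levels
    prior = [0.1, 0.4, 0.4, 0.1]       # hypothetical a priori index probabilities
    for p in (0.0, 0.1, 0.5):
        print(p, hard_decode(3, codebook), round(soft_decode(3, codebook, prior, p), 3))
```

With an error-free channel (`p = 0`) the estimate equals the hard decision. At `p = 0.1` the received index 3 yields about 0.78 instead of 1.5, pulled toward the a priori more likely levels. At `p = 0.5` the bits carry no information, and the estimate falls back to the prior mean.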

poster.pdf

Categories:: Speech Coding (SPE-CODI)
Speech Enhancement (SPE-ENHA)

Sparse Reconstruction of Quantized Speech Signals

We propose sparse reconstruction techniques to improve the quality and/or reduce the bit rate of standard speech coders. To that end, we assume that the signal is sparse in some transform domain and formulate the reconstruction of the original signal as a constrained l1-norm minimization problem. We use modern primal-dual methods to solve the resulting non-smooth convex optimization problem. Experiments show that the proposed sparse reconstruction method substantially improves the instrumentally predicted speech quality.
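
One possible primal-dual scheme for this kind of problem is the Chambolle-Pock iteration, sketched below for min ||DCT(x)||_1 subject to x lying in the quantization cells of the received samples. The sketch rests on several assumptions: an orthonormal DCT as the sparsifying transform, and a uniform quantizer with step `step` standing in for the speech coder, so that the constraint set is the box |x_n - q_n| <= step/2. The paper's transform, constraint sets and solver settings may differ. Because the orthonormal DCT has operator norm 1, any step sizes with sigma * tau < 1 satisfy the usual convergence condition.

```python
import math


def dct(x):
    """Orthonormal DCT-II (direct O(N^2) evaluation)."""
    n = len(x)
    return [math.sqrt((1 if k == 0 else 2) / n) *
            sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
            for k in range(n)]


def idct(c):
    """Inverse (and transpose) of dct(): orthonormal DCT-III."""
    n = len(c)
    return [sum(math.sqrt((1 if k == 0 else 2) / n) * c[k] *
                math.cos(math.pi * (i + 0.5) * k / n) for k in range(n))
            for i in range(n)]


def quantize(x, step):
    """Uniform mid-tread quantizer: a hypothetical stand-in for the speech coder."""
    return [step * round(v / step) for v in x]


def sparse_reconstruct(q, step, iters=200, sigma=0.99, tau=0.99, theta=1.0):
    """Chambolle-Pock for min ||dct(x)||_1 s.t. |x - q| <= step / 2 elementwise."""
    lo = [v - step / 2 for v in q]
    hi = [v + step / 2 for v in q]
    x, x_bar, y = list(q), list(q), [0.0] * len(q)
    for _ in range(iters):
        # Dual step: prox of the conjugate of the l1 norm is a clip to [-1, 1].
        y = [max(-1.0, min(1.0, yi + sigma * ki)) for yi, ki in zip(y, dct(x_bar))]
        # Primal step: gradient step on <Kx, y>, then projection onto the cells.
        x_new = [min(h, max(l, xi - tau * gi))
                 for xi, gi, l, h in zip(x, idct(y), lo, hi)]
        x_bar = [xn + theta * (xn - xo) for xn, xo in zip(x_new, x)]
        x = x_new
    return x


if __name__ == "__main__":
    coeffs = [0.0] * 32
    coeffs[1], coeffs[5] = 3.0, -2.0          # a signal with 2 nonzero DCT coefficients
    x0 = idct(coeffs)
    q = quantize(x0, 0.25)
    xr = sparse_reconstruct(q, 0.25)
    err = lambda a: math.sqrt(sum((u - v) ** 2 for u, v in zip(a, x0)))
    print("error of q:", err(q), "error of reconstruction:", err(xr))
```

Every iterate stays consistent with the received quantization indices, because the primal step ends with a projection onto the cells.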

icassp_poster_brauer.pdf

Categories:: Speech Enhancement (SPE-ENHA)
Speech Coding (SPE-CODI)

On the influence of quantization on the identifiability of emotions from voice coding parameters

Although emotions play a major role in voice communication, how well low-bit-rate voice coders reproduce them has not been investigated so far. This paper shows that the emotional state of a speaker can be identified automatically, with reasonable precision and accuracy, from conventional voice coding parameters (pitch, voicing, energy and LPC coefficients). It also shows that identification performance degrades when these parameters are quantized, especially at low rates such as 1200 bits/s.
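
The cost of quantizing such parameters can be made concrete with a uniform scalar quantizer. Each extra bit halves the step and therefore the maximum reconstruction error, which is step/2. The sketch assumes a hypothetical pitch range of 50-400 Hz. For the bit-budget arithmetic it also assumes a hypothetical 22.5 ms frame, at which 1200 bits/s leaves 1200 × 0.0225 = 27 bits per frame for all parameters. The quantizers used in the paper are not specified here.

```python
def uniform_quantizer(lo, hi, bits):
    """Encoder/decoder pair for a uniform scalar quantizer with 2**bits cells on [lo, hi]."""
    levels = 2 ** bits
    step = (hi - lo) / levels

    def encode(value):
        clipped = min(max(value, lo), hi)
        return min(int((clipped - lo) / step), levels - 1)

    def decode(index):
        return lo + (index + 0.5) * step  # reconstruct at the cell midpoint

    return encode, decode, step


def max_error(lo, hi, bits, samples=1000):
    """Largest |v - decode(encode(v))| over a dense grid of in-range values."""
    encode, decode, _ = uniform_quantizer(lo, hi, bits)
    grid = [lo + (hi - lo) * i / samples for i in range(samples + 1)]
    return max(abs(v - decode(encode(v))) for v in grid)


# Bit budget at 1200 bits/s with a hypothetical 22.5 ms frame.
BITRATE_BPS = 1200
FRAME_S = 0.0225
bits_per_frame = round(BITRATE_BPS * FRAME_S)  # 27 bits for all parameters

if __name__ == "__main__":
    for bits in (7, 5, 3):
        # Hypothetical pitch range 50-400 Hz; the error bound is step / 2.
        print(bits, "bits -> max pitch error", round(max_error(50.0, 400.0, bits), 2), "Hz")
```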

ICASSP2016_Robitaille (poster).pdf

Categories:: Speech Coding (SPE-CODI)

Adaptive selection of lag-window shape for linear predictive analysis in the 3GPP EVS codec

kmhGlobalSIP2015ALW_1212a.pdf

Categories:: Speech Coding (SPE-CODI)
Audio Coding

Perceptual Long-Term Harmonic plus Noise Modeling for Speech Data Compression

The harmonic plus noise model (HNM) is widely used to model audio signals. In this paper, we introduce perceptual frequency masking into the two-band HNM of Stylianou et al., applied to speech signals. An auditory model is used to identify inaudible sinusoids, which are then removed from the model parameter set to reduce the amount of data for speech coding.
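
Here is a minimal sketch of the pruning idea: a sinusoid is dropped when it falls below the masking threshold that another sinusoid produces. The Bark mapping is the Zwicker-Terhardt approximation. The triangular spreading function (slopes and offset) is an assumption chosen for illustration, and it omits the absolute threshold of hearing and additive masking. It is a generic simplification, not the auditory model used in the paper, and `prune_masked` is a hypothetical name.

```python
import math


def hz_to_bark(f):
    """Zwicker-Terhardt approximation of the Bark (critical-band) scale."""
    return 13 * math.atan(0.00076 * f) + 3.5 * math.atan((f / 7500) ** 2)


def prune_masked(sinusoids, lower_slope=27.0, upper_slope=10.0, offset_db=10.0):
    """Keep only (freq_hz, level_db) components that no other component masks.

    Assumed triangular spreading in the Bark domain: a masker's threshold starts
    `offset_db` below its level and falls by `upper_slope` dB/Bark towards higher
    frequencies and `lower_slope` dB/Bark towards lower ones. No absolute
    threshold of hearing and no additive combination of maskers.
    """
    barks = [hz_to_bark(f) for f, _ in sinusoids]
    kept = []
    for i, (f, level) in enumerate(sinusoids):
        masked = False
        for j, (_, masker_level) in enumerate(sinusoids):
            if i == j:
                continue
            dz = barks[i] - barks[j]  # > 0: component lies above the masker
            slope = upper_slope if dz >= 0 else lower_slope
            if level < masker_level - offset_db - slope * abs(dz):
                masked = True
                break
        if not masked:
            kept.append((f, level))
    return kept


if __name__ == "__main__":
    partials = [(1000.0, 70.0), (1100.0, 40.0), (4000.0, 40.0)]
    print(prune_masked(partials))  # the weak 1100 Hz partial is dropped
```

The steeper slope below the masker reflects the upward spread of masking: a loud component hides weaker ones just above it in frequency more readily than just below it.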

GlobalSIPFBA_SL.pdf

Categories:: Speech Coding (SPE-CODI)

Super-Wideband Fine Spectrum Quantization for Low-rate High-Quality MDCT Coding Mode of the 3GPP EVS Codec

This article presents a low-bit-rate super-wideband MDCT coder that has been adopted as part of the recently standardized codec for Enhanced Voice Services (EVS). To maximize performance at 13.2 kbps, existing algorithms are reviewed and several new tools are introduced into the MDCT coder to improve the coding of music and mixed content. A subjective listening test demonstrates the advantage of the proposed system over AMR-WB+ at 13.2 kbps.
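
The MDCT at the core of such coders maps overlapping 2M-sample frames to M coefficients, and overlap-adding the inverse transforms cancels the time-domain aliasing. Below is a minimal sketch using direct O(M^2) formulas, a sine window applied at both analysis and synthesis, and 50% overlap. With this window pair, the inverse needs a 2/M scaling for perfect reconstruction. The EVS window shapes, frame lengths and fine-spectrum quantization tools are not reproduced. The optional `quantize` hook is a hypothetical placeholder for them.

```python
import math


def sine_window(m):
    """Sine window of length 2m; satisfies w[n]**2 + w[n + m]**2 == 1."""
    return [math.sin(math.pi * (n + 0.5) / (2 * m)) for n in range(2 * m)]


def mdct(frame, m):
    """Windowed MDCT: 2m input samples -> m coefficients."""
    w = sine_window(m)
    return [sum(w[n] * frame[n] * math.cos(math.pi / m * (n + 0.5 + m / 2) * (k + 0.5))
                for n in range(2 * m))
            for k in range(m)]


def imdct(coeffs, m):
    """Inverse MDCT with 2/m scaling, followed by the same (synthesis) window."""
    w = sine_window(m)
    return [2 * w[n] / m * sum(c * math.cos(math.pi / m * (n + 0.5 + m / 2) * (k + 0.5))
                               for k, c in enumerate(coeffs))
            for n in range(2 * m)]


def analyze_synthesize(x, m, quantize=None):
    """50%-overlap MDCT analysis/synthesis; `quantize` is an optional per-coefficient hook."""
    padded = [0.0] * m + list(x) + [0.0] * (2 * m)
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - 2 * m + 1, m):
        coeffs = mdct(padded[start:start + 2 * m], m)
        if quantize is not None:
            coeffs = [quantize(c) for c in coeffs]
        for n, v in enumerate(imdct(coeffs, m)):
            out[start + n] += v  # overlap-add cancels the time-domain aliasing
    return out[m:m + len(x)]


if __name__ == "__main__":
    x = [math.sin(0.3 * n) + 0.5 * math.cos(1.1 * n) for n in range(40)]
    y = analyze_synthesize(x, 8)
    print("max reconstruction error:", max(abs(a - b) for a, b in zip(x, y)))
    coarse = analyze_synthesize(x, 8, quantize=lambda c: 0.5 * round(c / 0.5))
    print("max error with coarse quantization:", max(abs(a - b) for a, b in zip(x, coarse)))
```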

GlobalSIP2015_r1.6.pdf

Categories:: Speech Coding (SPE-CODI)

Enhanced AMR-WB Bandwidth Extension in 3GPP EVS Codec

This presentation describes the bandwidth extension (BWE) method developed for the AMR-WB interoperable (AMR-WB IO) modes of the 3GPP EVS codec. The low band (0-6.4 kHz) is coded using an enhanced version of ACELP, as in AMR-WB, and then post-processed; unlike in AMR-WB, the high band (above 6.4 kHz) is represented with a new BWE method. The decoded low-band excitation is adaptively extended to high frequencies and filtered in the DCT domain. The extended excitation is then scaled by subframe gains and shaped by a weighted LPC synthesis filter.
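
The last step (gain scaling per subframe, then shaping by a weighted LPC synthesis filter) can be sketched as follows. The weighted filter is assumed here to take the common form 1/A(z/gamma), where each coefficient a_i is scaled by gamma**i. The weighting factor `gamma`, the equal-length subframes and all function names are assumptions. The excitation extension and DCT-domain filtering steps are not shown.

```python
def weight_lpc(a, gamma):
    """Coefficients of A(z/gamma): a_i -> a_i * gamma**i, with a[0] == 1."""
    return [ai * gamma ** i for i, ai in enumerate(a)]


def lpc_synthesis(excitation, a, memory=None):
    """All-pole filter 1/A(z): y[n] = e[n] - sum_{i>=1} a[i] * y[n - i]."""
    order = len(a) - 1
    hist = list(memory) if memory else [0.0] * order  # hist[0] holds y[n - 1]
    out = []
    for e in excitation:
        y = e - sum(a[i] * hist[i - 1] for i in range(1, order + 1))
        out.append(y)
        hist = ([y] + hist)[:order]
    return out, hist


def shape_highband(excitation, gains, a, gamma=0.9):
    """Scale equal-length subframes by `gains`, then filter through 1/A(z/gamma).

    Filter memory is carried across subframes so the output is continuous.
    """
    sub = len(excitation) // len(gains)
    aw = weight_lpc(a, gamma)
    out, mem = [], None
    for s, g in enumerate(gains):
        seg = [g * v for v in excitation[s * sub:(s + 1) * sub]]
        y, mem = lpc_synthesis(seg, aw, mem)
        out.extend(y)
    return out


if __name__ == "__main__":
    # Impulse excitation, two subframes, first-order A(z) = 1 - 0.5 z^-1.
    print(shape_highband([1.0, 0.0, 0.0, 0.0], [2.0, 1.0], [1.0, -0.5], gamma=1.0))
```

A gamma below 1 moves the filter poles toward the origin, which widens the formant bandwidths of the shaped high band.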

EVS_AMR_WB_IO_BWE_GlobalSIP2015.ppt

Categories:: Audio Coding
Speech Coding (SPE-CODI)
