Speech Processing

Neural Networks Optimally Compress the Sawbridge


talk.pdf (364)

Categories:: Speech Processing

36 Views

MULTI-HEAD ATTENTION FOR SPEECH EMOTION RECOGNITION WITH AUXILIARY LEARNING OF GENDER RECOGNITION

The paper presents a Multi-Head Attention deep learning network for Speech Emotion Recognition (SER) using Log mel-Filter Bank Energies (LFBE) spectral features as the input. The multi-head attention along with the position embedding jointly attends to information from different representations of the same LFBE input sequence. The position embedding helps in attending to the dominant emotion features by identifying positions of the features in the sequence. In addition to Multi-Head Attention and position embedding, we apply multi-task learning with gender recognition as an auxiliary task.
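The attention mechanism described above can be illustrated with a minimal NumPy sketch of generic scaled dot-product multi-head self-attention applied to a sequence of frames with an added position embedding. This is not the authors' network: the sinusoidal embedding, the sizes `T`, `D`, `H`, the random weights, and the random stand-in for the LFBE frames are all assumptions made for illustration.

```python
import numpy as np

def sinusoidal_positions(num_frames, dim):
    """Fixed sinusoidal position embedding of shape (num_frames, dim); dim must be even."""
    pos = np.arange(num_frames)[:, None]
    idx = np.arange(0, dim, 2)[None, :]
    angles = pos / np.power(10000.0, idx / dim)
    emb = np.zeros((num_frames, dim))
    emb[:, 0::2] = np.sin(angles)
    emb[:, 1::2] = np.cos(angles)
    return emb

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(x, w_q, w_k, w_v, w_o, num_heads):
    """x has shape (T, D). Returns the output (T, D) and attention weights (H, T, T)."""
    num_frames, dim = x.shape
    d_head = dim // num_heads

    def split(m):
        # (T, D) -> (H, T, d_head): each head sees its own slice of the projection
        return m.reshape(num_frames, num_heads, d_head).transpose(1, 0, 2)

    q, k, v = split(x @ w_q), split(x @ w_k), split(x @ w_v)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)   # (H, T, T)
    weights = softmax(scores, axis=-1)                     # each row sums to 1
    heads = weights @ v                                    # (H, T, d_head)
    merged = heads.transpose(1, 0, 2).reshape(num_frames, dim)
    return merged @ w_o, weights

# Hypothetical sizes and random stand-ins, for illustration only.
rng = np.random.default_rng(0)
T, D, H = 50, 64, 4
lfbe = rng.standard_normal((T, D))            # stand-in for projected LFBE frames
x = lfbe + sinusoidal_positions(T, D)         # inject frame positions
w_q, w_k, w_v, w_o = (rng.standard_normal((D, D)) / np.sqrt(D) for _ in range(4))
out, attn = multi_head_self_attention(x, w_q, w_k, w_v, w_o, H)
```

Each of the `H` heads produces its own `(T, T)` weight matrix, which is one way to read "attending to information from different representations" of the same sequence. In a multi-task setup, a pooled version of `out` would presumably feed both an emotion classifier and an auxiliary gender classifier; how the two losses are weighted is not stated in the abstract.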

ICASSP.pdf (600)

Categories:: Speech Processing

145 Views

DEEP ENCODED LINGUISTIC AND ACOUSTIC CUES FOR ATTENTION BASED END TO END SPEECH EMOTION RECOGNITION

ICASSP_presentation_5598.pdf (531)

Categories:: Speech Processing

72 Views

Multi-Conditioning & Data Augmentation using Generative Noise Model for Speech Emotion Recognition in Noisy Conditions

Degradation due to additive noise is a significant roadblock in the real-life deployment of Speech Emotion Recognition (SER) systems. Most previous work in this field dealt with noise degradation at either the signal level or the feature level. In this paper, to improve the robustness of SER in additive noise scenarios, we propose multi-conditioning and data augmentation using an utterance-level parametric generative noise model. The generative noise model is designed to generate noise types that can span the entire noise space in the mel-filterbank energy domain.
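The idea of multi-condition augmentation in the mel-filterbank energy domain can be sketched as follows. The abstract does not specify the parametric noise model, so `sample_noise_log_mel` below is a hypothetical stand-in (a random spectral tilt plus small frame-level jitter), and the mixing rule assumes that speech and noise are uncorrelated so that their mel energies add. The SNR values and array sizes are also illustrative.

```python
import numpy as np

def sample_noise_log_mel(rng, num_frames, num_bands):
    """Hypothetical utterance-level noise: random level and spectral tilt, plus jitter."""
    tilt = rng.uniform(-0.1, 0.1)      # log-energy slope per mel band
    level = rng.uniform(-2.0, 2.0)     # overall log-energy offset
    profile = level + tilt * np.arange(num_bands)
    jitter = 0.1 * rng.standard_normal((num_frames, num_bands))
    return profile[None, :] + jitter

def add_noise_log_mel(clean_log_mel, noise_log_mel, snr_db):
    """Mix noise into clean natural-log mel energies at a target utterance-level SNR."""
    clean = np.exp(clean_log_mel)
    noise = np.exp(noise_log_mel)
    # Scale the noise so that total clean energy / total noise energy matches the SNR.
    gain = clean.sum() / (noise.sum() * 10.0 ** (snr_db / 10.0))
    return np.log(clean + gain * noise)

def multi_condition(clean_log_mel, noise_bank, snrs_db, rng):
    """Return one noisy copy per SNR, each using a randomly drawn noise from the bank."""
    copies = []
    for snr_db in snrs_db:
        noise = noise_bank[rng.integers(len(noise_bank))]
        copies.append(add_noise_log_mel(clean_log_mel, noise, snr_db))
    return copies

# Illustrative data: 100 frames, 40 mel bands.
rng = np.random.default_rng(0)
clean = np.log(rng.uniform(0.5, 2.0, size=(100, 40)))
noise_bank = [sample_noise_log_mel(rng, 100, 40) for _ in range(8)]
snrs_db = [0.0, 10.0, 20.0]
noisy_copies = multi_condition(clean, noise_bank, snrs_db, rng)
```

Training on the clean utterance together with `noisy_copies` is the multi-conditioning step; working directly on features avoids resynthesising waveforms for every noise condition.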

ICASSP2020_ppt_5701.pdf (674)

Categories:: Speech Processing

95 Views

Generating and Protecting Against Adversarial Attacks for Deep Speech-based Emotion Recognition Models

ICASSP_slides_ZhaoRen.pdf (495)

Categories:: Speech Processing

44 Views

Defense against adversarial attacks on spoofing countermeasures of ASV


Various spearhead countermeasure methods for automatic speaker verification (ASV) with considerable anti-spoofing performance were proposed in the ASVspoof 2019 challenge. However, previous work has shown that countermeasure models are vulnerable to adversarial examples that are indistinguishable from natural data. A good countermeasure model should not only be robust to spoofing audio, including synthetic, converted, and replayed audio, but should also counter examples deliberately generated by malicious adversaries.
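To make the notion of an adversarial example concrete, here is a minimal sketch using the fast gradient sign method (FGSM) against a toy logistic-regression scorer. The abstract does not say which attack or defense the authors use, so FGSM, the linear model, the feature dimension, and the label convention are all assumptions chosen for brevity.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm(x, y, w, b, epsilon):
    """One-step FGSM against a logistic-regression scorer; y is 0 or 1."""
    p = sigmoid(x @ w + b)
    grad_x = (p - y) * w                  # gradient of binary cross-entropy w.r.t. x
    return x + epsilon * np.sign(grad_x)  # step that increases the loss

# Toy stand-ins: a random linear "countermeasure" and one 40-dimensional feature vector.
rng = np.random.default_rng(0)
w = rng.standard_normal(40)
b = 0.0
x = rng.standard_normal(40)
y = 1.0                                   # hypothetical label, e.g. "spoofed"
x_adv = fgsm(x, y, w, b, epsilon=0.05)
```

Every feature moves by at most `epsilon`, yet the model's score for the true class drops. A defense has to keep the decision stable under exactly this kind of small, bounded perturbation.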

ICASSP REPORT.pdf (318)

Categories:: Speech Processing

42 Views

END-TO-END ARTICULATORY MODELING FOR DYSARTHRIC ARTICULATORY ATTRIBUTE DETECTION


In this study, we focus on detecting articulatory attribute errors for dysarthric patients with cerebral palsy (CP) or amyotrophic lateral sclerosis (ALS). There are two major challenges for this task. First, the pronunciation of dysarthric patients is unclear and inaccurate, which results in poor performance of traditional automatic speech recognition (ASR) systems and traditional automatic speech attribute transcription (ASAT). Second, the data is limited because recording is difficult.

ICASSP2020_poster_lin.pdf (354)

Categories:: Speech Processing

41 Views

A CLASSIFICATION-AIDED FRAMEWORK FOR NON-INTRUSIVE SPEECH QUALITY ASSESSMENT


Objective metrics, such as the perceptual evaluation of speech quality (PESQ), have become standard measures for evaluating speech. These metrics enable efficient and costless evaluations, where ratings are often computed by comparing a degraded speech signal to its underlying clean reference signal. Reference-based metrics, however, cannot be used to evaluate real-world signals whose references are inaccessible. This project develops a non-intrusive framework for evaluating the perceptual quality of noisy and enhanced speech.

WASPAA.v3.pdf (706)

Categories:: Speech Processing

326 Views

Speech Landmark Bigrams for Depression Detection from Naturalistic Smartphone Speech


Detection of depression from speech has attracted significant research attention in recent years but remains a challenge, particularly for speech from diverse smartphones in natural environments. This paper proposes two sets of novel features based on speech landmark bigrams associated with abrupt speech articulatory events for depression detection from smartphone audio recordings. Combined with techniques adapted from natural language text processing, the proposed features further exploit landmark bigrams by discovering latent articulatory events.

ICASSP2019_Huang_V01_uploaded.pdf (795)

Categories:: Speech Processing

83 Views

Adversarial Speaker Adaptation


We propose a novel adversarial speaker adaptation (ASA) scheme, in which adversarial learning is applied to regularize the distribution of deep hidden features in a speaker-dependent (SD) deep neural network (DNN) acoustic model to be close to that of a fixed speaker-independent (SI) DNN acoustic model during adaptation. An additional discriminator network is introduced to distinguish the deep features generated by the SD model from those produced by the SI model.