Speech Processing

Attentive Adversarial Learning for Domain-Invariant Training

Adversarial domain-invariant training (ADIT) has proven effective in suppressing the effects of domain variability in acoustic modeling and has improved automatic speech recognition (ASR) performance. In ADIT, an auxiliary domain classifier takes equally weighted deep features from a deep neural network (DNN) acoustic model as input and is trained to make those features more domain-invariant by optimizing an adversarial loss function.
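A minimal sketch of the ingredients described above, in plain Python without a deep-learning framework. It shows equal-weight pooling of deep features before they reach the domain classifier. It also shows the sign flip, often implemented as a gradient reversal layer, that lets the acoustic model maximize the domain loss while the classifier minimizes it. The attention-weighted pooling function, the weight `lam`, and the assumption that the acoustic model's main loss is a senone classification loss are illustrative; the text above states only that ADIT weights the features equally.

```python
import math


def equal_weight_pool(features):
    """Average frame-level deep features with equal weights 1/T (ADIT-style)."""
    num_frames = len(features)
    dim = len(features[0])
    return [sum(frame[d] for frame in features) / num_frames for d in range(dim)]


def attention_pool(features, scores):
    """Hypothetical attentive pooling: weight each frame by softmax(scores)."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(features[0])
    return [sum(w * frame[d] for w, frame in zip(weights, features)) for d in range(dim)]


def reverse_gradient(grad, lam):
    """Gradient reversal: identity going forward, multiplies the gradient by -lam going back."""
    return [-lam * g for g in grad]


def adversarial_objectives(senone_loss, domain_loss, lam):
    """Return (acoustic-model objective, domain-classifier objective).

    The acoustic model minimizes its (assumed) senone loss while maximizing
    the domain loss; the domain classifier simply minimizes the domain loss.
    """
    return senone_loss - lam * domain_loss, domain_loss


frames = [[1.0, 2.0], [3.0, 4.0]]
print(equal_weight_pool(frames))           # [2.0, 3.0]
print(attention_pool(frames, [0.0, 0.0]))  # [2.0, 3.0]: equal scores give equal weights
```

With uniform scores the attentive variant reduces to the equal weighting used in ADIT. Non-uniform scores let some frames dominate what the domain classifier sees.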

aadit_poster.pptx

Categories: Audio and Acoustic Signal Processing
Speech Processing
Machine Learning for Signal Processing

Adversarial Speaker Verification

The use of deep networks to extract embeddings for speaker recognition has proven successful. However, such embeddings are susceptible to performance degradation due to mismatches among the training, enrollment, and test conditions. In this work, we propose an adversarial speaker verification (ASV) scheme to learn condition-invariant deep embeddings via adversarial multi-task training. In ASV, a speaker classification network and a condition identification network are jointly optimized to minimize the speaker classification loss and simultaneously mini-maximize the condition loss.
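To make the min-max structure concrete, here is a sketch of the objectives each part minimizes. It assumes a common three-part split: a shared embedding network, a speaker classifier, and a condition classifier, all trained with softmax cross-entropy. The split, the function names, and the trade-off weight `lam` are assumptions for illustration, not details taken from the abstract.

```python
import math


def cross_entropy(logits, label):
    """Softmax cross-entropy for one example, computed stably via log-sum-exp."""
    peak = max(logits)
    log_norm = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_norm - logits[label]


def asv_objectives(spk_logits, spk_label, cond_logits, cond_label, lam):
    """Return the objective each (assumed) sub-network minimizes.

    - embedding network: speaker loss minus lam * condition loss, so it is
      pushed to *increase* the condition loss (condition invariance),
    - speaker classifier: speaker loss,
    - condition classifier: condition loss.
    """
    spk_loss = cross_entropy(spk_logits, spk_label)
    cond_loss = cross_entropy(cond_logits, cond_label)
    return {
        "embedding": spk_loss - lam * cond_loss,
        "speaker": spk_loss,
        "condition": cond_loss,
    }


objs = asv_objectives([2.0, 0.5, -1.0], 0, [0.0, 0.0, 0.0, 0.0], 3, lam=0.5)
print(objs)
```

The condition classifier and the embedding network pull the condition loss in opposite directions. That opposition is the "mini-maximize" part of the scheme.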

asv_poster_v3.pptx

Categories: Speech Processing
Audio and Acoustic Signal Processing
Machine Learning for Signal Processing

Conditional Teacher-Student Learning

Teacher-student (T/S) learning has been shown to be effective for a variety of problems, such as domain adaptation and model compression. One shortcoming of T/S learning is that the teacher model, which is not always perfect, sporadically produces wrong guidance in the form of posterior probabilities that misleads the student model toward suboptimal performance.
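The abstract states the problem but not the remedy. One natural way to act on it, assumed here for illustration, is to make the student's target conditional on whether the teacher is right. When the teacher's top class matches the ground-truth label, the student learns from the teacher's soft posteriors. Otherwise it learns from the one-hot hard label. The function names and the per-frame formulation are hypothetical.

```python
import math


def conditional_target(teacher_probs, label):
    """Pick the student's training target for one frame (assumed rule).

    If the teacher's top class matches the ground-truth label, keep the
    teacher's soft posteriors; otherwise fall back to the one-hot hard label.
    """
    top = max(range(len(teacher_probs)), key=lambda k: teacher_probs[k])
    if top == label:
        return list(teacher_probs)
    return [1.0 if k == label else 0.0 for k in range(len(teacher_probs))]


def student_loss(target, student_probs, eps=1e-12):
    """Cross-entropy between the chosen target and the student's posteriors."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(target, student_probs))


teacher = [0.7, 0.2, 0.1]
student = [0.2, 0.3, 0.5]
print(conditional_target(teacher, 0))  # teacher correct -> [0.7, 0.2, 0.1]
print(conditional_target(teacher, 2))  # teacher wrong   -> [0.0, 0.0, 1.0]
```

The hard-label fallback keeps a mistaken teacher from pushing the student toward the wrong class. The soft targets are still used wherever the teacher agrees with the label.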

cts_poster.pptx

Categories: Speech Processing
Acoustic Modeling for Automatic Speech Recognition (SPE-RECO)
Robust Speech Recognition (SPE-ROBU)
Machine Learning for Signal Processing
Audio and Acoustic Signal Processing

Role Specific Lattice Rescoring for Speaker Role Recognition from Speech Recognition Outputs

The language patterns followed by different speakers who play specific roles in conversational interactions provide valuable cues for the task of Speaker Role Recognition (SRR). Given the speech signal, existing algorithms typically try to find such patterns in the output of an Automatic Speech Recognition (ASR) system. In this work, we propose an alternative way of revealing role-specific linguistic characteristics: using role-specific ASR outputs, which are built by suitably rescoring the lattice produced after a first pass of ASR decoding.
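For illustration, lattice rescoring can be approximated on an n-best list extracted from the first-pass lattice. Each hypothesis keeps its acoustic score and receives a new language-model score from one role-specific LM, yielding one best transcript per role. The toy unigram LMs, the role names, `lm_weight`, and the floor probability are hypothetical. A real system would rescore the full lattice with n-gram or neural LMs trained on each role's speech.

```python
import math


def lm_logprob(words, unigram_lm, floor=1e-6):
    """Log-probability of a word sequence under a toy unigram LM."""
    return sum(math.log(unigram_lm.get(w, floor)) for w in words)


def score(hyp, lm, lm_weight):
    """Combined score: acoustic log-prob plus weighted role-specific LM log-prob."""
    words, acoustic = hyp
    return acoustic + lm_weight * lm_logprob(words, lm)


def role_specific_rescoring(nbest, role_lms, lm_weight=1.0):
    """Return {role: (best_words, total_score)} after rescoring with each role's LM.

    `nbest` is a list of (words, acoustic_logprob) pairs standing in for the
    first-pass lattice.
    """
    outputs = {}
    for role, lm in role_lms.items():
        best = max(nbest, key=lambda hyp: score(hyp, lm, lm_weight))
        outputs[role] = (best[0], score(best, lm, lm_weight))
    return outputs


# Hypothetical first-pass hypotheses and role LMs.
nbest = [
    (["how", "are", "you", "feeling"], -10.0),
    (["how", "are", "you", "filling"], -9.8),
]
role_lms = {
    "interviewer": {"how": 0.1, "are": 0.1, "you": 0.1, "feeling": 0.05},
    "interviewee": {"how": 0.1, "are": 0.1, "you": 0.1, "filling": 0.05},
}
for role, (words, total) in role_specific_rescoring(nbest, role_lms).items():
    print(role, " ".join(words), round(total, 2))
```

Each role's LM can promote a different hypothesis from the same lattice. The resulting role-specific outputs can then be compared as cues for SRR.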

poster_v2.pdf

Categories: Speech Processing

Retrieving Speech Samples with Similar Emotional Content Using a Triplet Loss Function

The ability to identify speech with similar emotional content is valuable to many applications, including speech retrieval, surveillance, and emotional speech synthesis. While current formulations of speech emotion recognition based on classification or regression are not appropriate for this task, solutions based on preference learning offer appealing alternatives. This paper aims to find speech samples that are emotionally similar to an anchor speech sample provided as a query. This novel formulation opens interesting research questions.
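The title names a triplet loss. Given an anchor, a positive (emotionally similar) sample, and a negative sample, the hinge form max(0, d(a, p) - d(a, n) + margin) pushes the anchor's embedding closer to the positive than to the negative by at least the margin. Retrieval then ranks candidates by their distance to the query. Euclidean distance, margin=1.0, the clip names, and the 2-D embeddings below are assumptions; the abstract does not specify the distance, the margin, or how embeddings are produced.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge triplet loss: zero once the positive is closer than the negative by `margin`."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def retrieve(query, database, k=1):
    """Return names of the k database entries whose embeddings are closest to `query`."""
    ranked = sorted(database, key=lambda item: euclidean(query, item[1]))
    return [name for name, _ in ranked[:k]]


# Hypothetical embeddings of stored clips.
database = [("clip_a", [3.0, 4.0]), ("clip_b", [0.0, 1.0]), ("clip_c", [1.0, 1.0])]
print(retrieve([0.0, 0.0], database, k=2))  # ['clip_b', 'clip_c']
```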

poster_draft_final.pdf

Categories: Speech Processing

SPEAKER AGNOSTIC FOREGROUND SPEECH DETECTION FROM AUDIO RECORDINGS IN WORKPLACE SETTINGS FROM WEARABLE RECORDERS

Audio-signal acquisition as part of wearable sensing adds an important dimension to applications such as understanding human behavior. As part of a large study on workplace behaviors, we collected audio data from individual hospital staff using custom wearable recorders. The collected audio features were limited in order to preserve the privacy of interactions in the hospital. A first step toward audio processing is to identify the foreground speech of the person wearing the audio badge.

ICASSP 2019 poster 34*26in new.pdf

Categories: Audio and Acoustic Signal Processing
Speech Processing

MULTI-BAND PIT AND MODEL INTEGRATION FOR IMPROVED MULTI-CHANNEL SPEECH SEPARATION

Poster_for_multiband_PIT.pdf

Categories: Audio and Acoustic Signal Processing
Speech Processing

ICASSP 2019 Poster (TRANSMISSION LINE COCHLEAR MODEL BASED AM-FM FEATURES FOR REPLAY ATTACK DETECTION)

ICASSP2019_Poster_TharshiniGunendradasan.pdf

Categories: Speech Processing
Design and Implementation of Signal Processing Systems

PhoneSpoof: A new dataset for spoofing attack detection in telephone channel

The results of the spoofing detection systems proposed during the ASVspoof 2015 and 2017 challenges confirmed the prospects for detecting unforeseen spoofing attacks in the microphone channel. However, the telephone channel presents much more challenging conditions for spoofing detection, due to limited bandwidth, various coding standards, and channel effects. Research on the topic has thus far only made use of software codecs and other telephone channel emulations. Such emulations do not quite match real telephone spoofing attacks.

ICASSP-2019.pdf

Categories: Speech Processing

Comparison of speech tasks for automatic classification of patients with amyotrophic lateral sclerosis and healthy subjects

In this work, we consider the automatic classification of Amyotrophic Lateral Sclerosis (ALS) patients and healthy subjects based on acoustic and articulatory features extracted from speech tasks. In particular, we compare the roles of different types of speech tasks, namely rehearsed speech, spontaneous speech, and repeated words, for this purpose. For the classification experiments, simultaneous articulatory and speech data were recorded from 8 healthy controls and 8 ALS patients using an AG501 electromagnetic articulograph.

ICASSP_Final_April21.pdf

Categories: Speech Processing