Speech Analysis (SPE-ANLS)

FRAUG: A FRAME RATE BASED DATA AUGMENTATION METHOD FOR DEPRESSION DETECTION FROM SPEECH SIGNALS

icassp_2381_fraug_slides.pptx

icassp_2381_fraug_slides.pptx (593)

Categories:: Speech Analysis (SPE-ANLS)

61 Views

Constant Q Cepstral Coefficients for Normal vs. Pathological Infant Cry

Read more about Constant Q Cepstral Coefficients for Normal vs. Pathological Infant Cry
Log in to post comments

icassp_2022_infant_cry_ppt.pdf

classification approach for normal vs. pathological infant cry (182)

Categories:: Speech Analysis (SPE-ANLS)

27 Views

Multimodal Depression Classification Using Articulatory Coordination Features and Hierarchical Attention Based Text Embeddings

Multimodal depression classification has gained immense popularity over the recent years. We develop a multimodal depression classification system using articulatory coordination features extracted from vocal tract variables and text transcriptions obtained from an automatic speech recognition tool that yields improvements of area under the receiver operating characteristics curve compared to unimodal classifiers (7.5% and 13.7% for audio and text respectively).

3649_poster.pdf

Poster (233)

Categories:: Speech Analysis (SPE-ANLS)
Speech Production (SPE-SPRD)

28 Views

SERAB: A MULTI-LINGUAL BENCHMARK FOR SPEECH EMOTION RECOGNITION

Read more about SERAB: A MULTI-LINGUAL BENCHMARK FOR SPEECH EMOTION RECOGNITION
Log in to post comments

The Speech Emotion Recognition Adaptation Benchmark (SERAB) is a new framework to evaluate the performance and generalization capacity of different approaches for utterance-level SER. The benchmark is composed of nine datasets for SER in six languages. We used the proposed framework to evaluate a selection of standard hand-crafted feature sets and state-of-the-art DNN representations. The results highlight that using only a subset of the data included in SERAB can result in biased evaluation, while compliance with the proposed protocol can circumvent this issue.

4461_SERAB_poster_ScheidwasserKeglerBeckmannCernak_ICASSP2022.pdf

Poster (203)

Categories:: Speech Analysis (SPE-ANLS)
Neural network learning (MLR-NNLR)

17 Views

Automatic Assessment of the Degree of Clinical Depression from Speech Using X-Vectors

Read more about Automatic Assessment of the Degree of Clinical Depression from Speech Using X-Vectors
Log in to post comments

Depression is a frequent and curable psychiatric disorder, detrimentally affecting daily activities, harming both work-place productivity and personal relationships. Among many other symptoms, depression is associated with disordered
speech production, which might permit its automatic screening by means of the speech of the subject. However, the choice of actual features extracted from the recordings is not trivial. In this study, we employ x-vectors, a DNN-based

icassp2022-José-Egas.pptx

icassp2022-José-Egas.pptx (192)

Categories:: Speech Analysis (SPE-ANLS)

22 Views

AN ATTENTION MODEL FOR HYPERNASALITY PREDICTION IN CHILDREN WITH CLEFT PALATE

Read more about AN ATTENTION MODEL FOR HYPERNASALITY PREDICTION IN CHILDREN WITH CLEFT PALATE
Log in to post comments

Hypernasality refers to the perception of abnormal nasal resonances in vowels and voiced consonants. Estimation of hypernasality severity from connected speech samples involves learning a mapping between the frame-level features and utterance-level clinical ratings of hypernasality. However, not all speech frames contribute equally to the perception of hypernasality.

PaperID3812_Vikram_poster.pdf

PaperID3812_Vikram_poster.pdf (271)

Categories:: Speech Analysis (SPE-ANLS)

11 Views

Exploiting Vocal Tract Coordination Using Dilated CNNs for Depression Detection in Naturalistic Environments

Depression detection from speech continues to attract significant research attention but remains a major challenge, particularly when the speech is acquired from diverse smartphones in natural environments. Analysis methods based on vocal tract coordination have shown great promise in depression and cognitive impairment detection for quantifying relationships between features over time through eigenvalues of multi-scale cross-correlations.

ICASSP2020_Huang_V01_uploaded.pdf

presentation slides (379)

Categories:: Speech Analysis (SPE-ANLS)

57 Views

VOICE BASED CLASSIFICATION OF PATIENTS WITH AMYOTROPHIC LATERAL SCLEROSIS, PARKINSON'S DISEASE AND HEALTHY CONTROLS WITH CNN-LSTM USING TRANSFER LEARNING

In this paper, we consider 2-class and 3-class classification problems for classifying patients with Amyotropic Lateral Sclerosis (ALS), Parkinson’s Disease (PD) and Healthy Controls (HC) using a CNN-LSTM network. Classification performance is examined for three different tasks, namely, Spontaneous speech (SPON), Diadochoki-netic rate (DIDK) and Sustained Phonation (PHON). Experiments are conducted using speech data recorded from 60 ALS, 60 PD and60 HC subjects. Classification using SVM and DNN are considered baseline schemes.

ICASSP2020_presentation_slides.pdf

ICASSP2020_presentation_slides.pdf (471)

Categories:: Speech Analysis (SPE-ANLS)

74 Views

Ensemble feature selection for domain adaptation in speech emotion recognition

Read more about Ensemble feature selection for domain adaptation in speech emotion recognition
Log in to post comments

When emotion recognition systems are used in new domains, the classification performance usually drops due to mismatches between training and testing conditions. Annotations of new data in the new domain is expensive and time demanding. Therefore, it is important to design strategies that efficiently use limited amount of new data to improve the robustness of the classification system. The use of ensembles is an attractive solution, since they can be built to perform well across different mismatches. The key challenge is to create ensembles that are diverse.

ICASSP_2017_Ensemble.pdf

ICASSP_2017_Ensemble.pdf (418)

Categories:: Speech Analysis (SPE-ANLS)

9 Views

Incremental adaptation using active learning for acoustic emotion recognition

Read more about Incremental adaptation using active learning for acoustic emotion recognition
Log in to post comments

The performance of speech emotion classifiers greatly degrade when the training conditions do not match the testing conditions. This problem is observed in cross-corpora evaluations, even when the corpora are similar. The lack of generalization is particularly problematic when the emotion classifiers are used in real applications. This study addresses this problem by combining active learning (AL) and supervised domain adaptation (DA) using an elegant approach for support vector machine (SVM).

Poster-CB.pdf

Poster-CB.pdf (414)

Categories:: Speech Analysis (SPE-ANLS)

16 Views

Speech Analysis (SPE-ANLS)

Pages