In this paper, we investigate unsupervised cross-corpus speech emotion recognition (SER), a problem in which the training and testing speech signals come from two different speech emotion corpora. The training speech signals are labeled, whereas the labels of the testing speech signals are entirely unknown. In this setting, the training (source) and testing (target) speech signals may have different feature distributions, so many existing SER methods, which assume that training and testing data share the same distribution, would perform poorly.
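To make the notion of "different feature distributions" concrete, the following sketch measures the gap between a source and a target feature set with the (biased) squared maximum mean discrepancy (MMD) under a Gaussian kernel. MMD is a common choice in domain adaptation, but it is an assumption here and not necessarily the measure used in this paper; the 2-D "acoustic features" and the kernel width `gamma` are hypothetical.

```python
import math
import random

def rbf(x, y, gamma):
    """Gaussian (RBF) kernel between two feature vectors."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq)

def mmd2(source, target, gamma=1.0):
    """Biased estimate of the squared MMD between two sets of feature vectors."""
    def mean_k(xs, ys):
        return sum(rbf(x, y, gamma) for x in xs for y in ys) / (len(xs) * len(ys))
    return mean_k(source, source) + mean_k(target, target) - 2 * mean_k(source, target)

# Hypothetical 2-D features: one target set drawn like the source, one shifted.
rng = random.Random(0)
src = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(100)]
tgt_same = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(100)]
tgt_shift = [[rng.gauss(2, 1), rng.gauss(2, 1)] for _ in range(100)]

print(mmd2(src, tgt_same, gamma=0.5))   # small: same underlying distribution
print(mmd2(src, tgt_shift, gamma=0.5))  # larger: shifted "target corpus"
```

A large value indicates that a classifier fitted on the source features cannot be expected to transfer unchanged to the target corpus.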

Parkinson’s disease (PD) causes several speech impairments in patients. Automatic classification of PD patients is performed on speech recordings collected unobtrusively during normal phone calls, in uncontrolled acoustic conditions. A speech enhancement algorithm is applied to improve the quality of the signals. Two classification experiments are considered: a binary classification of PD patients versus healthy speakers, and a multi-class experiment that classifies patients according to their stage of the disease.

Negative symptoms of schizophrenia are often associated with blunted emotional affect, which seriously impedes the daily functioning of patients. Affective prosody is almost always adversely affected in such cases, and this impairment is known to manifest itself in low-level acoustic signals of prosody. To automate and simplify the assessment of the severity of emotion-related symptoms of schizophrenia, we used these low-level acoustic signals to predict the expert subjective ratings assigned by a trained psychologist during an interview with the patient.
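Blunted affect is commonly described as monotone speech with reduced vocal inflection. As a hedged illustration of how low-level prosodic signals can be turned into features, the sketch below computes two hypothetical variability measures from frame-level contours: the standard deviation of F0 in semitones (relative to an assumed 100 Hz reference) and the intensity range in dB. These are not necessarily the features used in this work, and marking unvoiced frames with an F0 of 0 is an assumed convention.

```python
import math
from statistics import pstdev

def prosodic_variability(pitch_hz, intensity_db, ref_hz=100.0):
    """Two hypothetical prosodic-variability features from frame-level contours.

    pitch_hz: F0 per frame in Hz, 0 for unvoiced frames (assumed convention).
    intensity_db: intensity per frame in dB.
    """
    # Semitones (log F0) make the spread independent of the speaker's overall pitch level.
    semitones = [12 * math.log2(p / ref_hz) for p in pitch_hz if p > 0]
    return {
        "f0_sd_st": pstdev(semitones) if semitones else 0.0,
        "intensity_range_db": max(intensity_db) - min(intensity_db),
    }

flat = prosodic_variability([200.0] * 10, [60.0] * 10)
lively = prosodic_variability([150.0, 220.0, 0.0, 180.0, 260.0],
                              [55.0, 72.0, 40.0, 65.0, 60.0])
print(flat)
print(lively)
```

Per-interview vectors of such measures could then serve as regressors for the psychologist's ratings.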

Bidirectional long short-term memory (BLSTM) recurrent neural networks (RNNs) have achieved state-of-the-art performance in many sequence processing problems, given their capability to capture contextual information. However, for languages with a limited amount of training data, it is still difficult to obtain a high-quality BLSTM model for emphasis detection, whose aim is to recognize emphasized speech segments in natural speech.

Aphasia is an acquired communication disorder that results from brain damage and impairs an individual’s ability to use, produce, and comprehend language. Loss of communication skills can be stressful and may result in depression, yet most stress and depression diagnostic tools are designed for adults without aphasia. This project aims to predict stress and depression from the acoustic profiles of adults with aphasia using linear support vector regression. The labels were obtained through caregiver surveys (SADQ-10) or through surveys not designed for adults with aphasia (PSS).
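For readers unfamiliar with linear support vector regression, the sketch below minimizes its primal objective, `lam/2 * ||w||^2 + mean(max(0, |w·x + b - y| - epsilon))`, with full-batch subgradient descent and iterate averaging. The solver, the hyperparameter values, and the 1-D toy data (a hypothetical acoustic feature mapped to a hypothetical score) are assumptions for illustration only; the abstract does not say how the model was trained.

```python
def train_linear_svr(X, y, epsilon=0.1, lam=1e-3, lr=0.05, epochs=5000):
    """Linear SVR via averaged subgradient descent on the epsilon-insensitive loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    w_sum, b_sum = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the L2 penalty (bias not penalized)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            r = sum(wj * xj for wj, xj in zip(w, xi)) + b - yi
            if abs(r) > epsilon:  # only points outside the epsilon-tube contribute
                s = 1.0 if r > 0 else -1.0
                for j in range(d):
                    grad_w[j] += s * xi[j] / n
                grad_b += s / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
        # Averaging the iterates damps the oscillation of plain subgradient steps.
        w_sum = [ws + wj for ws, wj in zip(w_sum, w)]
        b_sum += b
    return [ws / epochs for ws in w_sum], b_sum / epochs

def predict(w, b, x):
    return sum(wj * xj for wj, xj in zip(w, x)) + b

# Hypothetical data: one normalized acoustic feature and a linear target score.
X = [[i / 20] for i in range(21)]
y = [3 * x[0] + 0.5 for x in X]
w, b = train_linear_svr(X, y)
print(w, b)
```

In practice, a library solver such as scikit-learn's `LinearSVR` would normally replace this hand-written loop.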

Automatic syllable stress detection is useful for assessing and diagnosing the pronunciation quality of second language (L2) learners in an automated way. Typically, syllable stress depends on three prominence measures (intensity level, duration, and pitch) around the sound unit with the highest sonority in the syllable, i.e., the syllable nucleus. Stress detection is often formulated as a binary classification task using cues from the contours of these prominence measures.
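As a minimal sketch of this formulation, the code below extracts the three prominence cues over each syllable nucleus from frame-level contours, normalizes each cue across the syllables of a word, and makes a binary stressed/unstressed decision per syllable from a linear score. The 10 ms frame shift, the nucleus boundaries, the unit weights, and the zero threshold are hypothetical; a real system would learn the decision function from labeled data.

```python
from statistics import mean, pstdev

FRAME_SHIFT = 0.01  # assumed 10 ms frame shift

def nucleus_features(intensity_db, pitch_hz, start, end):
    """Prominence cues for one syllable nucleus spanning frames [start, end)."""
    seg_i = intensity_db[start:end]
    seg_p = [p for p in pitch_hz[start:end] if p > 0]  # ignore unvoiced frames (F0 = 0)
    return [
        mean(seg_i),                   # intensity level (dB)
        (end - start) * FRAME_SHIFT,   # duration (s)
        max(seg_p) if seg_p else 0.0,  # pitch peak (Hz)
    ]

def zscore_columns(rows):
    """Normalize each cue across the syllables of one word."""
    cols = list(zip(*rows))
    stats = [(mean(c), pstdev(c) or 1.0) for c in cols]
    return [[(v - m) / s for v, (m, s) in zip(r, stats)] for r in rows]

def detect_stress(features, weights=(1.0, 1.0, 1.0), threshold=0.0):
    """Binary decision per syllable from a linear score (hypothetical weights)."""
    return [sum(w * f for w, f in zip(weights, z)) > threshold
            for z in zscore_columns(features)]

# Hypothetical two-syllable word: the first nucleus is louder, longer, and higher.
intensity_db = [70.0] * 15 + [50.0] * 5 + [60.0] * 8
pitch_hz = [220.0] * 15 + [0.0] * 5 + [180.0] * 8
feats = [nucleus_features(intensity_db, pitch_hz, 0, 15),
         nucleus_features(intensity_db, pitch_hz, 20, 28)]
print(detect_stress(feats))
```

Normalizing within the word makes the decision relative, which matches the idea that stress is a contrast between neighboring syllables rather than an absolute level.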

Nonnegative matrix factorization (NMF) has recently been applied to temporal decomposition (TD) of speech spectral envelopes represented by line spectral frequencies (LSFs). A couple of inherent TD constraints, including LSF ordering and monotonic event functions, which are otherwise handled as ad hoc exceptions, have also been incorporated using NMF. Here, these constraints are analyzed, and a third inherent constraint is incorporated into the NMF analysis.
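To illustrate the basic factorization, the sketch below writes a matrix of LSF frames `Y` (one column per frame) as `A @ Phi`, where the columns of `A` act as target LSF vectors and the rows of `Phi` as event functions, and fits it with the standard Lee-Seung multiplicative updates for the Euclidean cost. This is plain NMF only: it does not enforce LSF ordering, monotonic event functions, or the third constraint studied here; `lsf_ordered` merely checks the ordering property. The toy targets and event functions are hypothetical.

```python
import math
import random

def matmul(A, B):
    """Plain-Python matrix product of nested lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def nmf(Y, k, iters=300, seed=0, eps=1e-12):
    """Euclidean NMF, Y ~= A @ Phi, with Lee-Seung multiplicative updates."""
    rng = random.Random(seed)
    p, n = len(Y), len(Y[0])
    A = [[rng.random() + 0.1 for _ in range(k)] for _ in range(p)]      # p x k targets
    Phi = [[rng.random() + 0.1 for _ in range(n)] for _ in range(k)]    # k x n events
    for _ in range(iters):
        At = transpose(A)
        num, den = matmul(At, Y), matmul(matmul(At, A), Phi)
        Phi = [[Phi[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n)] for i in range(k)]
        PhiT = transpose(Phi)
        num, den = matmul(Y, PhiT), matmul(A, matmul(Phi, PhiT))
        A = [[A[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)] for i in range(p)]
    return A, Phi

def frobenius_error(Y, A, Phi):
    R = matmul(A, Phi)
    return math.sqrt(sum((y - r) ** 2 for yr, rr in zip(Y, R) for y, r in zip(yr, rr)))

def lsf_ordered(vec):
    """LSF ordering property: 0 < f1 < f2 < ... < fP < pi (radians)."""
    return all(0 < a < b < math.pi for a, b in zip(vec, vec[1:]))

# Hypothetical data: two LSF targets blended by overlapping event functions.
targets = [[0.3, 0.8, 1.5, 2.4], [0.5, 1.1, 1.9, 2.7]]
events = [[1.0, 0.8, 0.5, 0.2, 0.0, 0.0], [0.0, 0.2, 0.5, 0.8, 1.0, 1.0]]
Y = matmul(transpose(targets), events)  # 4 LSFs x 6 frames

A, Phi = nmf(Y, k=2)
print(frobenius_error(Y, A, Phi))
print([lsf_ordered(col) for col in transpose(A)])  # not guaranteed by plain NMF
```

Because the multiplicative updates never increase the Euclidean cost and keep all entries nonnegative, they are a natural starting point to which TD-specific constraints can then be added.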

In this paper, the rich prosodic information of spontaneous Mandarin speech is explored. The joint prosody labeling and modeling algorithm previously proposed for read speech is extended to spontaneous speech by additionally modeling disfluent speech segments. The algorithm automatically trains a hierarchical prosodic model and performs prosody labeling on a large speech corpus. The rich prosodic information is then explored by analyzing the model parameters and labeling results.
