- Acoustic Modeling for Automatic Speech Recognition (SPE-RECO)
- General Topics in Speech Recognition (SPE-GASR)
- Large Vocabulary Continuous Recognition/Search (SPE-LVCR)
- Lexical Modeling and Access (SPE-LEXI)
- Multilingual Recognition and Identification (SPE-MULT)
- Resource-Constrained Speech Recognition (SPE-RCSR)
- Robust Speech Recognition (SPE-ROBU)
- Speaker Recognition and Characterization (SPE-SPKR)
- Speech Adaptation/Normalization (SPE-ADAP)
- Speech Analysis (SPE-ANLS)
- Speech Coding (SPE-CODI)
- Speech Enhancement (SPE-ENHA)
- Speech Perception and Psychoacoustics (SPE-SPER)
- Speech Production (SPE-SPRD)
- Speech Synthesis and Generation, including TTS (SPE-SYNT)
Automatic emotion recognition from speech is a challenging task that relies heavily on the effectiveness of the speech features used for classification. In this work, we study the use of deep learning to automatically discover emotionally relevant features from speech. We show that a deep recurrent neural network can learn both the emotionally relevant short-time frame-level acoustic features and an appropriate temporal aggregation of those features into a compact utterance-level representation.
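The abstract does not give the network details. As a minimal sketch, assuming a single-layer Elman RNN (tanh) for the frame-level features and softmax attention pooling as the learned temporal aggregation, the path from acoustic frames to a fixed-size utterance vector could look like this. All function names, weights, and dimensions are hypothetical; in practice the weights would be trained jointly with an emotion classifier rather than set by hand.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def frame_states(frames: List[Vector], w_in: Matrix, w_rec: Matrix, bias: Vector) -> List[Vector]:
    """Elman RNN over acoustic frames: h_t = tanh(W_in x_t + W_rec h_{t-1} + b).

    Returns one hidden state per frame (the learned frame-level features).
    """
    hidden = [0.0] * len(bias)
    states = []
    for x in frames:
        pre = [a + r + b for a, r, b in zip(matvec(w_in, x), matvec(w_rec, hidden), bias)]
        hidden = [math.tanh(p) for p in pre]
        states.append(hidden)
    return states


def attention_pool(states: List[Vector], u: Vector) -> Vector:
    """Weighted average of frame states with weights softmax(u . h_t) over time.

    With u all zeros every frame gets the same weight, i.e. plain mean pooling.
    """
    scores = [sum(ui * hi for ui, hi in zip(u, h)) for h in states]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(states[0])
    return [sum(w * h[k] for w, h in zip(weights, states)) for k in range(dim)]


def utterance_embedding(frames, w_in, w_rec, bias, u) -> Vector:
    """Map a variable-length utterance to a fixed-size vector for a classifier."""
    return attention_pool(frame_states(frames, w_in, w_rec, bias), u)


# Toy usage with hypothetical hand-set weights (hidden size 2, feature size 3).
w_in = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]]
w_rec = [[0.1, 0.0], [0.0, 0.1]]
frames = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
print(len(utterance_embedding(frames, w_in, w_rec, [0.0, 0.0], [1.0, -1.0])))  # 2
```

Setting `u` to zeros reduces attention pooling to mean pooling, which is a convenient baseline for checking whether a learned aggregation actually helps.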
Semantic role labeling (SRL) is the task of assigning semantic role labels to sentence elements. This paper describes the initial development of an Indonesian semantic role labeling system and its application to extracting event information from tweets. We compare two feature types when designing the SRL systems: Word-to-Word and Phrase-to-Phrase. Our experiments showed that the Word-to-Word feature approach outperforms the Phrase-to-Phrase approach. Applying the SRL system to an event extraction problem resulted in an overlap-based accuracy of 0.94 for actor identification.
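The abstract does not define "overlap-based accuracy". The sketch below assumes one plausible reading: a predicted actor counts as correct when it shares at least one token with the gold actor span, and a missing prediction counts as wrong. The helper names and the toy Indonesian spans are hypothetical and are not taken from the paper.

```python
from typing import List, Optional, Set


def tokens(span: str) -> Set[str]:
    """Lower-cased whitespace tokens of an actor span."""
    return set(span.lower().split())


def overlap_match(predicted: Optional[str], gold: str) -> bool:
    """Assumed criterion: correct if the two spans share at least one token."""
    if not predicted:
        return False  # no actor extracted counts as an error
    return bool(tokens(predicted) & tokens(gold))


def overlap_accuracy(predicted: List[Optional[str]], gold: List[str]) -> float:
    """Fraction of events whose predicted actor overlaps the gold actor."""
    if len(predicted) != len(gold):
        raise ValueError("need one prediction per gold actor")
    if not gold:
        return 0.0
    hits = sum(overlap_match(p, g) for p, g in zip(predicted, gold))
    return hits / len(gold)


# Hypothetical toy data, not from the paper.
pred = ["Presiden Jokowi", "polisi", None, "warga"]
gold = ["Jokowi", "Polisi Jakarta", "KPK", "mahasiswa"]
print(overlap_accuracy(pred, gold))  # 0.5
```

A stricter variant would require a minimum token-level overlap ratio instead of any shared token; which one the reported figure uses cannot be determined from the abstract.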
Natural and synthesized speech in L2 Mandarin produced by American English learners was evaluated by native Mandarin speakers, who identified the focus status of each sentence and rated the naturalness of the speech. The results reveal that natural speech was recognized and rated better than synthesized speech, early learners’ speech better than late learners’ speech, focused sentences better than no-focus sentences, and initial and medial focus better than final focus. The tones of in-focus words interacted with the focus status of the sentence and with speaker group.
Aphasia is a type of acquired language impairment caused by brain injury. This paper presents an automatic speech recognition (ASR) based approach to the objective assessment of aphasia patients. A dedicated ASR system is developed to facilitate acoustic and linguistic analysis of Cantonese aphasia speech. The acoustic models and the language models are trained with domain- and style-matched speech data from unimpaired control speakers. The speech recognition performance of this system is evaluated on natural oral discourses from patients with various types of aphasia.
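The abstract does not name the evaluation metric. For Cantonese, recognition performance is commonly reported as a character or syllable error rate computed from a Levenshtein alignment between reference and hypothesis; treating that as the metric here is an assumption, not a statement of the paper's method. A minimal sketch with hypothetical function names:

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Minimum substitutions + deletions + insertions turning ref into hyp."""
    prev = list(range(len(hyp) + 1))  # row for the empty reference prefix
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(
                prev[j] + 1,             # deletion of r
                cur[j - 1] + 1,          # insertion of h
                prev[j - 1] + (r != h),  # match (0) or substitution (1)
            )
        prev = cur
    return prev[-1]


def error_rate(ref: Sequence[str], hyp: Sequence[str]) -> float:
    """Character/syllable error rate: edit distance over reference length."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


# Hypothetical example: one deleted and one inserted character.
ref = list("我今日好開心")
hyp = list("我今日開心呀")
print(edit_distance(ref, hyp), round(error_rate(ref, hyp), 3))  # 2 0.333
```

The same functions work on syllable sequences (e.g. lists of Jyutping syllables) if the recognizer outputs syllables rather than characters.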
Template-based automatic segmentation of a unit database for TTS into phonetic and syllabic units.
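The title alone does not describe the algorithm. One common way to do template-based segmentation, assumed here, is to align a template utterance whose unit boundaries are already known with the target utterance by dynamic time warping (DTW), then carry the boundaries across the warping path. The sketch uses hypothetical function names, Euclidean frame distances, and toy 1-D features; `math.dist` requires Python 3.8 or later.

```python
import math
from typing import Dict, List, Sequence, Tuple

Frame = Sequence[float]


def dtw_path(template: List[Frame], target: List[Frame]) -> List[Tuple[int, int]]:
    """Align two feature sequences with DTW; return (template_idx, target_idx) pairs."""
    n, m = len(template), len(target)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(template[i - 1], target[j - 1])  # Euclidean frame distance
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    # Backtrack from the end, always stepping to the cheapest predecessor.
    path: List[Tuple[int, int]] = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = {(i - 1, j - 1): cost[i - 1][j - 1],
                 (i - 1, j): cost[i - 1][j],
                 (i, j - 1): cost[i][j - 1]}
        i, j = min(steps, key=steps.get)
    path.reverse()
    return path


def transfer_boundaries(boundaries: List[int], path: List[Tuple[int, int]]) -> List[int]:
    """Map each unit-start frame in the template to the first aligned target frame."""
    first: Dict[int, int] = {}
    for t_idx, u_idx in path:
        first.setdefault(t_idx, u_idx)
    return [first[b] for b in boundaries]


# Hypothetical 1-D features: two units, each a run of constant frames.
template = [[0.0], [0.0], [5.0], [5.0]]              # units start at frames 0 and 2
target = [[0.0], [0.0], [0.0], [5.0], [5.0], [5.0]]
print(transfer_boundaries([0, 2], dtw_path(template, target)))  # [0, 3]
```

Real systems would typically use spectral features such as MFCCs and may constrain the warping path (for example with a band) to avoid pathological alignments.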