Speech Processing

PARALLEL-DATA-FREE DICTIONARY LEARNING FOR VOICE CONVERSION USING NON-NEGATIVE TUCKER DECOMPOSITION

icassp2018_poster_takashima.pdf

0005294_poster (548)

Categories:: Speech Processing

13 Views

ON THE IMPORTANCE OF ANALYTIC PHASE OF SPEECH SIGNALS IN SPOKEN LANGUAGE RECOGNITION

In this paper, we study the role of the long-time analytic phase of speech
signals in spoken language recognition (SLR) and employ a set
of features termed instantaneous frequency cepstral coefficients
(IFCC). We extract IFCC from the long-time analytic phase in an effort
to capture long-range acoustic features from speech signals. These
features are used in combination with the traditional shifted delta
cepstral coefficients (SDCC) for SLR. As the SDCC are extracted
from the spectral magnitude and the IFCC from the analytic phase, they

LRE_poster.pdf (539)

Categories:: Speech Processing

20 Views

An Instrumental Intelligibility Metric Based on Information Theory

kuykkleijnhendriks2018 (3).pdf (572)

Categories:: Speech Processing

14 Views

CROSS-LINGUAL AND MULTILINGUAL SPEECH EMOTION RECOGNITION ON ENGLISH AND FRENCH

poster_final.pdf (1002)

Categories:: Speech Processing

33 Views

3-D CNN Models FOR FAR-FIELD MULTI-CHANNEL Speech Recognition

conference_poster_5.pdf (757)

Categories:: Speech Processing

18 Views

DYNAMIC MULTI-RATER GAUSSIAN MIXTURE REGRESSION INCORPORATING TEMPORAL DEPENDENCIES OF EMOTION UNCERTAINTY USING KALMAN FILTERS

Predicting continuous emotion in terms of affective attributes has mainly focused on hard labels, which ignore the ambiguity of recognizing certain emotions. This ambiguity may result in high inter-rater variability and, in turn, cause prediction uncertainty that varies with time. Based on the assumption that temporal dependencies occur in the evolution of emotion uncertainty, this paper proposes a dynamic multi-rater Gaussian Mixture Regression (GMR), aiming to predict the emotion uncertainty reflected by multiple raters by taking into account their temporal dependencies.

icassp2018_tingdang_pdf.pdf (606)

Categories:: Speech Processing

29 Views

COMPLEX-VALUED GAUSSIAN PROCESS LATENT VARIABLE MODEL FOR PHASE-INCORPORATING SPEECH ENHANCEMENT

ICASSP2018speech_poster.pdf (537)

Categories:: Speech Processing

4 Views

COMPLEX-VALUED GAUSSIAN PROCESS LATENT VARIABLE MODEL FOR PHASE-INCORPORATING SPEECH ENHANCEMENT

ICASSP2018speech_poster.pdf (676)

Categories:: Speech Processing

18 Views

AUTOMATIC SPEECH ASSESSMENT FOR APHASIC PATIENTS BASED ON SYLLABLE-LEVEL EMBEDDING AND SUPRA-SEGMENTAL DURATION FEATURES

Aphasia is a type of acquired language impairment resulting from brain injury. Speech assessment is an important part of the comprehensive assessment process for aphasic patients. It is based on the acoustical and linguistic analysis of patients’ speech elicited through pre-defined story-telling tasks. This type of narrative spontaneous speech embodies multi-fold atypical characteristics related to the underlying language impairment.

poster_QinYing_ICASSP2018_final.pdf (653)

Categories:: Speech Processing

7 Views

FACTORIZED HIDDEN VARIABILITY LEARNING FOR ADAPTATION OF SHORT DURATION LANGUAGE IDENTIFICATION MODELS

Bidirectional long short-term memory (BLSTM) recurrent neural networks (RNNs) have recently outperformed other state-of-the-art approaches, such as i-vectors and deep neural networks (DNNs), in automatic language identification (LID), particularly when testing with very short utterances (∼3 s). Mismatched conditions between training and test data, e.g., speaker, channel, duration, and environmental noise, are a major source of performance degradation for LID.

POSTER.pdf (1830)

Categories:: Audio and Acoustic Signal Processing, Speech Processing, Spoken Language Processing

11 Views
