Human Language Technology

Detecting Mismatch between Text Script and Voice-over Using Utterance Verification Based on Phoneme Recognition Ranking

The purpose of this study is to detect mismatches between a text script and its voice-over. To this end, we present a novel utterance verification (UV) method that calculates the degree of correspondence between a voice-over and the phoneme sequence of a script. We found that the phoneme recognition probabilities of exaggerated voice-overs are lower than those of ordinary utterances, but that their rankings do not change significantly.
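The observation in the abstract above (probabilities drop for exaggerated speech while rankings stay stable) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the per-segment posterior dictionaries, and the choice of mean reciprocal rank as the score are all assumptions made for illustration, and the numbers are invented toy values rather than results from the paper.

```python
from typing import Dict, Sequence

# Hypothetical helper names and toy data; the abstract does not give the
# exact scoring rule, so mean reciprocal rank is an assumed stand-in.


def phoneme_rank(posteriors: Dict[str, float], expected: str) -> int:
    """1-based rank of `expected` when phonemes are sorted by descending probability."""
    p = posteriors[expected]
    # Rank is one plus the number of phonemes with strictly higher probability.
    return 1 + sum(1 for q in posteriors.values() if q > p)


def rank_based_score(segments: Sequence[Dict[str, float]], script: Sequence[str]) -> float:
    """Mean reciprocal rank of the scripted phoneme over aligned segments."""
    if not segments or len(segments) != len(script):
        raise ValueError("segments and script must be non-empty and the same length")
    return sum(1.0 / phoneme_rank(s, ph) for s, ph in zip(segments, script)) / len(segments)


def probability_based_score(segments: Sequence[Dict[str, float]], script: Sequence[str]) -> float:
    """Mean recognition probability of the scripted phoneme over aligned segments."""
    if not segments or len(segments) != len(script):
        raise ValueError("segments and script must be non-empty and the same length")
    return sum(s[ph] for s, ph in zip(segments, script)) / len(segments)


# Toy posteriors for a two-phoneme script ("k", "a").
ordinary = [{"k": 0.8, "t": 0.1, "a": 0.1}, {"a": 0.9, "k": 0.05, "t": 0.05}]
exaggerated = [{"k": 0.4, "t": 0.3, "a": 0.3}, {"a": 0.5, "k": 0.3, "t": 0.2}]
script = ["k", "a"]
wrong_script = ["t", "k"]  # a script that does not match the audio

rank_ordinary = rank_based_score(ordinary, script)
rank_exaggerated = rank_based_score(exaggerated, script)
rank_mismatch = rank_based_score(ordinary, wrong_script)
prob_ordinary = probability_based_score(ordinary, script)
prob_exaggerated = probability_based_score(exaggerated, script)
```

In this toy data the probability-based score falls for the exaggerated utterance while the rank-based score stays the same, and only the mismatched script lowers the rank-based score. A real system would obtain the posteriors from a phoneme recognizer and would need a forced alignment between audio and script, both of which are outside this sketch.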

ICASSP2020_YJEONG_SLIDES.pdf

Categories: General Topics in Speech Recognition (SPE-GASR)


Learning Motion Disfluencies for Automatic Sign Language Segmentation


We introduce a novel technique for automatically detecting word boundaries within continuous sentence expressions in Japanese Sign Language from three-dimensional body joint positions. First, the flow of signed sentence data within a temporal neighborhood is determined using the spatial correlations between line segments of inter-joint pairs. Next, a frame-wise binary random forest classifier is trained to distinguish word frames from non-word frames based on the extracted spatio-temporal features.

Poster.pdf

Categories: Pattern recognition and classification (MLR-PATT)
Image, Video, and Multidimensional Signal Processing


Whole Sentence Neural Language Model


Recurrent neural networks have become increasingly popular for language modeling, achieving impressive gains in state-of-the-art speech recognition and natural language processing (NLP) tasks. Recurrent models exploit word dependencies over a much longer context window (as retained by the history states) than is feasible with n-gram language models.

whole-sent-v3.pdf

Categories: Audio and Acoustic Signal Processing


End-to-End Neural Network Based Automated Speech Scoring


icassp2018_final.pdf

Categories: Audio and Acoustic Signal Processing
