Speech Retrieval (SLP-IR)

Video-Driven Speech Reconstruction - Show & Tell Demo

This demo showcases our video-to-audio model, which attempts to reconstruct speech from short videos of spoken statements. The model works in a completely end-to-end manner: raw audio is generated directly from the input video, which bypasses the need for separate lip-reading and text-to-speech models. The advantage of this approach is that it does not require large transcribed datasets and does not rely on intermediate representations such as text, which remove intonation and emotional content from the speech.

Video-driven Speech Reconstruction using Generative Adversarial Networks Show & Tell Demo.pdf (651)

Categories: Image/Video Processing, Speech Retrieval (SLP-IR)

105 Views

QUERY-BY-EXAMPLE SPOKEN TERM DETECTION USING ATTENTION-BASED MULTI-HOP NETWORKS

2018poster_icassp.pdf (607)

Categories: Speech Retrieval (SLP-IR)

16 Views

SEGMENTAL AUDIO WORD2VEC: REPRESENTING UTTERANCES AS SEQUENCES OF VECTORS WITH APPLICATIONS IN SPOKEN TERM DETECTION

poster.pdf (773)

Categories: Speech Retrieval (SLP-IR)

15 Views

An LSTM-CTC based verification system for proxy-word based OOV keyword search

An LSTM-CTC based verification system for proxy-word based OOV keyword search.pptx (21)

Categories: Speech Retrieval (SLP-IR)

10 Views

An LSTM-CTC based verification system for proxy-word based OOV keyword search

An LSTM-CTC based verification system for proxy-word based OOV keyword search.pptx (37)

Categories: Speech Retrieval (SLP-IR)

11 Views

Pairwise Learning using Multi-lingual Bottleneck Features for Low-resource Query-by-example Spoken Term Detection

We propose to use a feature representation obtained by pairwise learning in a low-resource language for query-by-example spoken term detection (QbE-STD). We assume that word pairs identified by humans are available in the low-resource target language. The word pairs are parameterized by a multi-lingual bottleneck feature (BNF) extractor that is trained using transcribed data in high-resource languages. The multi-lingual BNFs of the word pairs are used as an initial feature representation to train an autoencoder (AE).

ICASSP2017_oral_ygyuan.pdf (754)

Categories: Speech Retrieval (SLP-IR)

41 Views

Exploiting noisy web data by OOV ranking for low-resource keyword search

Spoken keyword search in low-resource conditions suffers from the out-of-vocabulary (OOV) problem and from insufficient text data for language model (LM) training. Web-crawled text data is used to expand the vocabulary and to augment the language model. However, the mismatch between web text and the target speech data makes it difficult to use the web data effectively. New words from web data need to be evaluated, either to exclude noisy words or to assign the new words proper probabilities. In this paper, several criteria to rank new words from web data are investigated and are used as features

ISCSLP2016_Poster_Exploiting.pdf (884)

Categories: Language Modeling, for Speech and SLP (SLP-LANG), Speech Retrieval (SLP-IR)

8 Views

KEYWORD SEARCH USING QUERY EXPANSION FOR GRAPH-BASED RESCORING OF HYPOTHESIZED DETECTIONS

In this work, we propose a novel framework for rescoring keyword search (KWS) detections using acoustic samples extracted from the training data. We view the keyword rescoring task as an information retrieval task and adopt the idea of query expansion. We expand a textual keyword with multiple speech keyword samples extracted from the training data. In this way, the hypothesized detections are compared with the multiple keywords using non-parametric approaches such as dynamic time warping (DTW).
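
The comparison step described above can be illustrated with a minimal sketch. It assumes each detection and each keyword sample is a sequence of fixed-length feature vectors, uses a plain Euclidean frame distance, and averages the DTW distances over the samples. The function names `dtw_distance` and `rescore`, the length normalisation, and the averaging rule are all illustrative assumptions; the graph-based rescoring named in the title is not reproduced here.

```python
import math


def dtw_distance(a, b):
    """Length-normalised DTW cost between two sequences of feature vectors."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] holds the minimal accumulated cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])  # Euclidean frame distance (assumption)
            cost[i][j] = d + min(cost[i - 1][j],      # skip a frame of b
                                 cost[i][j - 1],      # skip a frame of a
                                 cost[i - 1][j - 1])  # match both frames
    return cost[n][m] / (n + m)


def rescore(detection, keyword_samples):
    """Score one hypothesized detection against several spoken keyword samples.

    Returns the negative mean DTW distance, so a higher score means a closer
    match. Averaging is a hypothetical choice made only for this sketch.
    """
    dists = [dtw_distance(detection, s) for s in keyword_samples]
    return -sum(dists) / len(dists)


# Toy one-dimensional "features": two samples of the same keyword at different speeds.
samples = [
    [(0.0,), (1.0,), (2.0,)],
    [(0.0,), (1.0,), (1.0,), (2.0,)],
]
good_detection = [(0.0,), (1.0,), (2.0,)]
bad_detection = [(5.0,), (5.0,), (5.0,)]
```

With these toy inputs, `rescore(good_detection, samples)` is higher than `rescore(bad_detection, samples)`, because DTW absorbs the difference in speaking rate between the two keyword samples.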

icassp2016_with_note_v3.pdf (878)

Categories: Speech Retrieval (SLP-IR)

9 Views