ICASSP is the world’s largest and most comprehensive technical conference focused on signal processing and its applications. The 2019 conference will feature world-class presentations by internationally renowned speakers and cutting-edge session topics, and will provide an opportunity to network with like-minded professionals from around the world.

Code switching refers to the phenomenon of changing languages within a sentence or discourse, and it represents a challenge for conventional automatic speech recognition systems deployed to tackle a single target language. The code-switching problem is complicated by the lack of multilingual training data needed to build new, ad hoc multilingual acoustic and language models. In this work, we present a prototype code-switching speech recognition system for research that leverages existing monolingual acoustic and language models, so no ad hoc training is needed.

In video coding frameworks, the essence of intra coding is to exploit the spatial correlation within a frame to remove redundancy, and thus to reduce the amount of data that must be transmitted. As modern video acquisition devices improve, high-definition video is becoming increasingly common, which poses a new challenge for high-efficiency video coding. In this paper, we propose a novel intra video coding scheme based on Multiple Linear Regression (MLR), named Multiple linear regression Intra Prediction (MIP).
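
As a rough illustration of the idea of regression-based intra prediction, the following minimal sketch fits least-squares weights that predict a pixel from its left, top, and top-left neighbours. It is not the MIP scheme from the paper: the choice of neighbours, the bias term, the normal-equation solver, and all function names (`solve_linear_system`, `fit_mlr`, `neighbour_samples`) are assumptions made only for this example.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_mlr(features, targets):
    """Least-squares weights from the normal equations (X^T X) w = X^T y."""
    k = len(features[0])
    xtx = [[sum(f[i] * f[j] for f in features) for j in range(k)]
           for i in range(k)]
    xty = [sum(f[i] * t for f, t in zip(features, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def neighbour_samples(frame):
    """Collect (left, top, top-left, bias) features and the pixel to predict."""
    feats, targs = [], []
    for y in range(1, len(frame)):
        for x in range(1, len(frame[0])):
            feats.append([float(frame[y][x - 1]), float(frame[y - 1][x]),
                          float(frame[y - 1][x - 1]), 1.0])
            targs.append(float(frame[y][x]))
    return feats, targs


# Hypothetical 6x6 test "frame" with smooth spatial structure (assumption).
frame = [[x * y + 2 * x + 3 * y for x in range(6)] for y in range(6)]
feats, targs = neighbour_samples(frame)
weights = fit_mlr(feats, targs)

# Predict each pixel from its neighbours and measure the worst residual.
residuals = [abs(sum(w * f for w, f in zip(weights, row)) - t)
             for row, t in zip(feats, targs)]
max_residual = max(residuals)
```

On this synthetic frame the neighbours determine each pixel exactly, so the residual is essentially zero; on real video the residual is what an encoder would still have to transmit.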

A modified nested linear array (MNLA) has been reported recently for its greater potential to increase the degrees of freedom. However, there exist some “holes” in the difference co-array, which result in missing “lags” and limit the performance of direction-of-arrival (DOA) estimation. To tackle this problem, this paper applies a Toeplitz matrix completion technique to the MNLA and investigates the resulting DOA estimation performance. In particular, a semidefinite program with trace minimization is derived to obtain a covariance matrix with Hermitian Toeplitz structure.
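
To make the notion of "holes" concrete, the sketch below computes the difference co-array of a set of integer sensor positions and lists the missing lags. It only illustrates the problem, not the Toeplitz completion or the semidefinite program; the sensor positions are hypothetical examples and are not the MNLA geometry, and the function names are chosen for this example.

```python
def difference_coarray(positions):
    """Return the sorted set of pairwise differences (lags) of the sensors."""
    return sorted({p - q for p in positions for q in positions})


def coarray_holes(positions):
    """Return the integer lags within the co-array span that never occur."""
    lags = set(difference_coarray(positions))
    max_lag = max(positions) - min(positions)
    return [lag for lag in range(-max_lag, max_lag + 1) if lag not in lags]


# Hypothetical sparse array (in units of half a wavelength): has holes.
sparse_positions = [0, 1, 4, 9]
sparse_holes = coarray_holes(sparse_positions)

# Standard two-level nested array with 2 + 2 sensors: hole-free co-array.
nested_positions = [1, 2, 3, 6]
nested_holes = coarray_holes(nested_positions)
```

Each hole corresponds to an entry of the co-array covariance (a diagonal of the Toeplitz matrix) that cannot be estimated directly from the data, which is the gap a matrix completion step aims to fill.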

This paper proposes a new speech feature representation that improves the intelligibility assessment of dysarthric speech. The formulation of the feature set is motivated by human auditory perception and by the high time-frequency resolution of the single frequency filtering (SFF) technique. The proposed features are named perceptually enhanced single frequency cepstral coefficients (PESFCC). As part of the SFF implementation, the speech signal is passed through a bank of single-pole complex bandpass filters to obtain a high-resolution time-frequency distribution.
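
The single-pole filtering step can be sketched as follows, assuming one common formulation of SFF: the signal is frequency-shifted so that the frequency of interest lands at half the sampling rate, then passed through a single-pole filter with its pole at z = -r, and the magnitude of the output is taken as the envelope. The pole radius `r = 0.99`, the test tone, and the function name `sff_envelope` are assumptions for illustration; the perceptual enhancement and cepstral steps of PESFCC are not shown.

```python
import cmath
import math


def sff_envelope(signal, freq_hz, fs_hz, r=0.99):
    """Envelope of `signal` at `freq_hz` via single-pole filtering (sketch)."""
    # Shift so that freq_hz maps to pi rad/sample (half the sampling rate).
    shift = math.pi - 2.0 * math.pi * freq_hz / fs_hz
    y_prev = 0j
    envelope = []
    for n, sample in enumerate(signal):
        x_shifted = sample * cmath.exp(1j * shift * n)
        # Single-pole filter H(z) = 1 / (1 + r z^-1), pole at z = -r.
        y = x_shifted - r * y_prev
        envelope.append(abs(y))
        y_prev = y
    return envelope


# Hypothetical test input: a 1 kHz cosine sampled at 8 kHz.
fs = 8000
tone = [math.cos(2.0 * math.pi * 1000 * n / fs) for n in range(2000)]

env_on = sff_envelope(tone, 1000, fs)   # analysis frequency matches the tone
env_off = sff_envelope(tone, 3000, fs)  # analysis frequency away from the tone
```

Repeating this for a grid of analysis frequencies yields an envelope per frequency at every sample, which is where the high time-frequency resolution comes from.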

Videos of people speaking are hard to follow across international borders because audiences from different demographics do not understand the language. Such videos are often supplemented with subtitles, but these hamper the viewing experience because the viewer's attention is split. Simple audio dubbing in a different language makes the video appear unnatural due to unsynchronized lip motion. In this paper, we propose a system for automated cross-language lip synchronization for re-dubbed videos.
