
ICASSP is the world’s largest and most comprehensive technical conference focused on signal processing and its applications. The 2019 conference will feature world-class presentations by internationally renowned speakers and cutting-edge session topics, and will provide a fantastic opportunity to network with like-minded professionals from around the world.

The utterance-level permutation invariant training (uPIT) technique is a state-of-the-art deep learning architecture for speaker-independent multi-talker separation. uPIT solves the label ambiguity problem by minimizing the mean square error (MSE) over all permutations between outputs and targets. However, uPIT may be sub-optimal at the segmental level because the optimization is not calculated over the individual frames. In this paper, we propose a constrained uPIT (cuPIT) to solve this problem by computing a weighted MSE loss using dynamic information (i.e., delta and acceleration).
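The permutation search described in this abstract can be illustrated with a minimal sketch. It shows only the plain uPIT idea (one output-to-target assignment chosen for the whole utterance by minimum MSE), not the paper's weighted cuPIT loss. The function name `upit_mse` and the nested-list data layout (sources, then frames, then feature values) are assumptions made for illustration.

```python
from itertools import permutations


def upit_mse(outputs, targets):
    """Return (loss, perm) for the utterance-level assignment with lowest MSE.

    outputs, targets: lists of S sources; each source is a list of T frames;
    each frame is a list of F floats (hypothetical layout for this sketch).
    perm[i] is the index of the target assigned to output i.
    """
    num_sources = len(outputs)
    best_loss, best_perm = float("inf"), None
    # Try every assignment of outputs to targets over the whole utterance.
    for perm in permutations(range(num_sources)):
        total, count = 0.0, 0
        for out_idx, tgt_idx in enumerate(perm):
            for out_frame, tgt_frame in zip(outputs[out_idx], targets[tgt_idx]):
                for o, t in zip(out_frame, tgt_frame):
                    total += (o - t) ** 2
                    count += 1
        loss = total / count
        if loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm
```

Because the permutation is fixed once per utterance, a single assignment is applied to every frame, which is the property the abstract identifies as potentially sub-optimal at the segmental level.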


Active graph-based semi-supervised learning (AG-SSL) aims to select a small set of labeled examples and utilize their graph-based relation to other unlabeled examples to aid in machine learning tasks. It is also closely related to the sampling theory in graph signal processing. In this paper, we revisit the original formulation of graph-based SSL and prove the supermodularity of an AG-SSL objective function under a broad class of regularization functions parameterized by Stieltjes matrices.


Most state-of-the-art time-difference-of-arrival (TDOA) localization algorithms assume that all the nodes are synchronized. However, for widely distributed wireless sensor networks (WSNs), time synchronization between all the nodes is not a trivial problem. In this paper, we study the problem of source localization using signal TDOA measurements in a system of partially synchronized nodes. Starting from the maximum likelihood estimator (MLE), we develop a semidefinite programming (SDP) approach.
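To make the role of synchronization concrete, the following minimal sketch computes noise-free TDOA values relative to a reference node and shows how an unknown clock offset biases them. It is not the paper's MLE or SDP formulation. The function name `tdoa_measurements`, the per-node `clock_offsets` parameter, and the default propagation speed are assumptions made for illustration.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed propagation speed for this sketch


def tdoa_measurements(source, sensors, clock_offsets=None, speed=SPEED_OF_SOUND):
    """Return TDOAs of sensors[1:] relative to sensors[0], in seconds.

    source: coordinate tuple of the emitter.
    sensors: list of coordinate tuples; sensors[0] is the reference node.
    clock_offsets: optional per-sensor clock offsets in seconds
        (hypothetical parameter; all zeros means full synchronization).
    """
    if clock_offsets is None:
        clock_offsets = [0.0] * len(sensors)
    # Observed arrival time = propagation delay + local clock offset.
    arrivals = [
        math.dist(source, sensor) / speed + offset
        for sensor, offset in zip(sensors, clock_offsets)
    ]
    return [t - arrivals[0] for t in arrivals[1:]]
```

With all offsets at zero the values depend only on geometry. A nonzero offset on an unsynchronized node shifts its TDOA by exactly that offset, which is why localization with partially synchronized nodes has to account for the offsets in some way.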


In this paper, we describe an extension of the Kaldi software toolkit to support neural-based language modeling, intended for use in automatic speech recognition (ASR) and related tasks. We combine the use of subword features (letter n-grams) and one-hot encoding of frequent words so that the models can handle large vocabularies containing infrequent words. We propose a new objective function that allows for training of unnormalized probabilities. An importance-sampling-based method is supported to speed up training when the vocabulary is large.


Lattice rescoring is a common approach to take advantage of recurrent neural language models in ASR: a word lattice is generated from first-pass decoding and then rescored with a neural model, usually with an n-gram approximation method to limit the search space. In this work, we describe a pruned lattice-rescoring algorithm for ASR that improves the n-gram approximation method. The pruned algorithm further limits the search space and uses heuristic search to pick better histories when expanding the lattice.

