
Recurrent neural networks have become increasingly popular for language modeling, achieving impressive gains in state-of-the-art speech recognition and natural language processing (NLP) tasks. Recurrent models exploit word dependencies over a much longer context window (as retained by the history states) than is feasible with n-gram language models.
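
As a rough illustration of how a history state can carry context beyond a fixed n-gram window, the following minimal sketch runs a plain Elman-style recurrence over two word sequences that differ only in their first word. The function name, dimensions, and random weights are all assumptions made for illustration and are not taken from the paper; no language model is trained here.

```python
import numpy as np

def rnn_final_state(token_ids, embed, w_in, w_rec):
    """Run a plain (Elman-style) recurrence and return the last history state."""
    h = np.zeros(w_rec.shape[0])
    for tok in token_ids:
        # The new state mixes the current word with everything seen so far.
        h = np.tanh(w_in @ embed[tok] + w_rec @ h)
    return h

# Hypothetical sizes and random (untrained) weights, for illustration only.
rng = np.random.default_rng(0)
vocab, dim, hidden = 10, 4, 8
embed = rng.normal(size=(vocab, dim))
w_in = rng.normal(scale=0.5, size=(hidden, dim))
w_rec = rng.normal(scale=0.5, size=(hidden, hidden))

# Two sequences that differ only in their first word.
h_a = rnn_final_state([1, 5, 6, 7, 8], embed, w_in, w_rec)
h_b = rnn_final_state([2, 5, 6, 7, 8], embed, w_in, w_rec)

# A trigram model would see the same context (7, 8) in both cases,
# whereas the recurrent state still differs because of the first word.
print(np.abs(h_a - h_b).max())
```

The last four words are identical in both sequences, so any model limited to a window of four words or fewer could not tell them apart, while the recurrent history state can.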


The process of understanding acoustic properties of environments is important for several applications, such as spatial audio, augmented reality and source separation. In this paper, multichannel room impulse responses are recorded and transformed into their direction of arrival (DOA)-time domain, by employing a superdirective beamformer. This domain can be represented as a 2D image. Hence, a novel image processing method is proposed to analyze the DOA-time domain, and estimate the reflection times of arrival and DOAs. The main acoustically reflective objects are then localized.


Coral species, with complex morphology and ambiguous boundaries, pose a great challenge for automated classification. CNN activations, which are extracted from fully connected layers of deep networks (FC features), have been successfully used as powerful universal representations in many visual tasks. In this paper, we investigate the transferability and combined performance of FC features and CONV features (extracted


In optimization-based signal processing, the so-called prior term models the desired signal, and therefore its design is the key factor to achieve a good performance. For audio signals, the time-directional total variation applied to a spectrogram in combination with phase correction has been proposed recently to model sinusoidal components of the signal. Although it is a promising prior, its applicability might be restricted to some extent because of the mismatch of the assumption to the signal.
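
To make the idea concrete, here is a minimal sketch of a time-directional total variation computed on a complex spectrogram, with and without a phase correction. The toy spectrogram model, the function names, and the use of a single known frequency for the correction are assumptions for illustration; the abstract does not specify how the phase correction is computed.

```python
import numpy as np

def time_directional_tv(spec):
    """Sum of absolute differences between adjacent time frames (axis 1)."""
    return float(np.abs(np.diff(spec, axis=1)).sum())

def toy_sinusoid_spectrogram(n_bins=4, n_frames=6, omega=0.3, hop=8):
    """Hypothetical toy model of a stationary sinusoid: constant magnitude
    per bin, with the phase advancing by omega * hop at every frame."""
    mags = np.linspace(1.0, 0.25, n_bins)[:, None]
    frames = np.arange(n_frames)[None, :]
    return mags * np.exp(1j * omega * hop * frames)

def phase_correct(spec, omega, hop):
    """Undo the frame-to-frame phase rotation of a sinusoid at frequency omega.
    Assumes omega is known, which is a simplification."""
    frames = np.arange(spec.shape[1])[None, :]
    return spec * np.exp(-1j * omega * hop * frames)

spec = toy_sinusoid_spectrogram()
tv_raw = time_directional_tv(spec)
tv_corrected = time_directional_tv(phase_correct(spec, omega=0.3, hop=8))
print(tv_raw, tv_corrected)
```

In this toy case the corrected spectrogram is constant along time, so its total variation is essentially zero, while the uncorrected one is penalized purely because of phase rotation. A signal that is not a stationary sinusoid would not satisfy this assumption, which is the kind of mismatch the abstract refers to.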


Random distortion testing (RDT) addresses the problem of testing whether or not a random signal deviates by more than a specified tolerance from a fixed value. The test is non-parametric in the sense that the distribution of the signal under each hypothesis is assumed to be unknown. The signal is observed in independent and identically distributed (i.i.d.) additive noise. The need to control the probabilities of false alarm and missed detection while reducing the number of samples required to make a decision leads to the SeqRDT approach.
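
The testing problem described above can be written compactly as follows. The notation is an assumption chosen for illustration, restricted to the scalar case: $\Theta$ is the random signal with unknown distribution, $\theta_0$ the fixed value, $\tau \ge 0$ the tolerance, and $X_n$ the i.i.d. additive noise in the $n$-th observation $Y_n$.

```latex
\begin{align*}
  Y_n &= \Theta + X_n, \qquad n = 1, 2, \dots \\
  \mathcal{H}_0 &: \; |\Theta - \theta_0| \le \tau
    \quad \text{(deviation within tolerance)} \\
  \mathcal{H}_1 &: \; |\Theta - \theta_0| > \tau
    \quad \text{(deviation exceeds tolerance)}
\end{align*}
```

A sequential procedure in this setting decides after each new observation whether to accept one of the hypotheses or to keep sampling, which is how the number of samples can be reduced while the two error probabilities stay controlled.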


In the context of Cued Speech (CS) recognition, the recognition of lips and hand movements is a key task. As we know, a good temporal segmentation is necessary for the supervised recognition system. However, lips and hand streams cannot share the same temporal segmentation since they are not synchronized. In this work, we propose a hand preceding model to predict temporal segmentations of hand movements automatically by exploring the relationship between hand preceding time

