Sorry, you need to enable JavaScript to visit this website.

Automatic emotion recognition from speech is a challenging task which relies heavily on the effectiveness of the speech features used for classification. In this work, we study the use of deep learning to automatically discover emotionally relevant features from speech. It is shown that using a deep recurrent neural network, we can learn both the short-time frame-level acoustic features that are emotionally relevant, as well as an appropriate temporal aggregation of those features into a compact utterance-level representation.

Categories:
186 Views

Recurrent neural network (RNN) based character-level language models (CLMs) are extremely useful for modeling out-of-vocabulary words by nature. However, their performance is generally much worse than the word-level language models (WLMs), since CLMs need to consider longer history of tokens to properly predict the next one. We address this problem by proposing hierarchical RNN architectures, which consist of multiple modules with different timescales.

Categories:
18 Views

In this paper, we estimate perceived image quality using sparse representations obtained from generic image databases through an unsupervised learning approach. A color space transformation, a mean subtraction, and a whitening operation are used to enhance descriptiveness of images by reducing spatial redundancy; a linear decoder is used to obtain sparse representations; and a thresholding stage is used to formulate suppression mechanisms in a visual system.

Categories:
5 Views

Automatic inspection of underwater pipelines has been a task of growing importance for the detection of a variety of events, which include inner coating exposure and presence of algae. Such inspections might benefit of machine learning techniques in order to accurately classify such occurrences. This article describes a deep convolutional neural network algorithm for the classification of underwater pipeline events. The neural network architecture and parameters that result in optimal classifier performance are selected.

Categories:
33 Views

Recurrent neural networks (RNNs) have shown clear superiority in sequence modeling, particularly the ones with gated units, such as long short-term memory (LSTM) and gated recurrent unit (GRU). However, the dynamic properties behind the remarkable performance remain unclear in many applications, e.g., automatic speech recognition (ASR). This paper employs visualization techniques to study the behavior of LSTM and GRU when performing speech recognition tasks.

Categories:
11 Views

Deep learning has gained great popularity due to its widespread success on many inference problems. We consider the application of deep learning to the sparse linear inverse problem encountered in compressive sensing, where one seeks to recover a sparse signal from a few noisy linear measurements. In this paper, we propose two novel neural-network architectures that decouple prediction errors across layers in the same way that the approximate message passing (AMP) algorithms decouple them across iterations: through Onsager correction.

Categories:
74 Views

To date, automatic target recognition (ATR) techniques in synthetic aperture radar (SAR) imagery have largely focused on features that use only the magnitude part of SAR’s complex valued magnitude-plus-phase history. While such techniques are often very successful, they inherently ignore the significant amount of discriminatory information available in the phase. This paper describes a method for exploiting the complex information for ATR by using a convolutional neural network (CNN) that accepts fully complex input features.

Categories:
9 Views

Model based VAD approaches have been widely used and
achieved success in practice. These approaches usually cast
VAD as a frame-level classification problem and employ statistical
classifiers, such as Gaussian Mixture Model (GMM) or
Deep Neural Network (DNN) to assign a speech/silence label
for each frame. Due to the frame independent assumption classification,
the VAD results tend to be fragile. To address this
problem, in this paper, a new structured multi-frame prediction
DNN approach is proposed to improve the segment-level

Categories:
32 Views

Pages