Learning a joint and coordinated representation between different modalities can improve multimodal emotion recognition. In this paper, we propose a deep representation learning approach for emotion recognition from electroencephalogram (EEG) signals guided by facial electromyogram (EMG) and electrooculogram (EOG) signals. We recorded EEG, EMG and EOG signals from 60 participants who watched 40 short videos and self-reported their emotions.

Tensor decomposition has proven effective for solving many problems in signal processing and machine learning, and has recently found use in compressing deep neural networks. In many applications of deep neural networks, it is critical to reduce the number of parameters and the computational workload to accelerate inference when the network is deployed. Modern deep neural networks consist of multiple layers with multi-dimensional weight arrays, for which tensor decomposition is a natural compression approach.
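The idea can be illustrated in its simplest form, a truncated SVD of a single layer's weight matrix; this is a minimal sketch, not the compression scheme of any particular paper, and the function name and sizes are hypothetical:

```python
import numpy as np

def compress_layer(W, rank):
    """Approximate weight matrix W (m, n) with a rank-`rank` factorization
    U_r @ V_r, reducing parameters from m*n to rank*(m + n)."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    U_r = U[:, :rank] * s[:rank]      # absorb singular values into U
    V_r = Vt[:rank, :]
    return U_r, V_r

rng = np.random.default_rng(0)
# a toy "layer" whose weights are exactly rank 10
W = rng.standard_normal((256, 10)) @ rng.standard_normal((10, 512))
U_r, V_r = compress_layer(W, rank=10)

original_params = W.size                  # 256*512 = 131072
compressed_params = U_r.size + V_r.size   # 10*(256+512) = 7680
error = np.linalg.norm(W - U_r @ V_r) / np.linalg.norm(W)
print(compressed_params, original_params)
```

At inference time the single dense layer becomes two thinner ones applied in sequence, which is where the speedup comes from; higher-order layers (e.g. convolutions) use genuine tensor decompositions such as CP or Tucker in the same spirit.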

We propose a spiking neural network model that encodes information in the relative timing of individual neuron spikes and performs classification using the first output neuron to spike. This temporal coding scheme allows supervised training of the network with backpropagation, using locally exact derivatives of postsynaptic spike times with respect to presynaptic spike times. The network uses a biologically inspired alpha synaptic transfer function and trainable synchronisation pulses as temporal references. We successfully train the network on the MNIST dataset encoded in time.
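The coding scheme can be sketched as follows: each input spike injects an alpha-shaped potential, and the predicted class is the output neuron whose membrane potential crosses threshold first. This is an illustrative toy, assuming a hypothetical threshold, time constant, and hand-picked weights, not the trained network or its exact-derivative learning rule:

```python
import numpy as np

TAU, THRESH = 1.0, 1.5   # assumed time constant and firing threshold

def alpha_kernel(t):
    """Alpha-shaped postsynaptic potential, zero before the spike, peak 1 at t=TAU."""
    return np.where(t > 0, (t / TAU) * np.exp(1 - t / TAU), 0.0)

def first_spike_class(input_times, weights, t_grid):
    """Index of the output neuron whose potential crosses THRESH earliest.
    weights: (n_out, n_in); input_times: (n_in,) input spike times."""
    psp = alpha_kernel(t_grid[None, None, :] - input_times[None, :, None])
    v = (weights[:, :, None] * psp).sum(axis=1)          # (n_out, len(t_grid))
    crossed = v >= THRESH
    first = [t_grid[c].min() if c.any() else np.inf for c in crossed]
    return int(np.argmin(first))

t_grid = np.linspace(0.0, 10.0, 1001)
input_times = np.array([0.5, 1.0])
weights = np.array([[1.2, 1.2],    # neuron 0: strong input, fires early
                    [0.3, 0.3]])   # neuron 1: weak input, never reaches threshold
print(first_spike_class(input_times, weights, t_grid))   # 0
```

In the actual model the crossing times are differentiable in the presynaptic spike times, which is what makes backpropagation through the first-to-spike decision possible.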

As automatic speaker recognition systems become mainstream, voice spoofing attacks are on the rise. Common attack strategies include replay, text-to-speech synthesis, and voice conversion. While previously proposed end-to-end detection frameworks have been shown to be effective at spotting attacks for one particular spoofing strategy, they have relied on different models, architectures, and speech representations depending on the spoofing strategy.

We present a novel lipreading system that improves on the task of speaker-independent word recognition by decoupling motion and content dynamics. We achieve this with a deep learning architecture that processes motion and content in two distinct pipelines and subsequently merges them, yielding an end-to-end trainable system that fuses independently learned representations. We obtain an average relative word-accuracy improvement of ≈6.8% on unseen speakers and of ≈3.3% on known speakers, with respect to a baseline that uses a standard architecture.
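The two-pipeline-then-merge design can be sketched as two independent encoders whose outputs are concatenated before a joint classifier. This is a bare structural sketch with toy linear encoders and hypothetical dimensions, not the paper's architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

def encoder(x, W):
    """One pipeline: a toy linear + ReLU feature extractor."""
    return np.maximum(x @ W, 0.0)

# assumed sizes: 32-dim motion/content inputs, 16-dim streams, 8-way word classifier
W_motion = rng.standard_normal((32, 16))
W_content = rng.standard_normal((32, 16))
W_fuse = rng.standard_normal((32, 8))

def forward(motion_frames, content_frames):
    m = encoder(motion_frames, W_motion)      # motion representation
    c = encoder(content_frames, W_content)    # content representation
    fused = np.concatenate([m, c], axis=-1)   # fuse independently learned features
    return fused @ W_fuse                     # word logits

logits = forward(rng.standard_normal((4, 32)), rng.standard_normal((4, 32)))
print(logits.shape)   # (4, 8)
```

Because the two branches share no weights before the fusion point, each can specialize (motion dynamics vs. appearance content) while the whole system remains trainable end to end.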

Motivated by ever-increasing demands on limited communication bandwidth and for low power consumption, we propose a new methodology, named joint Variational Autoencoders with Bernoulli mixture models (VAB), for performing clustering in the compressed data domain. The idea is to reduce the data dimension with Variational Autoencoders (VAEs) and group the data representations with Bernoulli mixture models (BMMs).
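The second stage, grouping binary representations with a Bernoulli mixture, can be sketched with plain EM over synthetic binary codes. This is a generic BMM fit under assumed toy data, not the joint VAB objective (which trains the VAE and mixture together):

```python
import numpy as np

def bmm_em(X, K, n_iter=50, seed=0):
    """EM for a Bernoulli mixture over binary data X (N, D).
    Returns mixing weights pi (K,), Bernoulli means mu (K, D), responsibilities r."""
    rng = np.random.default_rng(seed)
    N, D = X.shape
    pi = np.full(K, 1.0 / K)
    mu = rng.uniform(0.25, 0.75, size=(K, D))
    for _ in range(n_iter):
        # E-step: responsibilities from per-component Bernoulli log-likelihoods
        log_p = (X[:, None, :] * np.log(mu)
                 + (1 - X[:, None, :]) * np.log(1 - mu)).sum(-1)
        log_r = np.log(pi) + log_p
        log_r -= log_r.max(axis=1, keepdims=True)
        r = np.exp(log_r)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: update weights and means (clipped away from 0/1 for stability)
        Nk = r.sum(axis=0)
        pi = Nk / N
        mu = np.clip((r.T @ X) / Nk[:, None], 1e-3, 1 - 1e-3)
    return pi, mu, r

# two well-separated groups of binary codes
rng = np.random.default_rng(1)
A = (rng.random((50, 8)) < 0.9).astype(float) * np.array([1, 1, 1, 1, 0, 0, 0, 0])
B = (rng.random((50, 8)) < 0.9).astype(float) * np.array([0, 0, 0, 0, 1, 1, 1, 1])
X = np.vstack([A, B])
pi, mu, r = bmm_em(X, K=2)
labels = r.argmax(axis=1)
print(labels)
```

A Bernoulli mixture is the natural choice here because a binarized latent code is a vector of 0/1 features, which a Gaussian mixture would model poorly.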

Recent years have witnessed the widespread adoption of light field imaging in interactive and immersive visual applications. Because they record the directional information of light rays, light field images require far more storage than conventional 2D images, so efficient light field image compression is highly desirable for further applications. In this paper, we propose a novel light field image compression scheme using view synthesis based on multi-branch spatial transformer networks.

In this work, a flow-guided temporal-spatial network (FGTSN) is proposed to enhance the quality of HEVC-compressed video. Specifically, we first employ a motion estimation subnet with a trainable optical flow module to estimate the motion flow between the current frame and its adjacent frames. Guided by the predicted motion flow, the adjacent frames are aligned to the current frame. A temporal encoder is then designed to discover the variations between the current frame and its warped neighbors. Finally, the reconstructed frame is generated by training the model in a multi-supervised fashion.
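The flow-guided alignment step can be sketched as a backward warp: each output pixel samples the adjacent frame at the location indicated by the flow field. This is a minimal nearest-neighbor sketch on a toy frame, not the trainable (bilinear, differentiable) warping module of the network:

```python
import numpy as np

def warp_frame(frame, flow):
    """Backward-warp `frame` (H, W) toward the current frame using a dense
    flow field (H, W, 2) of (dy, dx) offsets, with nearest-neighbor sampling."""
    H, W = frame.shape
    ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    src_y = np.clip(np.rint(ys + flow[..., 0]).astype(int), 0, H - 1)
    src_x = np.clip(np.rint(xs + flow[..., 1]).astype(int), 0, W - 1)
    return frame[src_y, src_x]

# a bright square that sits 2 pixels to the right in the adjacent frame
frame = np.zeros((8, 8))
frame[2:4, 4:6] = 1.0
flow = np.zeros((8, 8, 2))
flow[..., 1] = 2.0                  # every output pixel samples 2 columns right
aligned = warp_frame(frame, flow)
print(aligned[2:4, 2:4])
# [[1. 1.]
#  [1. 1.]]
```

After alignment, the square lands at the position it occupies in the current frame, so the temporal encoder sees residual variations (compression artifacts) rather than raw motion.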
