ICASSP is the world’s largest and most comprehensive technical conference focused on signal processing and its applications. The ICASSP 2020 conference will feature world-class presentations by internationally renowned speakers and cutting-edge session topics, and will provide a fantastic opportunity to network with like-minded professionals from around the world.
- Sparse Directed Graph Learning for Head Movement Prediction in 360 Video Streaming
High-definition 360 videos encoded in fine quality are typically too large to stream in their entirety over bandwidth (BW)-constrained networks. One popular remedy is to interactively extract and send a spatial sub-region corresponding to a viewer's current field-of-view (FoV) in a head-mounted display (HMD) for more BW-efficient streaming. Due to the non-negligible round-trip-time (RTT) delay between server and client, accurate head movement prediction that foretells a viewer's future FoVs is essential.
- Time Difference of Arrival Estimation from Frequency-sliding Generalized Cross-Correlations Using Convolutional Neural Networks
Interest in deep learning methods for solving traditional signal processing tasks has been growing steadily in recent years.
- MULTI IMAGE DEPTH FROM DEFOCUS NETWORK WITH BOUNDARY CUE FOR DUAL APERTURE CAMERA
In this paper, we estimate depth information using two defocused images from a dual-aperture camera. Recent advances in deep learning techniques have increased the accuracy of depth estimation. In addition, methods that use a defocused image, in which an object is blurred according to its distance from the camera, have been widely studied. We further improve the accuracy of depth estimation by training the network on two images with different depths of field.
- EMET: EMBEDDINGS FROM MULTILINGUAL-ENCODER TRANSFORMER FOR FAKE NEWS DETECTION
In the last few years, social media networks have changed human life experience and behavior, as they have broken down communication barriers and allowed ordinary people to actively produce multimedia content on a massive scale. As a result, information dissemination on social media platforms has become increasingly common. However, misinformation propagates with the same ease and speed as real news, and it can cause irreversible damage to an individual or to society at large.
- Translation of a Higher Order Ambisonics Sound Scene Based on Parametric Decomposition
This paper presents a novel 3DoF+ system that allows the listener to navigate, i.e., change position, in scene-based spatial audio content beyond the sweet spot of a Higher Order Ambisonics recording. It is one of the first such systems based on sound capture at a single spatial position. The system uses a parametric decomposition of the recorded sound field. For the synthesis, only coarse distance information about the sources is needed as side information, not their exact number.
- Curriculum learning for speech emotion recognition from crowdsourced labels
This study introduces a method to design a machine-learning curriculum that maximizes efficiency during the training of deep neural networks (DNNs) for speech emotion recognition. Previous studies on other machine-learning problems have shown the benefits of training a classifier following a curriculum in which samples are presented gradually, in increasing order of difficulty. For speech emotion recognition, the challenge is to establish a natural order of difficulty in the training set to create the curriculum.
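The general idea of presenting samples in increasing order of difficulty can be sketched as follows. This is a minimal illustration only, not the method of the paper: the function name `build_curriculum`, the per-sample `difficulty` scores, and the cumulative-stage schedule are all assumptions made for the example, and the abstract does not say how difficulty is actually measured.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def build_curriculum(
    samples: Sequence[T], difficulty: Sequence[float], num_stages: int
) -> List[List[T]]:
    """Return cumulative training sets, easiest samples first.

    `difficulty` is a hypothetical per-sample score (lower = easier).
    Stage k contains the easiest k/num_stages fraction of the data,
    so each stage adds harder samples to the previous one.
    """
    if len(samples) != len(difficulty):
        raise ValueError("samples and difficulty must have the same length")
    if num_stages < 1:
        raise ValueError("num_stages must be at least 1")

    # Rank sample indices from easiest to hardest.
    order = sorted(range(len(samples)), key=lambda i: difficulty[i])
    ranked = [samples[i] for i in order]

    stages = []
    for stage in range(1, num_stages + 1):
        # Ceiling division so the last stage always covers every sample.
        cutoff = (len(ranked) * stage + num_stages - 1) // num_stages
        stages.append(ranked[:cutoff])
    return stages


if __name__ == "__main__":
    utterances = ["a", "b", "c", "d"]
    scores = [0.9, 0.1, 0.5, 0.3]
    for k, subset in enumerate(build_curriculum(utterances, scores, 2), start=1):
        print(f"stage {k}: {subset}")
```

A training loop would then fit the model on each stage in turn. Other schedules, such as disjoint stages, are equally plausible readings of "gradually presented".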
This paper describes new reference benchmark results based on hybrid Hidden Markov Model and Deep Neural Networks (HMM-DNN) for the GlobalPhone (GP) multilingual text and speech database. GP is a multilingual database of high-quality read speech with corresponding transcriptions and pronunciation dictionaries in more than 20 languages. Moreover, we provide new results for five additional languages, namely, Amharic, Oromo, Tigrigna, Wolaytta, and Uyghur.
- Semi-Supervised Optimal Transport Methods for Detecting Anomalies
Building upon advances in optimal transport and anomaly detection, we propose a generalization of an unsupervised and automatic method for detecting significant deviations from reference signals. Unlike most existing approaches to anomaly detection, our method is built on a non-parametric framework that exploits optimal transport to estimate the deviation from an observed distribution.
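To make the notion of an optimal-transport deviation concrete, here is a minimal sketch restricted to one-dimensional samples of equal size, where the Wasserstein-1 distance between two empirical distributions reduces to the mean absolute difference of the sorted values. It is an illustration under those assumptions, not the method proposed in the paper: the function names and the fixed `threshold` are hypothetical, and the semi-supervised component is not modeled.

```python
from typing import Sequence


def wasserstein_1d(reference: Sequence[float], observed: Sequence[float]) -> float:
    """Wasserstein-1 distance between two equal-size 1-D empirical samples.

    With uniform weights and equal sample counts, the optimal transport
    plan in 1-D matches the i-th smallest reference value to the i-th
    smallest observed value.
    """
    if not reference or len(reference) != len(observed):
        raise ValueError("samples must be non-empty and of equal size")
    ref = sorted(reference)
    obs = sorted(observed)
    return sum(abs(r - o) for r, o in zip(ref, obs)) / len(ref)


def is_anomalous(
    reference: Sequence[float], observed: Sequence[float], threshold: float
) -> bool:
    """Flag the observed window if its transport cost exceeds a threshold.

    `threshold` is a hypothetical tuning parameter chosen by the user.
    """
    return wasserstein_1d(reference, observed) > threshold


if __name__ == "__main__":
    baseline = [0.0, 1.0, 2.0]
    shifted = [3.0, 1.0, 2.0]
    print(wasserstein_1d(baseline, shifted))
    print(is_anomalous(baseline, shifted, threshold=0.5))
```

For multi-dimensional signals or unequal sample sizes, the transport problem no longer has this closed form and a general solver would be needed.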
- Recovery of binary sparse signals from compressed linear measurements via polynomial optimization
- DEEP ENCODED LINGUISTIC AND ACOUSTIC CUES FOR ATTENTION BASED END TO END SPEECH EMOTION RECOGNITION