ICASSP 2020

ICASSP is the world’s largest and most comprehensive technical conference focused on signal processing and its applications. The ICASSP 2020 conference will feature world-class presentations by internationally renowned speakers and cutting-edge session topics, and will provide a fantastic opportunity to network with like-minded professionals from around the world.

Sparse Directed Graph Learning for Head Movement Prediction in 360 Video Streaming

High-definition 360 videos encoded in fine quality are typically too large to stream in their entirety over bandwidth (BW)-constrained networks. One popular remedy is to interactively extract and send only the spatial sub-region corresponding to a viewer's current field-of-view (FoV) in a head-mounted display (HMD), which makes streaming more BW-efficient. Because of the non-negligible round-trip-time (RTT) delay between server and client, accurate head movement prediction that foretells a viewer's future FoVs is essential.
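
Why the RTT forces prediction can be illustrated with a deliberately naive baseline. This is not the paper's sparse directed graph method. The sketch extrapolates the viewer's head yaw at constant angular velocity one RTT ahead, then lists the longitude tiles that the predicted horizontal FoV overlaps. Several details are illustrative assumptions: the equal-width tile columns of an equirectangular frame, the use of yaw only, and the function names.

```python
import math

def predict_yaw(samples, rtt):
    """Constant-velocity extrapolation of head yaw (degrees) rtt seconds ahead.

    samples: list of (timestamp_s, yaw_deg) pairs, oldest first; only the
    last two are used. Naive baseline for illustration only.
    """
    (t0, y0), (t1, y1) = samples[-2], samples[-1]
    # Take the shortest signed step so that 350 -> 10 counts as +20, not -340.
    step = (y1 - y0 + 180.0) % 360.0 - 180.0
    velocity = step / (t1 - t0)
    return (y1 + velocity * rtt) % 360.0

def tiles_in_fov(yaw, fov_deg, num_tiles):
    """Indices of equal-width longitude tiles overlapped by a horizontal FoV."""
    width = 360.0 / num_tiles
    left = (yaw - fov_deg / 2.0) % 360.0        # left edge of the FoV
    first = int(left // width)                   # tile containing the left edge
    offset = left - first * width                # how far into that tile the FoV starts
    count = min(num_tiles, math.ceil((offset + fov_deg) / width))
    # Wrap indices around the 360-degree seam.
    return [(first + i) % num_tiles for i in range(count)]
```

For example, a viewer turning from 350° to 10° in 0.5 s with an RTT of 0.25 s gets a predicted yaw of 20°. With four 90° tiles and a 90° FoV, the server would send tiles 3 and 0, which straddle the seam.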

2069.pdf

Categories: Image/Video Processing

Time Difference of Arrival Estimation from Frequency-sliding Generalized Cross-Correlations Using Convolutional Neural Networks

Interest in deep learning methods for solving traditional signal processing tasks has been growing steadily in recent years.

Time Difference of Arrival Estimation from Frequency-sliding Generalized Cross-Correlations Using Convolutional Neural Networks.pdf

Categories: Applications of Sensor Array and Multi-channel Signal Processing

MULTI IMAGE DEPTH FROM DEFOCUS NETWORK WITH BOUNDARY CUE FOR DUAL APERTURE CAMERA

In this paper, we estimate depth information using two defocused images from a dual-aperture camera. Recent advances in deep learning techniques have increased the accuracy of depth estimation. In addition, methods that use a defocused image, in which an object is blurred according to its distance from the camera, have been widely studied. We further improve the accuracy of depth estimation by training the network on two images with different depths of field.
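
The distance dependence of defocus blur can be made concrete with the standard thin-lens circle-of-confusion relation. This is a textbook optics result, not a formula taken from the paper. Take a lens of focal length f and aperture diameter A, focused at distance d_f, and assume d_f > f. A point at distance d is then imaged as a blur disc of diameter

```latex
c(d) = A \, \frac{\lvert d - d_f \rvert}{d} \, \frac{f}{d_f - f}
```

The blur vanishes at the focal plane (d = d_f) and grows with distance from it. Because c scales linearly with A, the narrower aperture of a dual-aperture camera yields the sharper image. That image can act as a reference against which blur in the wider-aperture image is measured. This last point is an interpretive note, not the paper's stated rationale.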

ICASSP_MIDFD_PPT.pdf

Categories: Image, Video, and Multidimensional Signal Processing

EMET: EMBEDDINGS FROM MULTILINGUAL-ENCODER TRANSFORMER FOR FAKE NEWS DETECTION

In the last few years, social media networks have changed human experience and behavior by breaking down communication barriers, allowing ordinary people to actively produce multimedia content on a massive scale. As a result, information dissemination on social media platforms has become increasingly common. However, misinformation propagates with the same ease and speed as real news, even though it can cause irreversible damage to individuals or to society at large.

ICASSP.pdf

Categories: Multimedia Forensics

Translation of a Higher Order Ambisonics Sound Scene Based on Parametric Decomposition

This paper presents a novel 3DoF+ system that allows the listener to navigate, i.e., change position, in scene-based spatial audio content beyond the sweet spot of a Higher Order Ambisonics recording. It is one of the first such systems based on sound captured at a single spatial position. The system uses a parametric decomposition of the recorded sound field. For synthesis, only coarse distance information about the sources is needed as side information; their exact number is not required.

handout.pdf

Categories: Spatial and Multichannel Audio, Source Separation and Signal Enhancement, Audio for Multimedia, Loudspeaker and Microphone Array Signal Processing, Virtual reality and 3D imaging

Curriculum learning for speech emotion recognition from crowdsourced labels

This study introduces a method for designing a curriculum that maximizes the training efficiency of deep neural networks (DNNs) for speech emotion recognition. Previous studies on other machine-learning problems have shown the benefits of training a classifier following a curriculum, in which samples are presented gradually in increasing order of difficulty. For speech emotion recognition, the challenge is to establish a natural order of difficulty in the training set to create the curriculum.
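
The curriculum idea can be sketched in a few lines. This is a minimal illustration, not the authors' method. It assumes, hypothetically, that a sample's difficulty is measured by how strongly crowdsourced annotators disagree on its label. The easiest samples come first, and the ordered data are split into cumulative training stages. The names `agreement` and `build_curriculum` and the stage scheme are illustrative.

```python
from collections import Counter

def agreement(labels):
    """Fraction of annotators who chose the majority label (1.0 = unanimous)."""
    top_count = Counter(labels).most_common(1)[0][1]
    return top_count / len(labels)

def build_curriculum(samples, num_stages):
    """Order samples easy-to-hard and return cumulative training subsets.

    samples: list of (sample_id, annotator_labels) pairs.
    Stage k holds the easiest k/num_stages fraction of the data, so the
    final stage is the full training set.
    """
    # Hypothetical difficulty proxy: higher agreement = easier, so it comes first.
    # sorted() stays stable with reverse=True, so ties keep their input order.
    ordered = sorted(samples, key=lambda s: agreement(s[1]), reverse=True)
    stages = []
    for k in range(1, num_stages + 1):
        cutoff = round(len(ordered) * k / num_stages)
        stages.append([sample_id for sample_id, _ in ordered[:cutoff]])
    return stages

# Usage: three clips labeled by three annotators each.
data = [
    ("a", ["happy", "happy", "happy"]),    # unanimous -> easiest
    ("b", ["sad", "angry", "neutral"]),    # full disagreement -> hardest
    ("c", ["angry", "angry", "sad"]),      # 2 of 3 agree
]
stages = build_curriculum(data, 3)  # [['a'], ['a', 'c'], ['a', 'c', 'b']]
```

A DNN would then be trained on each stage in turn before moving on to the next, larger one.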

CurrriculumLearning.pdf

Categories: Speech Analysis (SPE-ANLS)

DNN-BASED SPEECH RECOGNITION FOR GLOBALPHONE LANGUAGES

This paper describes new reference benchmark results for the GlobalPhone (GP) multilingual text and speech database, obtained with hybrid Hidden Markov Model/Deep Neural Network (HMM-DNN) systems. GP is a multilingual database of high-quality read speech with corresponding transcriptions and pronunciation dictionaries in more than 20 languages. We also provide new results for five additional languages: Amharic, Oromo, Tigrigna, Wolaytta, and Uyghur.

ICASSP2020_DNN4GlobalPhone_Paper5018_modified.pdf

Categories: Human Spoken Language Acquisition, Development and Learning (SLP-LADL)

Semi-Supervised Optimal Transport Methods for Detecting Anomalies

Building upon advances in optimal transport and anomaly detection, we propose a generalization of an unsupervised, automatic method for detecting significant deviations from reference signals. Unlike most existing approaches to anomaly detection, our method is built on a non-parametric framework that exploits optimal transport to estimate deviations from an observed distribution.
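
In one dimension, optimal transport has a closed form that makes "deviation from an observed distribution" easy to illustrate. Take two empirical distributions with the same number of samples. The optimal plan matches their sorted values, so the Wasserstein-1 distance is the mean absolute difference between the sorted samples. The sketch below uses that distance as an anomaly score for each observed window against a reference window. It is a simplified, unsupervised 1-D illustration under those assumptions, and the paper's semi-supervised method is not reproduced here. The `threshold` is a hypothetical user-chosen parameter.

```python
def wasserstein_1d(x, y):
    """W1 distance between two equal-size 1-D empirical distributions.

    In 1-D the optimal transport plan pairs sorted samples, so the cost
    reduces to the mean absolute difference of the sorted values.
    """
    xs, ys = sorted(x), sorted(y)
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample counts")
    return sum(abs(a - b) for a, b in zip(xs, ys)) / len(xs)

def flag_anomalies(reference, windows, threshold):
    """Score each observed window against the reference distribution.

    Returns a list of (score, is_anomalous) pairs; a window is flagged when
    its transport cost from the reference exceeds the threshold.
    """
    results = []
    for window in windows:
        score = wasserstein_1d(reference, window)
        results.append((score, score > threshold))
    return results
```

Sample order does not matter, only the distribution of values. A window with the same values as the reference in a different order scores 0.0, while a window shifted by 5 units scores 5.0.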