ICASSP 2019

ICASSP is the world’s largest and most comprehensive technical conference focused on signal processing and its applications. The 2019 conference will feature world-class presentations by internationally renowned speakers and cutting-edge session topics, and will provide a fantastic opportunity to network with like-minded professionals from around the world.

A Non-Convex Approach to Non-negative Super-Resolution: Theory and Algorithm

QIAO_HENG.pdf (376)

Categories: Signal Processing Theory and Methods

29 Views

Unsupervised Deep Clustering for Source Separation: Direct Learning from Mixtures Using Spatial Information Slides

We present a monophonic source separation system that is trained by observing only mixtures, with no ground truth separation information. We use a deep clustering approach that trains on multi-channel mixtures and learns to project spectrogram bins to source clusters that correlate with various spatial features. We show that this training process yields separation performance as good as that obtained with ground truth separation information.
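As a rough illustration of training embeddings against cluster assignments, the sketch below computes the affinity objective commonly used in the deep clustering literature: the squared Frobenius norm of V V^T - Y Y^T, where V holds one embedding per spectrogram bin and Y holds one-hot cluster assignments. The abstract does not state the loss or how spatial features become targets, so the function names, the single phase-difference feature, and the threshold are all assumptions made for this example.

```python
from typing import List


def labels_from_spatial_feature(phase_diffs: List[float],
                                threshold: float = 0.0) -> List[List[float]]:
    """Hypothetical target builder: one-hot cluster per bin from one spatial feature.

    Bins whose inter-channel phase difference is below `threshold` go to
    cluster 0, all others to cluster 1. This is an assumption for illustration,
    not the feature set used in the paper.
    """
    return [[1.0, 0.0] if d < threshold else [0.0, 1.0] for d in phase_diffs]


def affinity_loss(embeddings: List[List[float]],
                  labels: List[List[float]]) -> float:
    """Squared Frobenius norm of V V^T - Y Y^T, accumulated pair by pair."""
    n = len(embeddings)
    total = 0.0
    for i in range(n):
        for j in range(n):
            # Inner product of the embeddings of bins i and j.
            vv = sum(a * b for a, b in zip(embeddings[i], embeddings[j]))
            # 1.0 if bins i and j share a cluster, else 0.0 (for one-hot labels).
            yy = sum(a * b for a, b in zip(labels[i], labels[j]))
            total += (vv - yy) ** 2
    return total


if __name__ == "__main__":
    targets = labels_from_spatial_feature([-0.4, 0.3, 0.5])
    # Embeddings that reproduce the cluster structure exactly incur zero loss.
    print(affinity_loss(targets, targets))
```

The loss depends only on pairwise affinities, so it is unchanged if the cluster indices are permuted; this is why no fixed ordering of the sources is needed when the targets come from clustering.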

unsup_icassp19_slides_new.pdf

unsup_dc_icassp19_slides (339)

Categories: Source Separation and Signal Enhancement

74 Views

Word Characters and Phone Pronunciation Embedding for ASR Confidence Classifier

Confidences are integral to ASR systems and are applied to data selection, adaptation, hypothesis ranking, arbitration, etc. A hybrid ASR system is inherently a match between pronunciations and AM+LM evidence, but current confidence features lack pronunciation information. We develop pronunciation embeddings to represent and factorize the acoustic score in relevant bases, and demonstrate an 8-10% relative reduction in false alarms (FA) on large-scale tasks. We generalize to standard NLP embeddings such as GloVe, and show a 16% relative reduction in FA in combination with GloVe.

WordEmbed_v5.pdf (475)

Categories: Acoustic Modeling for Automatic Speech Recognition (SPE-RECO)
Large Vocabulary Continuous Recognition/Search (SPE-LVCR)

37 Views

CODING TREE EARLY TERMINATION FOR FAST HEVC TRANSRATING BASED ON RANDOM FORESTS

Video transrating has become an essential task for streaming service providers, which need to transmit and deliver different versions of the same content to a multitude of users operating under different network conditions. Because the transrating operation consists of a decoding step followed by an encoding step, such large-scale services incur a huge computational cost, especially with complex state-of-the-art codecs such as High Efficiency Video Coding (HEVC).

PosterICASSP_vF.pdf (352)

Categories: Image/Video Coding
Other applications of machine learning (MLR-APPL)

13 Views

Immersive Audio Coding for Virtual Reality Using a Metadata-Assisted Extension of the 3GPP EVS Codec

Virtual Reality (VR) audio scenes may be composed of a very large number of audio elements, including dynamic audio objects, fixed audio channels and scene-based audio elements such as Higher Order Ambisonics (HOA).

VRStream.pdf (477)

Categories: Audio Coding

101 Views

Introducing the Orthogonal Periodic Sequences for the Identification of Functional Link Polynomial Filters

The paper introduces a novel family of deterministic signals, the orthogonal periodic sequences (OPSs), for the identification of functional link polynomial (FLiP) filters. The novel sequences share many of the characteristics of the perfect periodic sequences (PPSs). Like PPSs, they allow the perfect identification of a FLiP filter on a finite time interval with the cross-correlation method. In contrast to PPSs, OPSs can also identify non-orthogonal FLiP filters, such as Volterra filters.
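The cross-correlation idea can be sketched on the simplest possible case, a linear FIR filter driven by a periodic input whose circular autocorrelation is zero at every nonzero lag. This is only a stand-in: it does not construct the OPSs or PPSs of the paper, and the length-4 sequence, the example coefficients, and the function names are assumptions chosen for illustration.

```python
from typing import List

# Length-4 real sequence with perfect periodic autocorrelation:
# its circular autocorrelation is 4 at lag 0 and 0 at lags 1, 2 and 3.
PERFECT_SEQUENCE = [1.0, 1.0, 1.0, -1.0]


def periodic_filter_output(h: List[float], x: List[float]) -> List[float]:
    """One period of the steady-state output of FIR filter h for periodic input x."""
    period = len(x)
    return [
        sum(h[k] * x[(n - k) % period] for k in range(len(h)))
        for n in range(period)
    ]


def identify_by_cross_correlation(y: List[float], x: List[float]) -> List[float]:
    """Estimate one coefficient per lag by cross-correlating output and input.

    Because the shifted copies of x are mutually orthogonal over one period,
    each lag isolates exactly one filter coefficient.
    """
    period = len(x)
    energy = sum(v * v for v in x)  # autocorrelation at lag 0
    return [
        sum(y[n] * x[(n - k) % period] for n in range(period)) / energy
        for k in range(period)
    ]


if __name__ == "__main__":
    h_true = [0.5, -0.25, 0.125]  # hypothetical unknown filter, memory <= period
    y = periodic_filter_output(h_true, PERFECT_SEQUENCE)
    print(identify_by_cross_correlation(y, PERFECT_SEQUENCE))
```

The estimate is exact only while the filter memory does not exceed the period of the input; a longer filter would alias onto the available lags.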

poster2.pdf (352)

Categories: Signal and System Modeling, Representation and Estimation

18 Views

Inter- and Intra-Patient ECG Heartbeat Classification For Arrhythmia Detection: a Sequence to Sequence Deep Learning Approach

The electrocardiogram (ECG) signal is a common and powerful tool for studying heart function and diagnosing several abnormal arrhythmias. While there have been remarkable improvements in cardiac arrhythmia classification methods, they still cannot offer acceptable performance in detecting different heart conditions, especially when dealing with imbalanced datasets. In this paper, we propose a solution to address this limitation of current classification approaches by developing an automatic heartbeat classification method using deep convolutional neural networks and sequence to sequence models.

poster_Patient-ECG-Heartbeat_ICASSP19-v1.pdf (457)

Categories: Audio and Acoustic Signal Processing

38 Views

Bluetooth based Indoor Localization using Triplet Embeddings

ICASSP_2019_poster.pdf (530)

Categories: Audio and Acoustic Signal Processing

4 Views

Denoising Gravitational Waves with Enhanced Deep Recurrent Denoising Auto-Encoders

ICASSP poster single.pdf (449)

Categories: Machine Learning for Signal Processing

9 Views

End-to-End Anchored Speech Recognition

Voice-controlled household devices, like Amazon Echo or Google Home, face the problem of recognizing device-directed speech in the presence of interfering background speech; that is, background noise and interfering speech from another person or a nearby media device need to be ignored. We propose two end-to-end models that tackle this problem using information extracted from the “anchored segment”.

ICASSP19_Poster_AnchoredSpeechRecogWithAttention.pdf (682)

Categories: Spoken Language Processing

123 Views
