ICASSP 2017

ICASSP is the world's largest and most comprehensive technical conference on signal processing and its applications. It provides a fantastic networking opportunity for like-minded professionals from around the world. The ICASSP 2017 conference will feature world-class presentations by internationally renowned speakers and cutting-edge session topics.

Discovering Sound Concepts and Acoustic Relations in Text

In this paper we describe approaches for discovering acoustic concepts and relations in text. The first major goal is to identify text phrases which contain a notion of audibility and can be termed a sound or an acoustic concept. We also propose a method to define an acoustic scene through a set of sound concepts. We use pattern matching and part-of-speech tags to generate sound concepts from large-scale text corpora. We use dependency parsing

anurag_icassp17.pdf (680)

Categories: Audio and Acoustic Signal Processing

13 Views

Exemplar‐Embed Complex Matrix Factorization for Facial Expression Recognition

ICASSP2017_Facial.pdf (569)

Categories: Image/Video Processing

6 Views

TCLBP: An LBP-based Color Descriptor for Face Recognition

poster.pdf (1207)

Categories: Image/Video Processing

43 Views

Symbol Detection for Faster-than-Nyquist Signaling by Sum-of-Absolute-Values Optimization

In this work, we propose a new symbol detection method for faster-than-Nyquist signaling for effective data transmission. Based on frame theory, the symbol detection problem is described as an underdetermined system of linear equations over a finite alphabet. While the problem itself is NP-hard, we propose a convex relaxation using sum-of-absolute-values optimization, which can be solved efficiently by proximal splitting. Simulation results illustrate the effectiveness of the proposed method compared to a recent ℓ∞-based method.
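
As a rough illustration of the idea in this abstract, the sketch below applies a sum-of-absolute-values relaxation to a toy problem. Everything specific here is an assumption made for illustration and is not taken from the paper: the alphabet is binary {-1, +1} with equal weights, a random real-valued matrix `A` stands in for the faster-than-Nyquist channel model, and Douglas-Rachford splitting is used as one possible proximal splitting scheme. The function names `prox_soav_binary` and `soav_detect` are hypothetical.

```python
import numpy as np


def prox_soav_binary(v, gamma):
    """Elementwise prox of gamma * (0.5*|x - 1| + 0.5*|x + 1|).

    For the binary alphabet the penalty equals max(|x|, 1), so the prox
    leaves |v| <= 1 untouched, clips 1 < |v| <= 1 + gamma to magnitude 1,
    and shrinks larger magnitudes by gamma.
    """
    a = np.abs(v)
    return np.sign(v) * np.maximum(np.minimum(a, 1.0), a - gamma)


def soav_detect(A, y, gamma=1.0, iters=500):
    """Douglas-Rachford iterations for: min sum_i max(|x_i|, 1) s.t. A x = y.

    Returns the relaxed solution and its hard decision onto {-1, +1}
    (np.sign maps an exact zero to 0, which this sketch does not handle).
    """
    A_pinv = np.linalg.pinv(A)

    def project(z):
        # Euclidean projection onto the affine set {x : A x = y}.
        return z - A_pinv @ (A @ z - y)

    z = np.zeros(A.shape[1])
    for _ in range(iters):
        x = project(z)
        z = z + prox_soav_binary(2.0 * x - z, gamma) - x
    x = project(z)
    return x, np.sign(x)


# Toy underdetermined instance: 12 measurements, 16 binary symbols.
rng = np.random.default_rng(0)
A = rng.standard_normal((12, 16))
x_true = rng.choice([-1.0, 1.0], size=16)
y = A @ x_true
x_relaxed, symbols = soav_detect(A, y)
```

The returned `x_relaxed` always satisfies the linear constraints because it is the output of the projection step; whether the hard decisions match `x_true` depends on the matrix, the number of measurements, and the iteration count.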

ICASSP_Poster2.pptx

The poster for our presentation in ICASSP 2017 (316)

Categories: Communication Systems and Applications

30 Views

A First Attempt at Polyphonic Sound Event Detection Using Connectionist Temporal Classification

Sound event detection is the task of detecting the type, starting time, and ending time of sound events in audio streams. Recently, recurrent neural networks (RNNs) have become the mainstream solution for sound event detection. Because RNNs make a prediction at every frame, it is necessary to provide exact starting and ending times of the sound events in the training data, making data annotation an extremely time-consuming process.
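
To make the frame-level formulation concrete, here is a minimal sketch of how per-frame binary predictions could be turned into events with a type, starting time, and ending time. The function name `frames_to_events`, the activity-matrix layout (frames by classes), and the hop size are assumptions for illustration; the abstract does not describe the authors' post-processing.

```python
from typing import List, Sequence, Tuple

Event = Tuple[str, float, float]  # (label, onset in seconds, offset in seconds)


def frames_to_events(
    activity: Sequence[Sequence[int]],
    labels: Sequence[str],
    hop_s: float,
) -> List[Event]:
    """Merge runs of active frames into (label, onset, offset) events.

    activity[t][c] is 1 when class c is predicted active in frame t.
    Overlapping events of different classes are allowed (polyphonic case).
    """
    events: List[Event] = []
    for c, label in enumerate(labels):
        onset = None
        for t, row in enumerate(activity):
            active = bool(row[c])
            if active and onset is None:
                onset = t  # a run of active frames starts here
            elif not active and onset is not None:
                events.append((label, onset * hop_s, t * hop_s))
                onset = None
        if onset is not None:
            # The run is still open at the end of the clip.
            events.append((label, onset * hop_s, len(activity) * hop_s))
    return events


# Five frames, two classes, 0.5 s per frame.
example = frames_to_events(
    [[1, 0], [1, 1], [0, 1], [0, 0], [1, 0]], ["dog", "car"], hop_s=0.5
)
```

Training a frame-wise model requires the inverse mapping, from annotated onsets and offsets to a frame-level target matrix, which is why exact timing annotations are needed in that setting.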

2017.03 Poster for ICASSP.pdf (58)

Categories: Audio for Multimedia, Multimodal signal processing

8 Views

INTERAURAL TIME DELAY PERSONALISATION USING INCOMPLETE HEAD SCANS

When using a set of generic head-related transfer functions (HRTFs) for spatial sound rendering, personalisation can be considered to minimise localisation errors. This typically involves tuning the characteristics of the HRTFs or a parametric model according to the listener’s anthropometry. However, measuring anthropometric features directly remains a challenge in practical applications, and the mapping between anthropometric and acoustic features is an open research problem.
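
As an example of what a parametric model tuned to anthropometry can look like, the sketch below uses the classical Woodworth spherical-head approximation, in which the interaural time delay depends on a single anthropometric parameter, the head radius. This is a textbook model chosen for illustration, not the method of the paper; the function name `woodworth_itd` and the default radius of 0.0875 m are assumptions.

```python
import math


def woodworth_itd(
    azimuth_rad: float,
    head_radius_m: float = 0.0875,  # assumed average head radius
    speed_of_sound: float = 343.0,  # m/s in air at about 20 degrees C
) -> float:
    """Woodworth ITD in seconds for a far-field source in the horizontal plane.

    Valid for azimuths between 0 (straight ahead) and pi/2 (directly to
    one side): ITD = (a / c) * (theta + sin(theta)).
    """
    return (head_radius_m / speed_of_sound) * (azimuth_rad + math.sin(azimuth_rad))


# Personalisation in this toy model amounts to changing one parameter.
itd_generic = woodworth_itd(math.pi / 2)
itd_larger_head = woodworth_itd(math.pi / 2, head_radius_m=0.095)
```

In this simplified model, an error in the assumed head radius scales the ITD proportionally at every azimuth.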

ITD_personalisation_based_on_partial_scans.pdf

Poster: ITD personalisation using incomplete head scans (838)

Categories: Spatial and Multichannel Audio

11 Views

BAYESIAN RECONSTRUCTION OF HYPERSPECTRAL IMAGES BY USING COMPRESSED SENSING MEASUREMENTS AND A LOCAL STRUCTURED PRIOR

This paper introduces a hierarchical Bayesian model for the reconstruction of hyperspectral images using compressed sensing measurements. This model exploits known properties of natural images, promoting a recovered image that is sparse in a selected basis and smooth in the image domain. The posterior distribution of this model is too complex to derive closed-form expressions for the estimators of its parameters. Therefore, an MCMC method is investigated to sample this posterior distribution.

Icassp_Meija_Tourneret_costa_Batatia_Arguello.pdf (333)

Categories: Sensor Array and Multichannel Signal Processing

15 Views

VID2SPEECH: SPEECH RECONSTRUCTION FROM SILENT VIDEO

Speechreading is a notoriously difficult task for humans to perform. In this paper we present an end-to-end model based on a convolutional neural network (CNN) for generating an intelligible acoustic speech signal from silent video frames of a speaking person. The proposed CNN generates sound features for each frame based on its neighboring frames. Waveforms are then synthesized from the learned speech features to produce intelligible speech.

icassp17-poster.pdf

vid2speech_poster (276)

Categories: Speech Synthesis and Generation, including TTS (SPE-SYNT)

23 Views

DETECTION OF IMPULSIVE DISTURBANCES IN ARCHIVE AUDIO SIGNALS

The problem of detecting impulsive disturbances in archive audio signals is considered. It is shown that semi-causal/noncausal solutions, based on joint evaluation of signal prediction errors and leave-one-out signal interpolation errors, noticeably improve detection results compared to prediction-only solutions. The proposed approaches are evaluated on a set of clean audio signals contaminated with real click waveforms extracted from silent parts of old gramophone recordings.

ICASSP2017 final.pdf

Poster ICASSP 2017 (77)

Categories: Music Signal Processing

7 Views

Statistical Normalisation of Phase-based Feature Representation for Robust Speech Recognition

In earlier work we have proposed a source-filter decomposition of speech through phase-based processing. The decomposition leads to novel speech features that are extracted from the filter component of the phase spectrum. This paper analyses this spectrum and the proposed representation by evaluating statistical properties at various points along the parametrisation pipeline. We show that speech phase spectrum has a bell-shaped distribution which is in contrast to the uniform assumption that is usually made. It is demonstrated that

ICASSP2017_0.pdf (546)

Categories: Robust Speech Recognition (SPE-ROBU)

8 Views
