ICASSP 2022

ICASSP 2022, the IEEE International Conference on Acoustics, Speech and Signal Processing, is the world’s largest and most comprehensive technical conference focused on signal processing and its applications. The conference will feature world-class presentations by internationally renowned speakers and cutting-edge session topics, and will provide an opportunity to network with like-minded professionals from around the world.

Joint Speech Recognition and Audio Captioning

Speech samples recorded in both indoor and outdoor environments are often contaminated with secondary audio sources. Most end-to-end monaural speech recognition systems either remove these background sounds using speech enhancement or train noise-robust models. For better model interpretability and holistic understanding, we aim to bring together the growing field of automated audio captioning (AAC) and the thoroughly studied automatic speech recognition (ASR). The goal of AAC is to generate natural language descriptions of contents in audio samples.

ICASSP 2022 Chai - Joint ASR AAC.pdf (303)

Categories: Robust Speech Recognition (SPE-ROBU)

13 Views

Improving Feature Generalizability With Multitask Learning In Class Incremental Learning

poster_icassp.pdf (292)

Categories: Applications in Music and Audio Processing (MLR-MUSI)

70 Views

Improving Feature Generalizability With Multitask Learning In Class Incremental Learning

2022-03-12-ICASSP.pdf (306)

Categories: Applications in Music and Audio Processing (MLR-MUSI)

9 Views

Model-Based Reconstruction for Collimated Beam Ultrasound Systems

Collimated beam ultrasound systems are a novel technology for imaging inside multi-layered structures such as geothermal wells. Such systems include a transmitter and multiple receivers to capture reflected signals. Common algorithms for ultrasound reconstruction use delay-and-sum (DAS) approaches; these have low computational complexity but produce inaccurate images in the presence of complex structures and specialized geometries such as collimated beams.
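As a rough illustration of the delay-and-sum (DAS) baseline mentioned above, the following minimal sketch computes one image pixel for a single transmitter and several receivers. It assumes a constant wave speed and straight-line travel paths. The function name, argument layout, and toy data are hypothetical and are not taken from the paper.

```python
import math


def delay_and_sum(traces, receivers, transmitter, pixel, speed, sample_rate):
    """Return the delay-and-sum value at one pixel.

    traces      -- one list of samples per receiver (hypothetical layout)
    receivers   -- (x, y) position of each receiver
    transmitter -- (x, y) position of the single transmitter
    pixel       -- (x, y) image point being reconstructed
    speed       -- assumed constant wave speed in the medium
    sample_rate -- samples per unit time
    """
    tx_path = math.dist(transmitter, pixel)
    total = 0.0
    for trace, rx in zip(traces, receivers):
        # Travel time along the path transmitter -> pixel -> receiver.
        delay = (tx_path + math.dist(pixel, rx)) / speed
        index = round(delay * sample_rate)
        # Samples outside the recorded window contribute nothing.
        if 0 <= index < len(trace):
            total += trace[index]
    return total


# Toy data (assumed): an ideal point reflector at (0, 4) produces a unit
# impulse in both traces at sample 9, since the path length is 4 + 5 = 9
# at unit speed and unit sample rate.
transmitter = (0.0, 0.0)
receivers = [(-3.0, 0.0), (3.0, 0.0)]
traces = [[0.0] * 16 for _ in receivers]
for trace in traces:
    trace[9] = 1.0

on_target = delay_and_sum(traces, receivers, transmitter, (0.0, 4.0), 1.0, 1.0)
off_target = delay_and_sum(traces, receivers, transmitter, (0.0, 1.0), 1.0, 1.0)
```

In this toy setup the echoes add coherently at the reflector position and not at the other pixel. The single wave speed and straight paths are simplifications of the sketch, and they are the kind of assumption that becomes questionable in multi-layered structures.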

ICASSP2022.pdf

ICASSP2022.pdf (243)

Categories: Image/Video Processing

4 Views

FilterAugment: An Acoustic Environmental Data Augmentation Method

Acoustic environments physically interact with sound wave propagation and thereby alter the acoustic characteristics of the sound to be recognized. Thus, training acoustic models for audio and speech tasks requires regularization over various acoustic environments in order to achieve robust performance in real-life applications. We propose FilterAugment, a data augmentation method for regularization of acoustic models over various acoustic environments.

[poster presentation] FilterAugment.pdf

poster pdf (217)

Categories: Neural network learning (MLR-NNLR)

20 Views

Direct Localization: An Ising Model Approach

Accurate indoor localization is a challenging problem in a multipath environment, and several methods have been proposed to tackle it. Direct localization is one such method; it uses a two-dimensional search in a planar geometry. In this paper, we use a compressed sensing framework in the direct localization technique to estimate the location of a user in an indoor multipath environment. We form a penalized ℓ0-norm structure for this problem, and then convert this structure to an Ising energy problem.
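To make the last step concrete, here is one generic way a penalized ℓ0 problem can be rewritten as an Ising energy. This is an illustrative sketch, not the paper's formulation. It assumes a real-valued measurement vector y, a real dictionary A, a binary unknown x, and a penalty weight λ, all of which are symbols introduced here.

```latex
\begin{align}
  & \min_{x \in \{0,1\}^n} \; \|y - A x\|_2^2 + \lambda \|x\|_0,
    \qquad \|x\|_0 = \sum_{i=1}^{n} x_i \\
  & x_i = \tfrac{1}{2}(1 + s_i), \qquad s_i \in \{-1, +1\},
    \qquad r = y - \tfrac{1}{2} A \mathbf{1}, \qquad G = A^{\top} A \\
  & E(s) = -\sum_{i<j} J_{ij}\, s_i s_j - \sum_{i} h_i\, s_i + \text{const} \\
  & J_{ij} = -\tfrac{1}{2} G_{ij}, \qquad
    h_i = (A^{\top} r)_i - \tfrac{\lambda}{2}
\end{align}
```

Under these assumptions the diagonal terms of the quadratic form are constant because s_i² = 1, so minimizing E(s) over spin configurations is equivalent to the original binary problem.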

ICASSP_2022_poster (1).pdf (298)

Categories: Applications of Sensor Array and Multi-channel Signal Processing

14 Views

Nearest Subspace Search in the Signed Cumulative Distribution Transform Space for 1D Signal Classification

This paper presents a new method to classify 1D signals using the signed cumulative distribution transform (SCDT). The proposed method exploits certain linearization properties of

ICASSP_2022_slides.pdf

PowerPoint slides used to present the paper at ICASSP 2022 (254)

Categories: Pattern recognition and classification (MLR-PATT)

10 Views

Interpreting Intermediate Convolutional Layers in Unsupervised Acoustic Word Classification

Understanding how deep convolutional neural networks classify data has been the subject of extensive research. This paper proposes a technique to visualize and interpret intermediate layers of unsupervised deep convolutional networks by averaging over individual feature maps in each convolutional layer and inferring underlying distributions of words with non-linear regression techniques. A GAN-based architecture (ciwGAN [1]) that includes a generator, a discriminator, and a classifier was trained on unlabeled sliced lexical items from TIMIT.
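One possible reading of the averaging step is a point-by-point mean across the feature maps (channels) of a layer, which yields a single series per layer. The sketch below assumes that reading. The function name and the toy activations are hypothetical and do not come from the paper.

```python
def average_feature_maps(feature_maps):
    """Average equally long feature maps point by point.

    feature_maps -- list of channels, each a list of activations over time
    """
    if not feature_maps:
        raise ValueError("need at least one feature map")
    count = len(feature_maps)
    # zip(*...) groups the activations of all channels at each time step.
    return [sum(values) / count for values in zip(*feature_maps)]


# Toy layer output (assumed): two channels, three time steps each.
layer_output = [
    [1.0, 2.0, 3.0],
    [3.0, 4.0, 5.0],
]
averaged = average_feature_maps(layer_output)
```

The resulting per-layer series could then serve as input to the regression analysis the abstract mentions. How the authors actually prepare it is not stated here.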

ICASSP 2022.pdf

ICASSP 2022 presentation (238)

Categories: Audio and Acoustic Signal Processing

14 Views

Identification of Overlapping Echoes of Unknown Shape from Time-Encoding Machine Samples

We present an algorithm for the resolution of delayed and overlapping pulses of a common unknown shape from multi-channel measurements. We show that just a few Fourier samples acquired by a Time Encoding Machine (TEM) suffice to solve this challenging problem. This acquisition scheme is desirable for ultra-low-power applications in wearables, such as EMG skin-sensor tattoos.

icassp_slides.pdf (288)

Categories: Multi-channel Signal Processing

22 Views

Habibzadeh_ICASSP3270_poster

We present metamer identification plus (metaID+), an algorithm that enhances the performance of brain-computer interface (BCI)-based color vision assessment. BCI-based color vision assessment uses steady-state visual evoked potentials (SSVEPs) elicited during a grid search of colors to identify metamers—light sources with different spectral distributions that appear to be the same color. Present BCI-based color vision assessment methods are slow; they require extensive data collection for each color in the grid search to reduce measurement noise.