ICASSP 2022 - IEEE International Conference on Acoustics, Speech and Signal Processing is the world's largest and most comprehensive technical conference focused on signal processing and its applications. The ICASSP 2022 conference will feature world-class presentations by internationally renowned speakers and cutting-edge session topics, and will provide a fantastic opportunity to network with like-minded professionals from around the world.

Camera calibration is a necessity in various tasks, including 3D reconstruction, hand-eye coordination for robotic interaction, and autonomous driving. In this work we propose a novel method to predict extrinsic (baseline, pitch, and translation) and intrinsic (focal length and principal point offset) parameters from an image pair. Unlike existing methods that design an end-to-end solution, we propose a new representation that incorporates the camera model equations as a neural network in a multi-task learning framework.
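
The paper's architecture is not reproduced here; as a minimal sketch of the kind of camera model equations the abstract refers to, assuming a standard pinhole formulation with focal length f, principal point (cx, cy), and a rotation/translation pose (function and variable names are illustrative, not from the paper):

```python
import numpy as np

def project_points(points_3d, f, cx, cy, R, t):
    """Pinhole projection of Nx3 world points to pixel coordinates using
    intrinsics (focal length f, principal point (cx, cy)) and extrinsics (R, t)."""
    K = np.array([[f,   0.0, cx],
                  [0.0, f,   cy],
                  [0.0, 0.0, 1.0]])
    cam = points_3d @ R.T + t          # world -> camera coordinates
    uv = cam @ K.T                     # apply the intrinsic matrix
    return uv[:, :2] / uv[:, 2:3]      # perspective divide -> pixels

# Toy example: two points in front of an identity-pose camera
pts = np.array([[0.1, -0.2, 2.0], [0.0, 0.0, 5.0]])
print(project_points(pts, f=800.0, cx=320.0, cy=240.0,
                     R=np.eye(3), t=np.zeros(3)))
```

In a multi-task setting such a projection would act as the differentiable constraint tying predicted intrinsic and extrinsic parameters together; the exact network design is specific to the paper.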

Nonnegative matrix factorization (NMF) has traditionally been considered a promising approach for audio source separation. While standard NMF is only suited for single-channel mixtures, extensions that consider multichannel data have also been proposed. Among the most popular alternatives, multichannel NMF (MNMF) and further derivations based on constrained spatial covariance models have been successfully employed to separate multi-microphone convolutive mixtures.
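
For reference, the "standard NMF" the abstract starts from factorizes a nonnegative matrix V ≈ WH; a minimal single-channel sketch with the classical multiplicative updates (Lee-Seung, Euclidean cost) is shown below. The multichannel MNMF and spatial-covariance extensions discussed in the paper are not reproduced here.

```python
import numpy as np

def nmf(V, n_components, n_iter=200, eps=1e-9, seed=0):
    """Standard NMF with multiplicative updates minimizing ||V - WH||_F^2.
    V is a nonnegative matrix, e.g. a magnitude spectrogram (freq x frames)."""
    rng = np.random.default_rng(seed)
    F, T = V.shape
    W = rng.random((F, n_components)) + eps
    H = rng.random((n_components, T)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)   # update activations
        W *= (V @ H.T) / (W @ H @ H.T + eps)   # update spectral bases
    return W, H

V = np.abs(np.random.default_rng(1).standard_normal((257, 100)))
W, H = nmf(V, n_components=4)
print(np.linalg.norm(V - W @ H) / np.linalg.norm(V))  # relative reconstruction error
```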

Sound field analysis and reconstruction has been a topic of intense research over the last decades due to its multiple applications in spatial audio processing tasks. In this context, the identification of the direct and reverberant sound field components is a problem of great interest, for which several solutions exploiting spherical harmonics representations have already been proposed.
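
As a hedged illustration of the spherical harmonics representations mentioned above (the function name and sampling scheme are illustrative, not from the paper), the sketch below builds the matrix of spherical harmonics evaluated at a set of microphone directions, the usual first step for encoding a measured sound field into the SH domain:

```python
import numpy as np
from scipy.special import sph_harm

def sh_matrix(order, azimuth, colatitude):
    """Complex spherical harmonics Y_n^m evaluated at the given directions,
    up to the given order. Rows: directions; columns: (n, m) pairs with
    n = 0..order and m = -n..n, as commonly used for SH encoding."""
    cols = []
    for n in range(order + 1):
        for m in range(-n, n + 1):
            cols.append(sph_harm(m, n, azimuth, colatitude))
    return np.stack(cols, axis=1)

# Example: 32 random directions, order-3 decomposition ((3+1)^2 = 16 terms)
rng = np.random.default_rng(0)
az = rng.uniform(0, 2 * np.pi, 32)
col = np.arccos(rng.uniform(-1, 1, 32))
Y = sh_matrix(3, az, col)
print(Y.shape)  # (32, 16)
```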

Area under the ROC curve (AUC) optimisation techniques developed for neural networks have recently demonstrated their capabilities in various audio- and speech-related tasks. However, due to its intrinsic nature, AUC optimisation has so far focused only on binary tasks. In this paper, we introduce an extension to the AUC optimisation framework so that it can be easily applied to an arbitrary number of classes, aiming to overcome the issues derived from training data limitations in deep learning solutions.
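
As background (binary case only; the paper's multiclass extension is not reproduced here), AUC can be optimised with a pairwise surrogate of the Wilcoxon-Mann-Whitney statistic, e.g. a squared hinge on positive/negative score differences. A minimal PyTorch sketch with an illustrative function name:

```python
import torch

def pairwise_auc_loss(scores, labels, margin=1.0):
    """Binary AUC surrogate: squared-hinge penalty on every (positive,
    negative) score pair, so minimising it pushes positive scores above
    negative ones. Assumes the batch contains both classes."""
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    diff = pos.unsqueeze(1) - neg.unsqueeze(0)        # all pos/neg pairs
    return torch.clamp(margin - diff, min=0).pow(2).mean()

scores = torch.randn(16, requires_grad=True)
labels = torch.randint(0, 2, (16,))
loss = pairwise_auc_loss(scores, labels)
loss.backward()
print(loss.item())
```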

Initially developed for natural language processing (NLP), the Transformer model is now widely used for speech processing tasks such as speaker recognition, thanks to its powerful sequence modeling capabilities. However, conventional self-attention mechanisms were originally designed for modeling textual sequences, without considering the characteristics of speech and speaker modeling. Moreover, different Transformer variants for speaker recognition have not been well studied.
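
For context, the conventional self-attention the abstract refers to is scaled dot-product attention over the frame sequence; a minimal single-head sketch (shapes and names are illustrative) is:

```python
import torch
import torch.nn.functional as F

def self_attention(x, w_q, w_k, w_v):
    """Scaled dot-product self-attention over a sequence of frame embeddings
    x of shape (time, dim); w_q/w_k/w_v are (dim, dim) projection matrices."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / (q.shape[-1] ** 0.5)   # pairwise frame affinities
    return F.softmax(scores, dim=-1) @ v      # attention-weighted sum of values

T, D = 100, 64                                # e.g. 100 frames, 64-dim features
x = torch.randn(T, D)
w = [torch.randn(D, D) / D ** 0.5 for _ in range(3)]
print(self_attention(x, *w).shape)            # torch.Size([100, 64])
```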

Data-driven, machine-based analysis of massive bioacoustic data collections, in particular of acoustic regions containing a substantial number of vocalization events, is essential and extremely valuable for identifying recurring vocal paradigms. However, these acoustic sections are usually characterized by a strong incidence of overlapping vocalization events, a major problem that severely affects subsequent human- and machine-based analysis and interpretation.

Emerging 6G applications have generated great interest in arrays with a large number of antennas operating in the millimeter-wave and sub-THz bands for joint communication, sensing, and localization.
With such large arrays, the plane-wave approximation is often inaccurate because the system may operate in the (radiating) near-field propagation region (i.e., the Fresnel region), where the electromagnetic wavefront is spherical.
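
As a back-of-the-envelope check of this claim, the far-field (plane-wave) region conventionally starts at the Fraunhofer distance 2D²/λ for an aperture of size D at wavelength λ; with assumed example values of a 0.5 m aperture at 100 GHz, that boundary already lies far beyond typical link distances:

```python
# Rough check of when the plane-wave (far-field) approximation breaks down.
# Carrier frequency and aperture size are assumed example values.
c = 3e8                      # speed of light (m/s)
freq = 100e9                 # 100 GHz carrier (sub-THz example)
wavelength = c / freq        # = 3 mm
D = 0.5                      # 0.5 m array aperture
d_fraunhofer = 2 * D ** 2 / wavelength
print(f"Far field starts at ~{d_fraunhofer:.0f} m")   # ~167 m
```

At shorter ranges the receiver sits in the Fresnel region, so the spherical wavefront must be modeled explicitly.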
