ICASSP 2022

ICASSP 2022, the IEEE International Conference on Acoustics, Speech and Signal Processing, is the world’s largest and most comprehensive technical conference focused on signal processing and its applications. The ICASSP 2022 conference will feature world-class presentations by internationally renowned speakers and cutting-edge session topics, and will provide an opportunity to network with like-minded professionals from around the world.

AN INVESTIGATION OF THE EFFECTIVENESS OF PHASE FOR AUDIO CLASSIFICATION

While the log-amplitude mel-spectrogram has been widely used as the feature representation for deep-learning-based speech processing, the effectiveness of another aspect of the speech spectrum, i.e., phase information, was recently shown for tasks such as speech enhancement and source separation. In this study, we extensively investigated the effectiveness of including the phase information of signals for eight audio classification tasks. We constructed a learnable front-end that can compute the phase and its derivatives based on a time-frequency representation with a mel-like frequency axis.
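
As a rough illustration of what "the phase and its derivatives" can mean for a time-frequency representation, the sketch below computes the wrapped phase of a plain STFT together with its finite differences along time and frequency. It is not the authors' learnable front-end: the linear (non-mel) frequency axis, the Hann window, and the names `stft` and `phase_features` are assumptions made for this example.

```python
import numpy as np

def stft(x, n_fft=256, hop=64):
    """Complex STFT with a Hann window; returns shape (frames, n_fft // 2 + 1)."""
    x = np.asarray(x, dtype=float)
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack(
        [x[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    return np.fft.rfft(frames, axis=1)

def wrap(d):
    """Wrap phase differences into the principal interval around zero."""
    return np.angle(np.exp(1j * d))

def phase_features(spec):
    """Return the phase and its finite differences along time and frequency."""
    phase = np.angle(spec)
    d_time = wrap(np.diff(phase, axis=0))  # frame-to-frame phase advance
    d_freq = wrap(np.diff(phase, axis=1))  # bin-to-bin phase difference
    return phase, d_time, d_freq

# Hypothetical test signal: a tone centered on STFT bin 9 (assumed 8 kHz rate).
sr = 8000
t = np.arange(2048) / sr
x = np.cos(2 * np.pi * 281.25 * t)
spec = stft(x)
phase, d_time, d_freq = phase_features(spec)
```

For this bin-centered tone, the time difference at bin 9 comes out close to π/2, the phase advance expected over one hop of 64 samples. The time difference relates to instantaneous frequency and the frequency difference to group delay, which is why both are common candidates for phase-derived features.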

Slides_ICASSP2022_MLSP-21.5.pdf (279)

Poster_ICASSP2022_MLSP-21.5.pdf (211)

Categories: Pattern recognition and classification (MLR-PATT)

37 Views

ADT: ANTI-DEEPFAKE TRANSFORMER

Almost all recent mainstream deepfake detection methods use Convolutional Neural Networks (CNNs) as their backbone. However, owing to their overreliance on local texture information, which is usually determined by the forgery methods of the training data, these CNN-based methods cannot generalize well to unseen data. To overcome this limitation of prior methods, we propose a novel transformer-based framework that models both global and local information and analyzes anomalies in face images.

adt_poster_fat.pdf (298)

Categories: Multimedia Forensics

22 Views

Region-to-region kernel interpolation of acoustic transfer function with directional weighting

A method of interpolating the acoustic transfer function (ATF) between regions that takes into account both the physical properties of the ATF and the directionality of region configurations is proposed. Most spatial ATF interpolation methods are limited to estimation in the region of receivers. A kernel method for region-to-region ATF interpolation makes it possible to estimate the ATFs for both source and receiver regions from a discrete set of ATF measurements.

Poster_presentation.pdf (389)

ICASSP_2022_slides.pdf (372)

Categories: Room Acoustics and Acoustic System Modeling; Applications of Sensor Array and Multi-channel Signal Processing; Spatial and Multichannel Audio

81 Views

Video Anomaly Detection via Prediction Network with Enhanced Spatio-temporal Memory Exchange

Video anomaly detection is a challenging task because most anomalies are scarce and non-deterministic. Many approaches investigate the reconstruction difference between normal and abnormal patterns, but neglect that anomalies do not necessarily correspond to large reconstruction errors. To address this issue, we design a Convolutional LSTM Auto-Encoder prediction framework with enhanced spatio-temporal memory exchange using bi-directionality and a higher-order mechanism. The bi-directional structure promotes learning the temporal regularity through forward and backward predictions.

ICASSP_poster.pdf (377)

Categories: Machine Learning for Signal Processing; Image/Video Processing

40 Views

ALSNET: A DILATED 1-D CNN FOR IDENTIFYING ALS FROM RAW EMG SIGNAL (Presentation Slides)

Amyotrophic Lateral Sclerosis (ALS) is one of the most common neuromuscular diseases and affects both lower and upper motor neurons. In this paper, a dilated one-dimensional convolutional neural network, named ALSNet, is proposed for identifying ALS from raw EMG signals. No hand-crafted feature extraction is required; rather, ALSNet takes the raw EMG signal as input and detects EMG signals of ALS subjects. This makes the method more feasible for practical implementation by reducing the computational cost required for extracting features.
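
To make the term "dilated one-dimensional convolution" concrete, here is a minimal NumPy sketch of a single dilated layer applied to a raw signal, plus a helper that computes the receptive field of a stack of such layers. The abstract does not give ALSNet's kernel sizes, dilation rates, or depth, so the function names and every value used here are hypothetical and chosen only for illustration.

```python
import numpy as np

def dilated_conv1d(x, kernel, dilation=1):
    """'Valid' 1-D dilated cross-correlation of a signal with a kernel."""
    x = np.asarray(x, dtype=float)
    kernel = np.asarray(kernel, dtype=float)
    span = (len(kernel) - 1) * dilation + 1  # samples covered by one output
    out_len = len(x) - span + 1
    if out_len <= 0:
        raise ValueError("signal is shorter than the dilated kernel span")
    out = np.zeros(out_len)
    for j, w in enumerate(kernel):
        # Tap j reads the input at an offset of j * dilation samples.
        out += w * x[j * dilation : j * dilation + out_len]
    return out

def receptive_field(kernel_size, dilations):
    """Receptive field in samples of stacked stride-1 dilated layers."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)

# Hypothetical example values, not taken from the paper.
signal = np.arange(10.0)
y = dilated_conv1d(signal, [1.0, 0.0, -1.0], dilation=2)
rf = receptive_field(3, [1, 2, 4, 8])
```

Doubling the dilation at each layer grows the receptive field exponentially with depth while the number of weights grows only linearly, which is one common reason to use dilation on long raw signals.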

8762_ICASSP22.pptx (368)

Categories: Biomedical signal processing; Other applications of machine learning (MLR-APPL)

523 Views

LOOK, LISTEN AND PAY MORE ATTENTION: FUSING MULTI-MODAL INFORMATION FOR VIDEO VIOLENCE DETECTION

Violence detection is an essential and challenging problem in the computer vision community. Most existing works focus on single-modal data analysis, which is not effective when multiple modalities are available.

ICASSP2022_Poster .pdf (208)

Categories: Image/Video Processing

22 Views

ALSNET: A DILATED 1-D CNN FOR IDENTIFYING ALS FROM RAW EMG SIGNAL (Poster)

8762_ICASSP22_Poster.pdf (262)

Categories: Biomedical signal processing; Other applications of machine learning (MLR-APPL)

21 Views

Augmenting Molecular Deep Generative Models with Topological Data Analysis Representations

generative_tda_icassp_poster_2022_v2.pdf (248)

Categories: Neural network learning (MLR-NNLR)

8 Views

Digraph Signal Processing with Generalized Boundary Conditions

Signal processing on directed graphs (digraphs) is problematic, since the graph shift, and thus associated filters, are in general not diagonalizable. Furthermore, the Fourier transform in this case is now obtained from the Jordan decomposition, which may not be computable at all for large graphs. We propose a novel and general solution for this problem based on matrix perturbation theory: We design an algorithm that adds a small number of edges to a given digraph to destroy nontrivial Jordan blocks.
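
The classic toy case behind this idea can be checked numerically. The sketch below is not the authors' algorithm, which chooses edges for general digraphs. It only shows that the shift matrix of a directed path is a single nontrivial Jordan block, and that adding one edge to close the path into a cycle yields a matrix diagonalized by the DFT basis. The function names and the convention `A[i, j] = 1` for an edge from node `j` to node `i` are assumptions for this example.

```python
import numpy as np

def directed_path(n):
    """Shift matrix of the directed path 0 -> 1 -> ... -> n-1."""
    A = np.zeros((n, n))
    A[np.arange(1, n), np.arange(n - 1)] = 1.0  # edge i -> i+1
    return A

def directed_cycle(n):
    """The same path with one added edge n-1 -> 0."""
    A = directed_path(n)
    A[0, n - 1] = 1.0
    return A

def jordan_blocks_at_zero(A):
    """Number of Jordan blocks for eigenvalue 0 (null-space dimension)."""
    return A.shape[0] - np.linalg.matrix_rank(A)

def dft_basis(n):
    """Unitary matrix whose columns are the complex exponentials."""
    idx = np.arange(n)
    return np.exp(2j * np.pi * np.outer(idx, idx) / n) / np.sqrt(n)

n = 6
P, C = directed_path(n), directed_cycle(n)

# P is nilpotent, so 0 is its only eigenvalue; a one-dimensional null
# space then means a single n x n Jordan block (not diagonalizable).
nilpotent = not np.linalg.matrix_power(P, n).any()
blocks = jordan_blocks_at_zero(P)

# After adding one edge, the DFT basis diagonalizes the shift.
F = dft_basis(n)
D = F.conj().T @ C @ F
off_diag = np.abs(D - np.diag(np.diag(D))).max()
```

In this special case the added edge corresponds to the familiar periodic boundary condition, and the resulting Fourier basis is the ordinary DFT.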

presentation.pptx

Slides for Digraph Signal Processing with Generalized Boundary Conditions (214)

poster.pdf

Poster for Digraph Signal Processing with Generalized Boundary Conditions (305)

Categories: Other

47 Views

PSEUDO-INTERACTING GUIDED NETWORK FOR FEW-SHOT SEGMENTATION

Few-shot segmentation has attracted considerable attention recently. Existing methods mainly locate and recognize the target object in a cross-guided way, applying masked target-object features of support (query) images to perform feature matching with query (support) images. However, large appearance and scale variations cause differences between support and query images, which lead to inaccurate and incomplete segmentation. This problem inspired us to explore the local coherence of the image to guide the segmentation.

icassp.pptx (212)

Categories: Image/Video Processing

20 Views
