Multimedia Signal Processing

Learning-based Point Cloud Decoding With Independent and Scalable Reduced Complexity

Point Clouds (PCs) have gained significant attention due to their use in diverse application domains, notably virtual and augmented reality. While PCs excel at providing detailed 3D visualization, this typically requires millions of points, which must be coded efficiently for real-world deployment such as storage and streaming. Recently, learning-based coding solutions have been adopted, notably in the JPEG Pleno Point Cloud Coding (PCC) standard, which uses a coding model with millions of parameters.

ICIP2024_Presentation_VFinal.pdf

Categories: Image/Video Coding, Multimedia Signal Processing

Non-separable Wavelet Transform Using Learnable Convolutional Lifting Steps

Wavelet transforms have been a relevant topic in signal processing for many years. One of the most common strategies when designing wavelet transforms is the use of lifting schemes, known for their perfect reconstruction properties and flexible design. This paper introduces a novel 2D non-separable lifting design methodology based on deep learning architectures. The proposed method is assessed within the context of end-to-end lossless image compression.
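The perfect-reconstruction property of lifting can be illustrated with a minimal sketch, assuming a 1D signal of even length and the fixed integer LeGall 5/3 predict and update filters (the reversible filters used for lossless coding in JPEG 2000) rather than the learned 2D non-separable convolutional steps proposed in the paper. Because each step only adds a function of the other polyphase component, the inverse subtracts the same quantity in reverse order, so the round trip is exact even with integer rounding. The function names are hypothetical.

```python
def forward_lift(x):
    """One level of integer LeGall 5/3 lifting; returns (approximation, detail)."""
    assert len(x) % 2 == 0, "even-length input assumed"
    even, odd = x[0::2], x[1::2]          # split into polyphase components
    n = len(odd)
    # Predict: estimate each odd sample from its even neighbors (symmetric extension at the end).
    detail = [odd[i] - (even[i] + even[min(i + 1, n - 1)]) // 2 for i in range(n)]
    # Update: smooth the even samples using the neighboring details.
    approx = [even[i] + (detail[max(i - 1, 0)] + detail[i] + 2) // 4 for i in range(n)]
    return approx, detail


def inverse_lift(approx, detail):
    """Undo the steps in reverse order; the same integer rounding cancels exactly."""
    n = len(detail)
    even = [approx[i] - (detail[max(i - 1, 0)] + detail[i] + 2) // 4 for i in range(n)]
    odd = [detail[i] + (even[i] + even[min(i + 1, n - 1)]) // 2 for i in range(n)]
    x = []
    for e, o in zip(even, odd):           # merge the polyphase components back
        x.extend([e, o])
    return x


signal = [3, 7, 1, 8, 2, 9, 4, 6]
approx, detail = forward_lift(signal)
assert inverse_lift(approx, detail) == signal   # lossless round trip
```

In a learned variant, the fixed averages inside the predict and update steps would be replaced by trainable (e.g., convolutional) operators; invertibility is preserved as long as each operator only reads the component it does not modify.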

ICIP 24 Presentation - Non-separable Wavelet Transform Using Learnable Convolutional Lifting Steps.pdf

Categories: Multimedia Signal Processing

A SOUND APPROACH: Using Large Language Models to generate audio descriptions for egocentric text-audio retrieval - Poster

Video databases from the internet are a valuable source of text-audio retrieval datasets. However, given that sound and vision streams represent different "views" of the data, treating visual descriptions as audio descriptions is far from optimal. Even when audio class labels are present, they are usually not very detailed, which makes them unsuitable for text-audio retrieval. To exploit relevant audio information from video-text datasets, we introduce a methodology for generating audio-centric descriptions using Large Language Models (LLMs).

icassp_2024_from_template.pdf

Categories: Multimedia Signal Processing

ColorFlow_ICASSP2024

Image colorization is an ill-posed task, as objects within grayscale images can correspond to multiple colors, motivating researchers to establish a one-to-many relationship between objects and colors. Most previous work could only establish a deterministic relationship, which is insufficient. Normalizing flows can fully capture the color diversity of the natural image manifold. However, classical flows often overlook the color correlations between different objects, resulting in unrealistic colors.
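Normalizing flows remain invertible and give exact likelihoods because every layer is a bijection with a cheap Jacobian determinant. The sketch below shows one generic affine coupling layer in the RealNVP style, assuming an even-length input vector, a standard normal base density, and a hand-written toy conditioner `scale_shift` (hypothetical, standing in for a neural network). It is not ColorFlow's architecture and does not model the cross-object color correlations the abstract targets.

```python
import math


def scale_shift(cond):
    """Hypothetical toy conditioner. Any function of the untouched half works here,
    because it never has to be inverted; a real flow would use a neural network."""
    log_scale = [math.tanh(0.5 * c) for c in cond]  # bounded log-scale for stability
    shift = [0.3 * c - 0.1 for c in cond]
    return log_scale, shift


def coupling_forward(x):
    """Affine coupling: y1 = x1, y2 = x2 * exp(s(x1)) + t(x1). Returns (y, log|det J|)."""
    assert len(x) % 2 == 0, "even-length input assumed"
    half = len(x) // 2
    x1, x2 = x[:half], x[half:]
    s, t = scale_shift(x1)
    y2 = [v * math.exp(si) + ti for v, si, ti in zip(x2, s, t)]
    # The Jacobian is triangular, so log|det J| is just the sum of the log-scales.
    return x1 + y2, sum(s)


def coupling_inverse(y):
    """Exact inverse: y1 equals x1, so the conditioner sees the same input again."""
    half = len(y) // 2
    y1, y2 = y[:half], y[half:]
    s, t = scale_shift(y1)
    x2 = [(v - ti) * math.exp(-si) for v, si, ti in zip(y2, s, t)]
    return y1 + x2


def log_likelihood(x):
    """Change of variables with a standard normal base density:
    log p(x) = log N(f(x); 0, I) + log|det df/dx|."""
    y, log_det = coupling_forward(x)
    log_base = sum(-0.5 * v * v - 0.5 * math.log(2 * math.pi) for v in y)
    return log_base + log_det
```

Stacking several such layers while alternating which half is conditioned gives an expressive flow whose likelihood is still the sum of per-layer log-determinants plus the base log-density.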

ColorFlow_ICASSP2024.pptx

Categories: Multimedia Signal Processing

TALKNCE: IMPROVING ACTIVE SPEAKER DETECTION WITH TALK-AWARE CONTRASTIVE LEARNING

The goal of this work is Active Speaker Detection (ASD), the task of determining whether a person is speaking in a series of video frames.
Previous works have tackled the task by exploring network architectures, while learning effective representations has been less explored.
In this work, we propose TalkNCE, a novel talk-aware contrastive loss. The loss is applied only to the parts of the full segments where a person on screen is actually speaking.
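A talk-aware contrastive loss can be sketched as a masked InfoNCE objective over frame-level embeddings. In this sketch, which is an assumption rather than the paper's exact formulation, the visual embedding of frame t is pulled toward the audio embedding of the same frame and pushed away from the audio of all other frames, and the loss is averaged only over frames flagged as speaking. The function names, the single video-to-audio direction, and the temperature `tau=0.07` are hypothetical choices.

```python
import math


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def masked_info_nce(video, audio, speaking, tau=0.07):
    """video[t], audio[t]: embeddings of frame t; speaking[t]: True if the on-screen person talks.
    For each speaking frame t, audio[t] is the positive and every other frame's audio is a negative."""
    T = len(video)
    losses = []
    for t in range(T):
        if not speaking[t]:
            continue  # the loss is only applied where the person is actually speaking
        logits = [cosine(video[t], audio[k]) / tau for k in range(T)]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(l - m) for l in logits))
        losses.append(log_sum_exp - logits[t])  # equals -log softmax(logits)[t]
    return sum(losses) / len(losses) if losses else 0.0
```

Restricting the sum to speaking frames avoids forcing an audio-visual correspondence on silent or off-screen segments, where the audio track carries no information about the visible face.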

2024_ICASSP_TalkNCE.pptx

Categories: Multimedia Signal Processing

A SELF-SUPERVISED LEARNING APPROACH FOR DETECTING NON-PSYCHOTIC RELAPSES USING WEARABLE-BASED DIGITAL PHENOTYPING

We present MagCIL's approach for the 1st track of the "2nd e-Prevention challenge: Psychotic and Non-Psychotic Relapse Detection using Wearable-Based Digital Phenotyping". First, we present our approach for preprocessing the wearable's raw data and extracting features from it. We then propose a Transformer model that learns self-supervised representations from augmented features and is trained on data from the non-relapse days of each of the 9 patients in the challenge. We adopt two unsupervised methods to detect relapse days as outliers.
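The abstract does not name the two unsupervised outlier detectors, so the following is a swapped-in illustration rather than MagCIL's method: a per-patient profile (feature-wise mean and standard deviation) is fitted on embeddings of non-relapse days, each new day is scored by its standardized Euclidean distance (a Mahalanobis distance with diagonal covariance), and days scoring above a high quantile of the training scores are flagged. The function names and the 0.95 quantile are hypothetical.

```python
import math


def fit_normal_profile(train_days):
    """train_days: embedding vectors from the non-relapse days of one patient."""
    n, d = len(train_days), len(train_days[0])
    mean = [sum(day[j] for day in train_days) / n for j in range(d)]
    # `or 1.0` guards against a feature with zero variance.
    std = [math.sqrt(sum((day[j] - mean[j]) ** 2 for day in train_days) / n) or 1.0
           for j in range(d)]
    return mean, std


def score(day, mean, std):
    """Standardized Euclidean distance (Mahalanobis with a diagonal covariance)."""
    return math.sqrt(sum(((x - m) / s) ** 2 for x, m, s in zip(day, mean, std)))


def detect_relapses(train_days, test_days, quantile=0.95):
    """Flag test days whose score exceeds the given quantile of the training scores."""
    mean, std = fit_normal_profile(train_days)
    train_scores = sorted(score(d, mean, std) for d in train_days)
    idx = min(int(quantile * len(train_scores)), len(train_scores) - 1)
    threshold = train_scores[idx]
    return [score(d, mean, std) > threshold for d in test_days]
```

Fitting one profile per patient mirrors the per-patient training on non-relapse days described above, so each day is judged against that patient's own normal behavior.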

GC-L2.5_DEMOKRITOS_SOFIA_ELEFTHERIOU_Presentation.pptx

Categories: Multimedia Signal Processing

MLSP-L13.4: Elevating Skeleton-Based Action Recognition with Efficient Multi-Modality Self-Supervision

Self-supervised representation learning for human action recognition has developed rapidly in recent years. Most existing works are based on skeleton data and use a multi-modality setup.

ICASSP_L13_4.pptx

Categories: Multimedia Signal Processing

Multi-Level Graph Learning For Audio Event Classification And Human-Perceived Annoyance Rating Prediction

The WHO's report on environmental noise estimates that 22 million people suffer from chronic annoyance related to noise caused by audio events (AEs) from various sources. Annoyance may lead to health issues and adverse effects on metabolic and cognitive systems. In cities, monitoring noise levels does not provide insights into noticeable AEs, let alone their relations to annoyance. To enable annoyance-related monitoring, this paper proposes a graph-based model that identifies AEs in a soundscape and explores the relations between diverse AEs and the human-perceived annoyance rating (AR).

4320_hou_poster.pdf

Categories: Content-Based Audio Processing, Multimedia Signal Processing

ECM-OPCC: Efficient Context Model for Octree-based Point Cloud Compression

Recently, deep learning methods have shown promising results in point cloud compression. However, previous octree-based approaches either lack sufficient context or have high decoding complexity (e.g., more than 900 s). To address this problem, we propose a sufficient yet efficient context model and design an efficient deep learning codec for point clouds. Specifically, we first propose a segment-constrained multi-group coding strategy to exploit the autoregressive context while maintaining decoding efficiency.
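Octree-based codecs typically represent a voxelized point cloud as a tree in which every occupied node is split into eight child octants and signaled with an 8-bit occupancy code; a context model such as the one described above then estimates the probability of each code for entropy coding. The sketch below computes the occupancy codes of one octree level from integer voxel coordinates. The function name and the child-index bit order (x as the most significant bit) are assumptions; real codecs fix their own conventions.

```python
def occupancy_codes(points, depth, level):
    """points: integer (x, y, z) voxel coordinates in [0, 2**depth).
    Returns {parent_node: 8-bit occupancy code} for the nodes at `level` (root = level 0)."""
    shift = depth - level - 1  # coordinate bit that selects the child octant at this level
    codes = {}
    for x, y, z in set(points):  # duplicate points do not change occupancy
        parent = (x >> (shift + 1), y >> (shift + 1), z >> (shift + 1))
        child = (((x >> shift) & 1) << 2) | (((y >> shift) & 1) << 1) | ((z >> shift) & 1)
        codes[parent] = codes.get(parent, 0) | (1 << child)  # set the bit of the occupied child
    return codes


# Two opposite corners of a 2x2x2 grid occupy children 0 and 7 of the root: 0b10000001.
print(occupancy_codes([(0, 0, 0), (1, 1, 1)], depth=1, level=0))
```

Coding these symbols level by level in a fixed node order is what gives an autoregressive context model access to already decoded parents and neighbors; the multi-group strategy in the abstract concerns how much of that context is used while keeping decoding efficient.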

poster ECM-OPCC.pdf

Categories: Multimedia Signal Processing

GENEFORMER: LEARNED GENE COMPRESSION USING TRANSFORMER-BASED CONTEXT MODELING

The development of gene sequencing technology has sparked explosive growth in gene data. Thus, the storage of gene data has become an important issue. Recently, researchers have begun to investigate deep learning-based gene data compression, which outperforms traditional general-purpose methods. In this paper, we propose a transformer-based gene compression method named GeneFormer. Specifically, we first introduce a modified transformer encoder with a latent array to eliminate the dependency of the nucleotide sequence.
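Learned compressors of this kind pair a probability model with an arithmetic coder, whose output length approaches the sum of -log2 p(symbol | context) over the sequence. As a swapped-in illustration (an adaptive order-k count model, not GeneFormer's transformer), the sketch below computes that ideal code length for a nucleotide string; the better the context model predicts the next base, the further the result falls below the 2-bit-per-base cost of a fixed ACGT code. The function name and the default `k=2` are hypothetical.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"


def ideal_code_length(seq, k=2):
    """Bits an ideal arithmetic coder would spend on `seq` with an adaptive order-k model.
    Counts start at 1 (Laplace smoothing) and are updated only after each symbol is coded,
    so a decoder could reproduce exactly the same probabilities."""
    counts = defaultdict(lambda: {s: 1 for s in ALPHABET})
    bits = 0.0
    for i, sym in enumerate(seq):
        ctx = seq[max(0, i - k):i]  # shorter context for the first k symbols
        table = counts[ctx]
        p = table[sym] / sum(table.values())
        bits += -math.log2(p)
        table[sym] += 1  # adapt the model after coding the symbol
    return bits
```

For example, a periodic sequence such as "ACGT" repeated 100 times costs far fewer than 800 bits, because each order-2 context quickly learns its deterministic successor; a learned model plays the same role with much richer context.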

gene_poster.pdf

Categories: Multimedia Signal Processing
