Image, Video, and Multidimensional Signal Processing

ICASSP_HMNet.pptx

ICASSP_HMNet.pptx (177)

Categories: Image, Video, and Multidimensional Signal Processing

38 Views

AUDIO-VISUAL ACTIVE SPEAKER EXTRACTION FOR SPARSELY OVERLAPPED MULTI-TALKER SPEECH


Target speaker extraction aims to extract the speech of a specific speaker from a multi-talker mixture, as specified by an auxiliary reference. Most studies focus on the scenario where the target speech is highly overlapped with the interfering speech. However, this scenario accounts for only a small percentage of real-world conversations. In this paper, we target sparsely overlapped scenarios, in which the auxiliary reference needs to perform two tasks simultaneously: detect the activity of the target speaker and disentangle the active speech from any interfering speech.

Junjie-Li_"AUDIO-VISUAL ACTIVE SPEAKER EXTRACTION FOR SPARSELY OVERLAPPED MULTI-TALKER SPEECH".pptx

Junjie-Li_"AUDIO-VISUAL ACTIVE SPEAKER EXTRACTION FOR SPARSELY OVERLAPPED MULTI-TALKER SPEECH".pptx (212)

Categories: Speech Enhancement (SPE-ENHA); Image, Video, and Multidimensional Signal Processing

26 Views

MULTILINGUAL AUDIO-VISUAL SPEECH RECOGNITION WITH HYBRID CTC/RNN-T FAST CONFORMER


Humans are adept at leveraging visual cues from lip movements to recognize speech in adverse listening conditions. Audio-Visual Speech Recognition (AVSR) models follow a similar approach to achieve robust speech recognition in noisy conditions. In this work, we present a multilingual AVSR model incorporating several enhancements to improve performance and audio noise robustness. Notably, we adapt the recently proposed Fast Conformer model to process both audio and visual modalities using a novel hybrid CTC/RNN-T architecture.

SLP-L25.3.pptx

SLP-L25.3.pptx (214)

Categories: Image, Video, and Multidimensional Signal Processing

56 Views

HMNet: Hierarchical Microscale-aware Network for Infrared Small Target Detection


Compared with detection in natural images, infrared target detection faces greater challenges because the targets are extremely small and low in contrast, especially when they are obscured by clutter and noise. Traditional solutions are susceptible to noise interference and yield suboptimal results that lack contour and texture details. Meanwhile, due to the spatial invariance of convolutional layers, most deep learning-based methods localize small targets loosely during feature extraction, leading to serious omissions.

ICASSP_HMNet.pptx

ICASSP_HMNet.pptx (178)

Categories: Image, Video, and Multidimensional Signal Processing

27 Views

Robust Lightweight Depth Estimation Model via Data-free Distillation


Existing Monocular Depth Estimation (MDE) methods often use large and complex neural networks. Although these methods achieve strong performance, we focus on efficiency and generalization for practical applications with limited resources. In our paper, we present an efficient transformer-based monocular relative depth estimation network and train it with a diverse depth dataset to obtain good generalization performance.

Robust_Lightweight_Depth_Estimation_Model_via_Data-Free_Distillation_poster.pdf

Robust_Lightweight_Depth_Estimation_Model_via_Data-Free_Distillation_poster_ICASSP2024 (241)

Categories: Image, Video, and Multidimensional Signal Processing

34 Views

MTIDNET: A MULTIMODAL TEMPORAL INTEREST DETECTION NETWORK FOR VIDEO SUMMARIZATION


Video summarization involves creating a succinct overview by merging the valuable parts of a video. Existing video summarization methods approach this task as a problem of selecting keyframes using frame- and shot-level techniques with unimodal or bimodal information. Besides underestimating the inter-relations between various configurations of modality embedding spaces, current methods are also limited in their ability to maintain the integrity of the semantics within the same summary segment. To address these issues,

icassp.pptx

icassp.pptx (174)

Categories: Image, Video, and Multidimensional Signal Processing

28 Views

ATTENTIONLUT: ATTENTION FUSION-BASED CANONICAL POLYADIC LUT FOR REAL-TIME IMAGE ENHANCEMENT

Recently, many algorithms have employed image-adaptive lookup tables (LUTs) to achieve real-time image enhancement. Nonetheless, most existing methods form image-adaptive LUTs as linear combinations of basic LUTs, which limits their generalization ability. To address this limitation, we propose a novel framework named AttentionLut for real-time image enhancement, which utilizes the attention mechanism to generate image-adaptive LUTs. Our proposed framework consists of three lightweight modules.
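
As a rough illustration of the baseline that the abstract contrasts against (an image-adaptive LUT formed as a linear combination of basic LUTs), the following NumPy sketch may help. It is not the AttentionLut framework. The function names, the LUT size, the fixed weights, and the nearest-neighbour lookup are all assumptions made for brevity; in practice the weights would typically be predicted per image and the lookup would usually interpolate.

```python
import numpy as np

def identity_lut(size):
    """Hypothetical helper: 3D LUT of shape (size, size, size, 3) mapping RGB to itself."""
    axis = np.linspace(0.0, 1.0, size)
    r, g, b = np.meshgrid(axis, axis, axis, indexing="ij")
    return np.stack([r, g, b], axis=-1)

def combine_luts(basis_luts, weights):
    """Linear combination of basis LUTs: sum over k of weights[k] * basis_luts[k]."""
    basis = np.stack(basis_luts, axis=0)          # (K, S, S, S, 3)
    w = np.asarray(weights, dtype=basis.dtype)    # (K,)
    return np.tensordot(w, basis, axes=1)         # (S, S, S, 3)

def apply_lut(image, lut):
    """Look up each pixel at the nearest LUT grid point (no interpolation)."""
    size = lut.shape[0]
    idx = np.rint(np.clip(image, 0.0, 1.0) * (size - 1)).astype(int)
    return lut[idx[..., 0], idx[..., 1], idx[..., 2]]

# Two toy basis LUTs: identity and colour inversion.
basis = [identity_lut(5), 1.0 - identity_lut(5)]
# Assumed fixed weights; the combined map is 0.75*x + 0.25*(1 - x) = 0.25 + 0.5*x.
lut = combine_luts(basis, [0.75, 0.25])

image = np.array([[[0.0, 0.5, 1.0]]])  # one RGB pixel with values in [0, 1]
enhanced = apply_lut(image, lut)
```

Because every output LUT lies in the span of the fixed basis, the set of reachable colour transforms is limited, which is the restriction the abstract points to.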

icassp_poster.pdf

Poster (279)

Categories: Image, Video, and Multidimensional Signal Processing

20 Views

SELF-SUPERVISED MULTI-SCALE HIERARCHICAL REFINEMENT METHOD FOR JOINT LEARNING OF OPTICAL FLOW AND DEPTH

Recurrently refining the optical flow based on a single high-resolution feature demonstrates high performance. We exploit the strength of this strategy to build a novel architecture for the joint learning of optical flow and depth. Our proposed architecture is improved to work in the case of training on unlabeled data, which is extremely challenging. The loss is computed for the iterations carried out over a single high-resolution feature, where the reconstruction loss fails to optimize the accuracy, particularly in occluded regions.

SELF-SUPERVISED MULTI-SCALE HIERARCHICAL REFINEMENT METHOD FOR JOINT LEARNING OF OPTICAL FLOW AND DEPTH.pptx

SELF-SUPERVISED MULTI-SCALE HIERARCHICAL REFINEMENT METHOD FOR JOINT LEARNING OF OPTICAL FLOW AND DEPTH.pptx (177)

Self-Supervised_Multi-Scale_Hierarchical_Refinement_Method_for_Joint_Learning_of_Optical_Flow_and_Depth.pdf

Self-Supervised_Multi-Scale_Hierarchical_Refinement_Method_for_Joint_Learning_of_Optical_Flow_and_Depth.pdf (174)

Categories: Image, Video, and Multidimensional Signal Processing

50 Views

LIGHTING IMAGE/VIDEO STYLE TRANSFER METHODS BY ITERATIVE CHANNEL PRUNING


Deploying style transfer methods on resource-constrained devices is challenging, which limits their real-world applicability. To tackle this issue, we propose using pruning techniques to accelerate various visual style transfer methods. We argue that typical pruning methods may not be well-suited for style transfer methods and present an iterative correlation-based channel pruning (ICCP) strategy for encoder-transform-decoder-based image/video style transfer models.
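
To make the idea of correlation-based channel selection more concrete, here is a generic NumPy sketch. It is not the ICCP strategy from the paper, whose details are not given in this abstract. The function name, the redundancy score (strongest absolute correlation with another kept channel), and the greedy one-channel-at-a-time loop are assumptions chosen for illustration only.

```python
import numpy as np

def prune_channels_by_correlation(activations, keep):
    """Greedily drop the most redundant channel until `keep` channels remain.

    activations: array of shape (num_samples, num_channels), e.g. feature
                 responses collected from one layer (hypothetical input).
    Returns the indices of the kept channels in ascending order.
    """
    # Absolute Pearson correlation between every pair of channels.
    corr = np.abs(np.corrcoef(activations, rowvar=False))  # (C, C)
    np.fill_diagonal(corr, 0.0)  # ignore self-correlation

    kept = list(range(corr.shape[0]))
    while len(kept) > keep:
        sub = corr[np.ix_(kept, kept)]
        # A channel is redundant if another kept channel is highly correlated with it.
        most_redundant = int(np.argmax(sub.max(axis=1)))
        kept.pop(most_redundant)
    return kept

rng = np.random.default_rng(0)
x = rng.normal(size=200)
features = np.stack([x, 2.0 * x, rng.normal(size=200)], axis=1)  # channels 0 and 1 are duplicates
kept_channels = prune_channels_by_correlation(features, keep=2)
```

In this toy input one of the two duplicated channels is removed and the independent channel survives. A real pipeline would normally fine-tune the model after each pruning round.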

Presentation slides_IVMSP-L9_5_LIGHTING IMAGEVIDEO STYLE TRANSFER METHODS BY ITERATIVE CHANNEL.pptx

Presentation slides_IVMSP-L9_5_LIGHTING IMAGEVIDEO STYLE TRANSFER METHODS BY ITERATIVE CHANNEL.pptx (173)

Categories: Image, Video, and Multidimensional Signal Processing

23 Views

LEARNING SPATIO-TEMPORAL RELATIONS WITH MULTI-SCALE INTEGRATED PERCEPTION FOR VIDEO ANOMALY DETECTION

In weakly supervised video anomaly detection, it has been verified that anomalies can be biased by background noise. Previous works attempted to focus on local regions to exclude irrelevant information. However, the abnormal events in different scenes vary in size, and current methods struggle to consider local events of different scales concurrently. To this end, we propose a multi-scale integrated perception

ye_poster.pdf

ye_poster.pdf (212)

Categories: Image, Video, and Multidimensional Signal Processing

14 Views