Deep Learning | SigPort

APPENDIX: MULTIMAE MEETS EARTH OBSERVATION: PRE-TRAINING MULTI-MODAL MULTI-TASK MASKED AUTOENCODERS FOR EARTH OBSERVATION TASKS

appendix_icip_2025_1750.pdf

appendix_icip_2025_1750.pdf (112)

Categories: Applications in Data Fusion (MLR-FUSI)

49 Views

Accurate colon segmentation using 2D convolutional neural networks with 3D contextual information

This study introduces a framework for accurate colon segmentation in abdominal CT scans, addressing the unique challenges of this task. Our architecture enhances well-established 2D segmentation models by incorporating 3D contextual information through a novel method that generates an attention map for a given slice by considering its neighboring slices. By combining 2D CNNs, this approach achieves effective colon segmentation without complex 3D convolutional neural networks (CNNs) or long short-term memory (LSTM) networks.

ICIP_Poster_Samir_2024.pdf

Poster for paper (162)

Categories: Medical imaging, Machine Learning for Signal Processing

16 Views

BLEND-RES^2NET: Blended Representation Space by Transformation of Residual Mapping with Restrained Learning For Time Series Classification

Insufficient training instances are a typical problem in time series classification, and they call for novel deep neural architectures that deliver consistent and accurate performance. A deep residual network (ResNet) learns through H(x) = F(x) + x, where F(x) is a nonlinear function. We propose Blend-Res2Net, which blends two different representation spaces, H^1(x) = F(x) + Trans(x) and H^2(x) = F(Trans(x)) + x, with the intention of learning a richer representation that captures both the temporal and the spectral signatures (Trans(·) denotes the transformation function).
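The three mappings in this abstract can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: `toy_f` is a fixed tanh standing in for the learned F, and `dft_magnitude` is a hypothetical choice of Trans (the abstract does not say which transformation is used). How the two spaces are then blended is not described in the abstract, so it is not shown.

```python
import cmath
import math


def dft_magnitude(x):
    """Hypothetical Trans(.): magnitude of the discrete Fourier transform.

    The output has the same length as the input, so it can be added to x.
    """
    n = len(x)
    return [
        abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n)
    ]


def toy_f(x):
    """Stand-in for the learned nonlinear mapping F(.): an element-wise tanh."""
    return [math.tanh(v) for v in x]


def plain_residual(x, f=toy_f):
    # Standard ResNet mapping: H(x) = F(x) + x
    return [a + b for a, b in zip(f(x), x)]


def blend_h1(x, f=toy_f, trans=dft_magnitude):
    # First representation space: H^1(x) = F(x) + Trans(x)
    return [a + b for a, b in zip(f(x), trans(x))]


def blend_h2(x, f=toy_f, trans=dft_magnitude):
    # Second representation space: H^2(x) = F(Trans(x)) + x
    return [a + b for a, b in zip(f(trans(x)), x)]


series = [0.0, 1.0, 0.0, -1.0]  # one period of a sampled sine wave
h1 = blend_h1(series)
h2 = blend_h2(series)
```

In H^1 the skip connection carries the transformed signal, while in H^2 the nonlinear branch sees the transformed signal and the skip connection carries the raw series.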

ICASSP 2021_presentation_Arijit_Ukil.pdf

https://ieeexplore.ieee.org/document/9414647 (253)

Categories: Neural network learning (MLR-NNLR)

18 Views

Deep Residual Echo Suppression with a Tunable Tradeoff Between Signal Distortion and Echo Suppression

In this paper, we propose a residual echo suppression method using a UNet neural network that directly maps the outputs of a linear acoustic echo canceler to the desired signal in the spectral domain. This system embeds a design parameter that allows a tunable tradeoff between the desired-signal distortion and residual echo suppression in double-talk scenarios. The system has 136 thousand parameters and requires 1.6 giga floating-point operations per second (GFLOPS) and 10 megabytes of memory.

Amir Ivry - Poster.pdf

Amir Ivry - Poster.pdf (313)

Amir Ivry - Slides.pdf

Amir Ivry - Slides.pdf (313)

Categories: Echo Cancellation

19 Views

Multi-Patch Aggregation Models for Resampling Detection

Images captured today vary in dimensions, since smartphones and DSLRs let users choose from a list of available image resolutions. It is therefore imperative for forensic algorithms such as resampling detection to scale well to images of varying dimensions. However, in our experiments we observed that many state-of-the-art forensic algorithms are sensitive to image size, and their performance quickly degrades when they are applied to images of diverse dimensions, despite re-training them using multiple image sizes.

Multi-Patch Aggregation Models For Resampling Detection.pdf

Presentation Slides for the project (383)

Categories: Multimedia Forensics

34 Views

NEURAL ADAPTIVE IMAGE DENOISER

We propose a novel neural network-based adaptive image denoiser, dubbed Neural AIDE. Unlike other neural network-based denoisers, which typically apply supervised training to learn a mapping from a noisy patch to a clean patch, we train a neural network to learn context-based affine mappings that are applied to each noisy pixel. Our formulation enables using SURE (Stein’s Unbiased Risk Estimator)-like estimated losses of those mappings as empirical risks to minimize.
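To make the idea of a SURE-like estimated loss concrete, here is a minimal sketch for a single pixel. It assumes additive zero-mean noise with known variance sigma^2 and an affine mapping a*z + b whose coefficients depend only on the surrounding context, not on the noisy pixel z itself. Under those assumptions, (z - (a*z + b))^2 + 2*a*sigma^2 - sigma^2 is an unbiased estimate of the true squared error, so it can be evaluated without the clean pixel. This is a textbook identity for affine estimators; the function names are illustrative, and the exact loss used in the paper may differ.

```python
def affine_denoise(z, a, b):
    """Apply an affine mapping to a noisy pixel z."""
    return a * z + b


def true_squared_error(x, z, a, b):
    """Squared error against the clean pixel x (unavailable in practice)."""
    return (x - affine_denoise(z, a, b)) ** 2


def estimated_loss(z, a, b, sigma):
    """Unbiased estimate of the squared error that needs only z and sigma.

    Valid when a and b do not depend on z (they come from the context only).
    """
    return (z - affine_denoise(z, a, b)) ** 2 + 2.0 * a * sigma ** 2 - sigma ** 2


# Exact check with symmetric two-point noise: values -sigma and +sigma,
# each with probability 1/2, have zero mean and variance sigma^2.
x, a, b, sigma = 0.7, 0.6, 0.1, 0.2
noises = [-sigma, sigma]
mean_true = sum(true_squared_error(x, x + n, a, b) for n in noises) / len(noises)
mean_est = sum(estimated_loss(x + n, a, b, sigma) for n in noises) / len(noises)
```

Averaged over the noise, `mean_true` and `mean_est` agree, which is why such an estimate can serve as an empirical risk when no clean images are available.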

ICASSP_2018_Poster_final.pdf

NAIDE_Poster_ICASSP2018 (1072)

Categories: Audio and Acoustic Signal Processing

148 Views

Fast Vehicle Detection with Lateral Convolutional Neural Network

Lateral-CNN Slides.pptx

Lateral-CNN (407)

Lateral-CNN Slides.pptx

Lateral-CNN Slides.pptx (374)

Categories: Image/Video Processing

29 Views

COMPARISON OF OBJECTIVE FUNCTIONS IN CNN-BASED PROSTATE MAGNETIC RESONANCE IMAGE SEGMENTATION

We investigate the impact of objective functions on the performance of deep-learning-based prostate magnetic resonance image segmentation. To this end, we first develop a baseline convolutional neural network (BCNN) for prostate image segmentation, which consists of encoding, bridge, decoding, and classification modules. In the BCNN, we use 3D convolutional layers to take volumetric information into account. We also adopt residual feature forwarding and intermediate feature propagation techniques to make the BCNN reliably trainable for various objective functions.

ICIP_JHMUN_POSTER.pdf

ICIP_JHMUN_POSTER.pdf (573)

Categories: Medical imaging

13 Views

VID2SPEECH: SPEECH RECONSTRUCTION FROM SILENT VIDEO

Speechreading is a notoriously difficult task for humans to perform. In this paper we present an end-to-end model based on a convolutional neural network (CNN) for generating an intelligible acoustic speech signal from silent video frames of a speaking person. The proposed CNN generates sound features for each frame based on its neighboring frames. Waveforms are then synthesized from the learned speech features to produce intelligible speech.