
- Read more about Deep Hashing With Hash Center Update for Efficient Image Retrieval
In this paper, we propose an approach for learning binary hash codes
for image retrieval. Canonical Correlation Analysis (CCA) is used
to design two loss functions for training a neural network such that
the correlation between the two CCA views is maximized. The
main motivation for using CCA for feature space learning is that
it permits dimensionality reduction, so that short binary codes can
be generated. The first loss maximizes the correlation between the
hash centers and the learned hash codes. The second loss maximizes

- Read more about TRAINING STRATEGIES FOR AUTOMATIC SONG WRITING: A UNIFIED FRAMEWORK PERSPECTIVE
Automatic song writing (ASW) typically involves four tasks: lyric-to-lyric generation, melody-to-melody generation, lyric-to-melody generation, and melody-to-lyric generation.
Previous works have mainly focused on individual tasks without considering the correlation between them, and thus a unified framework to solve all four tasks has not yet been explored.

- Read more about TIME-DOMAIN AUDIO-VISUAL SPEECH SEPARATION ON LOW QUALITY VIDEOS
Incorporating visual information is a promising approach to improving the performance of speech separation. Many related works have been conducted and have produced encouraging results. However, low-quality videos are common in real scenarios and may significantly degrade the performance of standard audio-visual speech separation systems. In this paper, we propose a new structure to fuse the audio and visual features, which uses the audio feature to select relevant visual features by means of an attention mechanism.
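
The fusion idea described above, in which audio features select relevant visual features through attention, can be illustrated with a minimal sketch. This is not the authors' architecture: it assumes plain scaled dot-product attention with no learned projections, assumes audio and visual frames share the same feature dimension, and the function name `audio_guided_fusion` is hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def audio_guided_fusion(audio_frames, visual_frames):
    """Hypothetical helper: each audio frame acts as a query over all
    visual frames (keys and values), and the attended visual vector is
    concatenated to the audio frame.

    Assumption: both modalities use the same feature dimension, so no
    projection matrices are needed in this sketch.
    """
    dim = len(visual_frames[0])
    scale = math.sqrt(dim)
    fused = []
    for a in audio_frames:
        # Attention weights: how relevant each visual frame is to this audio frame.
        weights = softmax([dot(a, v) / scale for v in visual_frames])
        # Weighted sum of visual frames, computed one dimension at a time.
        selected = [
            sum(w * v[k] for w, v in zip(weights, visual_frames))
            for k in range(dim)
        ]
        fused.append(list(a) + selected)
    return fused


if __name__ == "__main__":
    audio = [[1.0, 0.0]]
    visual = [[10.0, 0.0], [0.0, 10.0]]
    print(audio_guided_fusion(audio, visual))
```

With this weighting, a visual frame that is poorly aligned with the audio query (for example, a corrupted or occluded frame) receives a small weight and contributes little to the fused feature.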

- Read more about GLOBAL-LOCAL FEATURE ENHANCEMENT NETWORK FOR ROBUST OBJECT DETECTION USING MMWAVE RADAR AND CAMERA

- Read more about COOPNET: MULTI-MODAL COOPERATIVE GENDER PREDICTION IN SOCIAL MEDIA USER PROFILING

- Read more about TIME-LAG AWARE MULTI-MODAL VARIATIONAL AUTOENCODER USING BASEBALL VIDEOS AND TWEETS FOR PREDICTION OF IMPORTANT SCENES
This paper presents a novel method based on a time-lag aware multi-modal variational autoencoder for prediction of important scenes (Tl-MVAE-PIS) using baseball videos and tweets posted on Twitter. The paper makes the following two technical contributions. First, to effectively use heterogeneous data for the prediction of important scenes, we transform textual, visual and audio features obtained from tweets and videos into latent features. Tl-MVAE-PIS can then flexibly express the relationships between them in the constructed latent space.

In this work, we present a hybrid CTC/Attention model based on a modified ResNet-18 and a Convolution-augmented Transformer (Conformer) that can be trained in an end-to-end manner. In particular, the audio and visual encoders learn to extract features directly from audio waveforms and raw pixels, respectively; these features are fed to Conformers, and fusion then takes place via a Multi-Layer Perceptron (MLP). The model learns to recognise characters using a combination of CTC and an attention mechanism.
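
For orientation, hybrid CTC/attention models are commonly trained with a weighted sum of the two losses. The abstract does not state the exact objective, so the form below and the weight $\lambda$ are assumptions rather than the authors' formulation:

```latex
\mathcal{L} = \lambda \, \mathcal{L}_{\mathrm{CTC}} + (1 - \lambda) \, \mathcal{L}_{\mathrm{att}},
\qquad 0 \le \lambda \le 1
```

Here $\mathcal{L}_{\mathrm{CTC}}$ is the CTC loss on the encoder outputs and $\mathcal{L}_{\mathrm{att}}$ is the cross-entropy loss of the attention decoder; $\lambda$ controls the balance between the two.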

- Read more about An Adaptive Multi-Scale and Multi-Level Features Fusion Network with Perceptual Loss for Change Detection
Change detection plays a vital role in monitoring and analyzing temporal changes in Earth observation tasks. This paper proposes a novel adaptive multi-scale and multi-level features fusion network for change detection in very-high-resolution bi-temporal remote sensing images. The proposed approach has three advantages. Firstly, it excels in abstracting high-level representations empowered by a highly effective feature extraction module.