Inspired by multi-task learning, the enhancement of degraded low-quality color-depth images is formulated as a joint color-depth optimization model using maximum a posteriori estimation. This model is optimized alternately in an iterative manner to obtain solutions to the CGD-SR task and the Low-Brightness Color Image Enhancement (LBC-IE) task. The whole iterative optimization procedure is then unrolled into a joint model-driven unfolding network.
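The alternating scheme described above can be illustrated with a minimal sketch. It is a toy, not the paper's model: two scalars `x` and `y` stand in for the color and depth estimates, the quadratic objective and the `coupling` weight are assumptions made for illustration, and every name is hypothetical. Each loop pass plays the role of one stage; an unfolding network would typically replace these fixed closed-form updates with learned modules and run a fixed number of stages.

```python
def objective(x, y, a, b, coupling):
    """Toy joint objective: two data terms plus a term tying x and y together."""
    return (x - a) ** 2 + (y - b) ** 2 + coupling * (x - y) ** 2


def alternating_minimize(a, b, coupling, num_stages=20):
    """Minimize the toy objective by updating x and y in turn."""
    x, y = a, b  # initialize each variable from its own observation
    for _ in range(num_stages):
        # Update x with y held fixed (closed-form minimizer in x).
        x = (a + coupling * y) / (1.0 + coupling)
        # Update y with the freshly updated x held fixed.
        y = (b + coupling * x) / (1.0 + coupling)
    return x, y


x, y = alternating_minimize(a=0.0, b=3.0, coupling=1.0, num_stages=50)
print(x, y, objective(x, y, 0.0, 3.0, 1.0))
```

For this toy objective each update is exact, so the objective value never increases from one stage to the next.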

This paper presents a new Gaussian process dynamic model (GPDM), named class-aware shared GPDM (CSGPDM). One of the main differences between CSGPDM and existing GPDMs is that it considers class information, which helps build a class-label-based latent space that is effective for subsequent class-related tasks. In terms of representation learning, CSGPDM is optimized by considering not only non-linear relationships but also time-series relations and the discriminative information of each class label.

Lack of audio-video synchronization is a common problem during television broadcasts and video conferencing, leading to an unsatisfactory viewing experience. A widely accepted paradigm is to create an error detection mechanism that identifies the cases when audio is leading or lagging. We propose ModEFormer, which independently extracts audio and video embeddings using modality-specific transformers.

Cross-modal compression (CMC) was recently proposed to compress highly redundant visual data into a compact, common, human-comprehensible domain (such as text) so as to preserve semantic fidelity for semantic-related applications. However, CMC only achieves a certain level of semantic fidelity at a constant rate, and the model optimizes the probability of the ground-truth text rather than semantic fidelity directly. To tackle these problems, we propose a novel scheme named rate-distortion optimized CMC (RDO-CMC).

Rate-distortion (RD) theory is fundamental to lossy image compression: it addresses compressing the original images to a specified bitrate with minimal signal distortion, an essential metric in practical applications. Moreover, with the development of visual analysis applications (such as classification, detection, and segmentation), the semantic distortion in compressed images is also an important dimension in the theoretical analysis of lossy image compression.
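As a rough illustration of the trade-off, a common way to operationalize RD optimization is to minimize a Lagrangian cost J = D + λR over candidate encodings. The sketch below assumes that formulation and adds an optional weighted semantic-distortion term; the weighting scheme, the function names, and the candidate values are all made up for illustration and do not come from the paper.

```python
def lagrangian_cost(rate, signal_distortion, semantic_distortion,
                    lam, semantic_weight=0.0):
    """J = D_signal + semantic_weight * D_semantic + lam * R (assumed form)."""
    return signal_distortion + semantic_weight * semantic_distortion + lam * rate


def select_operating_point(candidates, lam, semantic_weight=0.0):
    """Return the candidate encoding with the lowest Lagrangian cost."""
    return min(
        candidates,
        key=lambda c: lagrangian_cost(
            c["rate"], c["signal"], c["semantic"], lam, semantic_weight
        ),
    )


# Hypothetical candidates: rate in bits per pixel, distortions in arbitrary units.
candidates = [
    {"name": "low", "rate": 0.25, "signal": 40.0, "semantic": 0.30},
    {"name": "mid", "rate": 0.50, "signal": 20.0, "semantic": 0.10},
    {"name": "high", "rate": 1.00, "signal": 10.0, "semantic": 0.05},
]

print(select_operating_point(candidates, lam=10.0)["name"])
print(select_operating_point(candidates, lam=100.0)["name"])
print(select_operating_point(candidates, lam=100.0, semantic_weight=100.0)["name"])
```

A larger λ penalizes rate more heavily and pushes the choice toward lower-bitrate candidates, while a nonzero semantic weight can shift the choice back toward encodings that better preserve semantics.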

In this paper, we propose an approach for learning binary hash codes for image retrieval. Canonical Correlation Analysis (CCA) is used to design two loss functions for training a neural network such that the correlation between the two views given to CCA is maximized. The main motivation for using CCA for feature space learning is that it allows dimensionality reduction, so that short binary codes can be generated. The first loss maximizes the correlation between the hash centers and the learned hash codes. The second loss maximizes

Automatic song writing (ASW) typically involves four tasks: lyric-to-lyric generation, melody-to-melody generation, lyric-to-melody generation, and melody-to-lyric generation.
Previous works have mainly focused on individual tasks without considering the correlation between them, and thus a unified framework to solve all four tasks has not yet been explored.

Incorporating visual information is a promising approach to improving the performance of speech separation, and many related works have reported encouraging results. However, low-quality videos are common in real scenarios and may significantly degrade the performance of conventional audio-visual speech separation systems. In this paper, we propose a new structure for fusing audio and visual features, which uses the audio feature to select relevant visual features through an attention mechanism.
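The idea of letting the audio feature select visual features can be sketched with plain scaled dot-product attention, where the audio vector acts as the query and the visual vectors act as both keys and values. This is an assumed, generic form of attention, not necessarily the structure proposed in the paper, and the function and variable names are hypothetical.

```python
import math


def attention_fuse(audio_query, visual_features):
    """Weight visual feature vectors by their similarity to the audio query."""
    scale = math.sqrt(len(audio_query))
    # Scaled dot-product score between the audio query and each visual vector.
    scores = [
        sum(q * v for q, v in zip(audio_query, feat)) / scale
        for feat in visual_features
    ]
    # Numerically stable softmax over the scores.
    max_score = max(scores)
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted sum of the visual vectors gives the fused visual feature.
    dim = len(visual_features[0])
    fused = [
        sum(w * feat[i] for w, feat in zip(weights, visual_features))
        for i in range(dim)
    ]
    return fused, weights


fused, weights = attention_fuse([1.0, 0.0], [[5.0, 0.0], [0.0, 5.0]])
print(weights, fused)
```

Visual vectors that align poorly with the audio query receive small weights, which is one way a fusion step could down-weight unreliable frames from a low-quality video.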
