
Rate-Distortion Optimization for Cross Modal Compression

Citation Author(s):
Submitted by:
Junlong Gao
Last updated:
16 March 2023 - 10:08am
Document Type:
Presentation Slides
Document Year:
2023
Event:
Presenters:
Junlong Gao
 

Cross modal compression (CMC) was recently proposed to compress highly redundant visual data into a compact, common, human-comprehensible domain (such as text), preserving semantic fidelity for semantic-related applications. However, CMC achieves only a certain level of semantic fidelity at a constant rate, and its model is trained to maximize the probability of the ground-truth text rather than to optimize semantic fidelity directly. To address these problems, we propose a novel scheme named rate-distortion optimized CMC (RDO-CMC). Specifically, we model the text generation process as a Markov decision process and propose a rate-distortion reward, which is used in reinforcement learning to optimize text generation. In the rate-distortion reward, the distortion measures both the semantic fidelity and the naturalness of the encoded text. The rate of the text is estimated as the sum of the information content of all its tokens, since the information content of each token is a lower bound on the number of bits needed to code it. Experimentally, RDO-CMC effectively controls the rate in the CMC framework and achieves competitive performance on the MSCOCO dataset.
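
The rate estimate described in the abstract can be illustrated with a minimal sketch. It assumes each token comes with a model probability p and sums the self-information -log2(p) over the tokens. The function names, the Lagrangian form of the reward, and the weight `lam` are hypothetical choices made for illustration; the abstract does not specify how rate and distortion are combined.

```python
import math


def estimate_rate_bits(token_probs):
    """Estimate the rate of a text as the sum of -log2(p) over its tokens.

    token_probs: per-token probabilities in (0, 1], assumed to come from
    some language model (hypothetical input, not specified in the abstract).
    """
    for p in token_probs:
        if not 0.0 < p <= 1.0:
            raise ValueError(f"probability out of range: {p}")
    return sum(-math.log2(p) for p in token_probs)


def rd_reward(distortion, rate_bits, lam):
    """Hypothetical rate-distortion reward: the negative Lagrangian cost.

    A larger `lam` penalizes long or unlikely texts more strongly.
    This combination rule is an assumption, not the authors' definition.
    """
    return -(distortion + lam * rate_bits)


# Three tokens with probabilities 1/2, 1/4, 1/8 cost 1 + 2 + 3 bits.
rate = estimate_rate_bits([0.5, 0.25, 0.125])
reward = rd_reward(distortion=0.2, rate_bits=rate, lam=0.01)
print(rate, reward)
```

In a reinforcement learning setup such as the one the abstract describes, a scalar of this kind would be computed for each generated text and used as the reward signal; varying `lam` is one plausible way to trade rate against distortion.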
