In image compression, obtaining semantic information through large-scale computation is not practical. To solve this problem, we propose an Attention Aggregation Mechanism (AAM) for learning-based image compression, which aggregates attention maps from multiple scales and facilitates information embedding.


High Efficiency Video Coding (HEVC) is the product of a large collaborative effort between industry and the academic community and reflects the new international standard for digital video coding technology. Compression capability is the main goal of digital video compression technology. HEVC achieves this goal at the expense of dramatically increased coding complexity. One source of this increased complexity is the use of a recursive quad-tree to partition every frame into blocks of various sizes, a process called prediction mode.
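The recursive quad-tree idea can be illustrated with a minimal Python sketch. It is not the HEVC partitioning procedure: a real encoder decides splits by rate-distortion cost, whereas this sketch assumes a simple pixel-variance test. The function name `quadtree_partition` and the parameters `min_size` and `threshold` are hypothetical and chosen only for illustration.

```python
from statistics import pvariance


def quadtree_partition(frame, x, y, size, min_size=8, threshold=100.0):
    """Return a list of (x, y, size) leaf blocks covering a square region.

    frame is a list of rows of pixel intensities. The variance-based split
    rule is an assumption made for illustration, not the HEVC criterion.
    """
    pixels = [frame[r][c] for r in range(y, y + size) for c in range(x, x + size)]
    # Stop splitting when the block is at the minimum size or nearly uniform.
    if size <= min_size or pvariance(pixels) <= threshold:
        return [(x, y, size)]
    half = size // 2
    leaves = []
    # Recurse into the four quadrants in raster order.
    for dy in (0, half):
        for dx in (0, half):
            leaves += quadtree_partition(frame, x + dx, y + dy, half, min_size, threshold)
    return leaves
```

Calling `quadtree_partition(frame, 0, 0, 16, min_size=4)` on a 16x16 frame splits only the regions that contain detail, so flat areas stay as large blocks while textured areas are divided further.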


If we quantize a block of n samples and then transmit information about quantization step size in the same bitstream, we may naturally expect such a code to be at least O(1/n) redundant. However, as we will show in this paper, this may not necessarily be true. Moreover, we prove that asymptotically, such codes can be as efficient as block codes without embedded step-size information. The proof relies on results from the theory of Diophantine approximations. We discuss the significance of this finding for practical applications, such as the design of audio and video coding algorithms.


Few-shot object detection (FSOD) enables a detector to recognize novel objects using only limited training samples, which could greatly alleviate the model’s dependency on data. Most existing methods include two training stages, namely base training and fine-tuning. However, previous works left the unlabeled novel instances in the base set untouched, although these instances can be re-used to enhance FSOD performance. Thus, a new instance mining model is proposed in this paper to excavate the novel samples from the base set. The detector is then fine-tuned again with these additional free novel instances.


To capture motion homogeneity between successive frames, the edge position difference (EPD) measure-based motion modeling (EPD-MM) has shown good motion compensation capabilities. The EPD-MM technique is underpinned by the fact that, from one frame to the next, edges map to edges, and such a mapping can be captured by an appropriate motion model. An example of such a motion model is the discrete cosine basis oriented (DCO) motion model, which can capture complex motion and has a smooth and sparse representation.


Modern codecs offer numerous settings that can nonuniformly alter the encoding process. Some researchers have proposed multiobjective optimization of video encoding, but none of these proposals addresses optimization of the encoder's entire option space when it is large. In this paper, we present a method for multiobjective encoding optimization of a given encoder in terms of relative video bitrate and encoding speed. The process takes place over one or more videos against a set of reference presets. It actively exploits similarities in the encoding process for similar videos.
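To make the two-objective trade-off concrete, the following Python sketch selects the Pareto-optimal presets from a set of measurements. It is not the method of the paper, which this abstract does not describe in detail. It only illustrates Pareto dominance, under the assumption that each preset is scored by a relative bitrate and a relative encoding time and that lower is better for both. The function name `pareto_front` and the preset names are hypothetical.

```python
def pareto_front(presets):
    """Return the sorted names of presets that no other preset dominates.

    presets maps a name to (relative_bitrate, relative_encode_time).
    Both objectives are assumed to be minimized.
    """
    front = []
    for name, (rate, time) in presets.items():
        # A preset is dominated if another one is no worse on both
        # objectives and strictly better on at least one of them.
        dominated = any(
            (r <= rate and t <= time) and (r < rate or t < time)
            for other, (r, t) in presets.items()
            if other != name
        )
        if not dominated:
            front.append(name)
    return sorted(front)


# Hypothetical measurements relative to a reference preset at (1.0, 1.0).
example = {
    "reference": (1.0, 1.0),
    "slow": (0.9, 1.5),
    "poor": (1.1, 1.2),
    "tuned": (0.95, 0.8),
}
```

Here `pareto_front(example)` keeps "slow" and "tuned": the "tuned" preset beats the reference on both objectives, while "slow" trades encoding time for a lower bitrate.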


This paper presents an adaptive bilateral matching technique for decoder-side motion vector refinement in video coding. It allows the encoder to choose not only the conventional bilateral matching mode with symmetric motion vector difference but also asymmetric alternatives. To study its efficiency, the proposed method is integrated into the Versatile Video Coding Test Model 11.0. The experimental results show an overall luma Bjøntegaard Delta rate of -2.78% for the random-access configurations.

