
Few-shot object detection (FSOD) enables a detector to recognize novel objects using only a few training samples, which can greatly alleviate the model’s dependency on data. Most existing methods include two training stages, namely base training and fine-tuning. However, previous works have left the unlabeled novel instances in the base set unused, even though these instances can be reused to enhance FSOD performance. This paper therefore proposes a new instance mining model that mines novel samples from the base set. The detector is then fine-tuned again with these additional, freely available novel instances.
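The abstract does not state the mining criterion. As a hedged illustration only, the sketch below assumes a simple pseudo-labeling rule: run a detector that already knows the novel classes over base-set images, keep novel-class detections whose confidence exceeds a threshold, and discard any that overlap an existing base-class annotation. The function name `mine_novel_instances`, the class names, and both thresholds are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Box:
    x1: float
    y1: float
    x2: float
    y2: float

@dataclass
class Detection:
    box: Box
    label: str
    score: float

def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    iw = max(0.0, min(a.x2, b.x2) - max(a.x1, b.x1))
    ih = max(0.0, min(a.y2, b.y2) - max(a.y1, b.y1))
    inter = iw * ih
    union = (a.x2 - a.x1) * (a.y2 - a.y1) + (b.x2 - b.x1) * (b.y2 - b.y1) - inter
    return inter / union if union > 0 else 0.0

def mine_novel_instances(detections, base_boxes, novel_classes, score_thr=0.8, iou_thr=0.5):
    """Hypothetical rule: keep confident novel-class detections that do not overlap a labeled base box."""
    mined = []
    for det in detections:
        if det.label not in novel_classes or det.score < score_thr:
            continue
        if any(iou(det.box, gt) >= iou_thr for gt in base_boxes):
            continue  # most likely an annotated base object, not an unlabeled novel one
        mined.append(det)
    return mined

dets = [
    Detection(Box(0, 0, 10, 10), "cat", 0.95),    # confident, no overlap: kept
    Detection(Box(50, 50, 60, 60), "cat", 0.40),  # below score threshold: dropped
    Detection(Box(20, 20, 30, 30), "bus", 0.90),  # overlaps a base annotation: dropped
]
base_annotations = [Box(20, 20, 30, 30)]
mined = mine_novel_instances(dets, base_annotations, {"cat", "bus"})
print([d.label for d in mined])  # ['cat']
```

The mined detections would then act as extra pseudo-labeled novel samples for the second fine-tuning round; a real system would likely need a more careful filter than a fixed score threshold.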


To capture motion homogeneity between successive frames, edge position difference (EPD) measure based motion modeling (EPD-MM) has shown good motion compensation capability. EPD-MM is underpinned by the observation that, from one frame to the next, edges map to edges, and that this mapping can be captured by an appropriate motion model. One such model is the discrete cosine basis oriented (DCO) motion model, which can capture complex motion and has a smooth and sparse representation.
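To make "smooth and sparse" concrete, here is a minimal sketch, assuming a separable DCT-II basis over a block and a motion field described by a handful of nonzero coefficients per component. The function names and the coefficient layout are hypothetical and are not taken from the DCO model's actual parameterization. A single DC coefficient yields a pure translation; adding a low-frequency term bends the field smoothly.

```python
import numpy as np

def dct_basis(k: int, n: int) -> np.ndarray:
    """1-D DCT-II basis function of frequency k sampled at n positions."""
    x = np.arange(n)
    return np.cos(np.pi * k * (2 * x + 1) / (2 * n))

def dco_motion_field(coeffs_u: dict, coeffs_v: dict, height: int, width: int):
    """Dense motion field (u, v) from sparse separable DCT coefficients.

    Each dict maps (k_row, k_col) -> amplitude; absent entries are zero,
    which is what keeps the representation sparse.
    """
    def component(coeffs: dict) -> np.ndarray:
        field = np.zeros((height, width))
        for (k_row, k_col), amp in coeffs.items():
            field += amp * np.outer(dct_basis(k_row, height), dct_basis(k_col, width))
        return field
    return component(coeffs_u), component(coeffs_v)

# Pure translation: only the DC term, so u = 2.0 at every pixel and v = 0.
u, v = dco_motion_field({(0, 0): 2.0}, {}, 4, 4)
# Smooth non-uniform motion: one extra horizontal frequency makes u
# decrease gradually from left to right across the block.
u2, _ = dco_motion_field({(0, 0): 2.0, (0, 1): 1.0}, {}, 4, 8)
print(np.round(u2[0], 2))
```

Because each component is a weighted sum of a few cosines, storing only the nonzero coefficients keeps the model compact while still describing motion that varies across the block.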


Modern codecs offer numerous settings that can alter the encoding process in nonuniform ways. Some researchers have proposed multiobjective optimization of video encoding, but none of these proposals addresses optimization over an encoder's entire option space when that space is large. In this paper, we present a method for multiobjective optimization of a given encoder in terms of relative video bitrate and encoding speed. The optimization is performed over one or more videos against a set of reference presets, and it actively exploits similarities in the encoding process across similar videos.
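The abstract does not detail the search over the option space. The sketch below shows only the final selection step common to two-objective problems: given configurations already measured relative to a reference preset (bitrate ratio, lower is better; speed ratio, higher is better), keep the Pareto-optimal ones. The `Result` type, the configuration names, and the numbers are hypothetical.

```python
from typing import List, NamedTuple

class Result(NamedTuple):
    name: str
    rel_bitrate: float  # bitrate / reference-preset bitrate (lower is better)
    rel_speed: float    # encoding speed / reference-preset speed (higher is better)

def dominates(a: Result, b: Result) -> bool:
    """True if a is no worse than b in both objectives and strictly better in one."""
    no_worse = a.rel_bitrate <= b.rel_bitrate and a.rel_speed >= b.rel_speed
    better = a.rel_bitrate < b.rel_bitrate or a.rel_speed > b.rel_speed
    return no_worse and better

def pareto_front(results: List[Result]) -> List[Result]:
    """Keep only configurations that no other configuration dominates."""
    return [r for r in results if not any(dominates(o, r) for o in results)]

results = [
    Result("cfg_a", 0.95, 0.80),
    Result("cfg_b", 1.00, 1.00),  # the reference preset itself
    Result("cfg_c", 0.97, 0.70),  # dominated by cfg_a
    Result("cfg_d", 1.05, 1.60),
]
print([r.name for r in pareto_front(results)])  # ['cfg_a', 'cfg_b', 'cfg_d']
```

A configuration never dominates itself (it is not strictly better in either objective), so the self-comparison inside `pareto_front` is harmless.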


This paper presents an adaptive bilateral matching technique for decoder-side motion vector refinement in video coding. It allows the encoder to choose not only the conventional bilateral matching mode with a symmetric motion vector difference but also asymmetric alternatives. To evaluate its efficiency, the proposed technique is integrated into the Versatile Video Coding Test Model (VTM) 11.0. Experimental results show an overall luma Bjøntegaard Delta rate of -2.78% for the random-access configurations.
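Bilateral matching compares the two reference blocks pointed to by a pair of motion vectors and picks the search offset with the lowest distortion. Below is a minimal integer-pel sketch, assuming SAD as the cost and a ±2 search window. The symmetric mode moves MV0 by +d and MV1 by -d; the two asymmetric modes shown (refining only one vector) are hypothetical stand-ins, since the abstract does not specify which asymmetric alternatives the paper uses. Sub-pel refinement and other VTM details are omitted.

```python
import numpy as np

def block(ref: np.ndarray, x: int, y: int, size: int) -> np.ndarray:
    """Extract a size x size block with top-left corner (x, y); no padding."""
    if x < 0 or y < 0 or y + size > ref.shape[0] or x + size > ref.shape[1]:
        raise ValueError("search window leaves the reference frame")
    return ref[y:y + size, x:x + size].astype(np.int64)

# Each mode maps a search offset d = (dx, dy) to (offset for MV0, offset for MV1).
MODES = {
    "symmetric": lambda d: (d, (-d[0], -d[1])),
    "refine_l0_only": lambda d: (d, (0, 0)),  # hypothetical asymmetric mode
    "refine_l1_only": lambda d: ((0, 0), d),  # hypothetical asymmetric mode
}

def bilateral_refine(ref0, ref1, pos, mv0, mv1, mode="symmetric", size=8, radius=2):
    """Integer-pel search; returns (min SAD, refined MV0, refined MV1)."""
    x, y = pos
    best = None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            d0, d1 = MODES[mode]((dx, dy))
            m0 = (mv0[0] + d0[0], mv0[1] + d0[1])
            m1 = (mv1[0] + d1[0], mv1[1] + d1[1])
            p0 = block(ref0, x + m0[0], y + m0[1], size)
            p1 = block(ref1, x + m1[0], y + m1[1], size)
            cost = int(np.abs(p0 - p1).sum())
            if best is None or cost < best[0]:
                best = (cost, m0, m1)
    return best

# Demo: ref1 is ref0 shifted right by one pixel, so the true match is one-sided.
rng = np.random.default_rng(0)
ref0 = rng.integers(0, 256, size=(32, 32))
ref1 = np.roll(ref0, shift=1, axis=1)
sym = bilateral_refine(ref0, ref1, (12, 12), (0, 0), (0, 0), mode="symmetric")
asym = bilateral_refine(ref0, ref1, (12, 12), (0, 0), (0, 0), mode="refine_l1_only")
print(sym[0], asym)
```

In this demo the displacement is not mirrored between the two references, so no symmetric offset reaches zero cost while the one-sided mode does; cases like this are what an encoder-selectable asymmetric mode is meant to capture.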


This paper presents a novel class of graph-based transforms based on 3D convolutional neural networks (GBT-CNN) for block-based predictive transform coding of imaging data. The proposed GBT-CNN uses a 3D convolutional neural network (3D-CNN) to predict the graph information needed to compute the transform and its inverse, thereby reducing the signalling cost of reconstructing the data after transformation.
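A graph-based transform (GBT) is commonly defined from a weighted graph over a block's pixels: the eigenvectors of the graph Laplacian form the transform basis, and because that basis is orthonormal, the inverse is simply its transpose. The sketch below assumes a combinatorial Laplacian and, for brevity, a single 1-D row of pixels; the edge weights are hard-coded where GBT-CNN would instead predict graph information with its 3D-CNN, and the helper names are hypothetical.

```python
import numpy as np

def line_graph(n: int, edge_weights) -> np.ndarray:
    """Adjacency matrix of a path graph over n pixels; edge_weights[i] links pixels i and i+1."""
    w = np.zeros((n, n))
    for i, wt in enumerate(edge_weights):
        w[i, i + 1] = w[i + 1, i] = wt
    return w

def gbt_basis(weights: np.ndarray) -> np.ndarray:
    """GBT basis: eigenvectors of the combinatorial Laplacian L = D - W.

    np.linalg.eigh returns eigenvalues in ascending order, so column 0 is the
    lowest graph frequency (the constant vector for a connected graph).
    """
    laplacian = np.diag(weights.sum(axis=1)) - weights
    _, eigvecs = np.linalg.eigh(laplacian)
    return eigvecs

# Hypothetical stand-in for the CNN's prediction: a weak link (0.1) where
# a sharp edge is expected between pixels 2 and 3 of a 4-pixel row.
U = gbt_basis(line_graph(4, [1.0, 1.0, 0.1]))
x = np.array([10.0, 11.0, 12.0, 40.0])  # residual row with a jump
coeffs = U.T @ x                         # forward transform
x_rec = U @ coeffs                       # inverse transform (U is orthonormal)
print(np.round(coeffs, 2))
```

With all edge weights equal, the path-graph Laplacian's eigenvectors coincide (up to sign) with the DCT-II basis, which is why GBTs are often described as a generalization of the DCT; lowering a weight lets the basis adapt to an edge inside the block.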


Most previous studies on lossless image compression have focused on improving preprocessing functions to reduce the redundancy of pixel values in real images. In contrast, we assume stochastic generative models directly on pixel values and focus on achieving the theoretical limit of the assumed models. In this study, we propose a stochastic model based on improper quadtrees and theoretically derive the optimal code for it under the Bayes criterion. In general, Bayes-optimal codes require an amount of computation that grows exponentially with the data length.
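The abstract does not describe how this exponential cost is handled. As background only, here is a minimal sketch of the blow-up and of one well-known remedy borrowed from context-tree weighting: when the prior over trees factorizes over nodes, the Bayes mixture over every quadtree can be computed recursively in a single pass. It assumes binary pixels, a Krichevsky-Trofimov estimator at each leaf, and a fixed split prior `alpha`; it models ordinary quadtrees rather than the paper's improper quadtrees, and all names are hypothetical.

```python
import math
import numpy as np

def count_quadtrees(depth: int) -> int:
    """Number of quadtree shapes of maximum depth `depth`: a node is a leaf or splits into 4."""
    return 1 if depth == 0 else 1 + count_quadtrees(depth - 1) ** 4

def log_kt(bits: np.ndarray) -> float:
    """Natural log of the Krichevsky-Trofimov (Beta(1/2, 1/2)) probability of a binary block."""
    n1 = int(bits.sum())
    n0 = bits.size - n1
    return (math.lgamma(n0 + 0.5) + math.lgamma(n1 + 0.5)
            - math.lgamma(n0 + n1 + 1) - math.log(math.pi))

def log_add(a: float, b: float) -> float:
    """log(exp(a) + exp(b)) without underflow."""
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))

def log_mixture(img: np.ndarray, depth: int, alpha: float = 0.5) -> float:
    """Log of the Bayes mixture over all quadtrees up to `depth`.

    With prior weight alpha a node is a leaf coded by KT; otherwise it splits
    into four quadrants whose mixtures multiply. One pass over the nodes
    replaces a sum over count_quadtrees(depth) separate trees.
    """
    leaf = log_kt(img)
    if depth == 0 or img.shape[0] < 2:
        return leaf  # nodes that cannot split are forced to be leaves
    h, w = img.shape[0] // 2, img.shape[1] // 2
    quads = (img[:h, :w], img[:h, w:], img[h:, :w], img[h:, w:])
    split = sum(log_mixture(q, depth - 1, alpha) for q in quads)
    return log_add(math.log(alpha) + leaf, math.log(1 - alpha) + split)

img = np.zeros((8, 8), dtype=int)
img[4:, 4:] = 1  # one quadrant differs from the rest
bits = -log_mixture(img, depth=3) / math.log(2)  # ideal Bayes code length in bits
print(count_quadtrees(3), round(bits, 2))
```

Even at depth 3 there are tens of thousands of candidate trees, while the recursion visits each of the 85 nodes once; working in the log domain avoids underflow on larger images.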