Lossless data compression algorithms were developed to shrink files, but they can also be used to measure file similarity. In this article, the meta-algorithms Concat Compress and Cross Compress are put to an extensive practical test together with the compression algorithms Re-Pair, gzip, and bz2: five labeled datasets are run through a classification procedure using these algorithms. Theoretical groundwork for the two meta-algorithms was laid about 10 years ago, but little has happened since then.
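The concat-compress idea can be illustrated with the normalized compression distance (NCD): compress two files separately and concatenated, then compare the sizes. A minimal sketch using Python's gzip as a stand-in compressor (the article's exact similarity metric is an assumption here):

```python
import gzip

def csize(data: bytes) -> int:
    """Compressed size in bytes."""
    return len(gzip.compress(data))

def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance: near 0 for similar inputs."""
    cx, cy, cxy = csize(x), csize(y), csize(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)

a = b"the quick brown fox jumps over the lazy dog " * 20
b = b"the quick brown fox jumps over the lazy cat " * 20
z = b"lorem ipsum dolor sit amet, consectetur adipiscing " * 20
# Concatenating similar files barely grows the compressed size,
# so ncd(a, b) comes out smaller than ncd(a, z).
```

Because gzip finds the shared material of `a` and `b` inside its match window, the concatenation compresses almost as well as either file alone, which is exactly the signal a classifier can exploit.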

Linear computation coding (LCC) has been developed as a new framework for the computation of linear functions. LCC significantly reduces the complexity of matrix-vector multiplication. In basic LCC, storage is not restricted, i.e., the wiring exponents are arbitrary integer exponents of 2. In this work, we constrain the set of wiring exponents to be finite. From an information-theoretic viewpoint, this problem is function compression with a finite-alphabet representation by atoms. We show that by an efficient choice of this set,
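To make the finite-exponent constraint concrete, one can snap each matrix entry to a signed power of two whose exponent is drawn from a fixed finite set, so each scalar multiply becomes a shift plus a sign flip. The exponent set below is an illustrative assumption, and this one-step sketch ignores the multi-stage matrix decomposition of full LCC:

```python
# Finite set of allowed wiring exponents (an illustrative assumption).
EXPONENTS = range(-4, 3)

def nearest_wiring(x: float) -> float:
    """Closest value of the form s * 2**e, with s in {+1, -1} and e in EXPONENTS."""
    candidates = [s * 2.0 ** e for e in EXPONENTS for s in (1, -1)]
    return min(candidates, key=lambda v: abs(v - x))

def approx_matvec(A, x):
    """Matrix-vector product with every entry replaced by its nearest wiring,
    so no general-purpose multiplier is needed."""
    return [sum(nearest_wiring(a) * xi for a, xi in zip(row, x)) for row in A]
```

Restricting `EXPONENTS` is the finite-alphabet constraint: fewer allowed atoms means less storage per wiring, at the cost of a larger approximation error.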

End-to-end learned image compression (LIC) has become a promising alternative for lossy image compression. However, deployment of LIC models is restricted by excessive network parameters and high computational complexity. Existing LIC models implemented throughout with integer networks suffer significant degradation in rate-distortion (R-D) performance. In this paper, we propose a novel fully integerized LIC model that leverages channel-wise weight and activation quantization.
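Channel-wise weight quantization gives each output channel its own scale, so an outlier in one channel does not coarsen the quantization grid of the others. A minimal sketch of symmetric per-channel quantization in plain Python (the bit-width and rounding convention are assumptions, not the paper's exact scheme):

```python
def quantize_channel(w, bits=8):
    """Symmetric quantization of one channel's weights: the scale maps the
    largest magnitude in this channel to the top integer level."""
    qmax = 2 ** (bits - 1) - 1
    scale = max(abs(v) for v in w) / qmax
    q = [max(-qmax - 1, min(qmax, round(v / scale))) for v in w]
    return q, scale

def dequantize(q, scale):
    return [v * scale for v in q]

# Two channels with very different magnitude ranges: each gets its own scale.
channels = [[0.8, -1.2, 0.05, 0.4], [0.01, -0.02, 0.015, 0.005]]
quantized = [quantize_channel(w) for w in channels]
```

With a single per-tensor scale, the second channel's tiny weights would all collapse toward zero; per-channel scales keep the rounding error bounded by half a step in every channel.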

Quantization of one deep neural network to multiple compression rates (precisions) has recently been considered for flexible deployment in real-world scenarios. However, existing methods for network quantization under multiple compression rates either use fixed-precision bit-width allocation or heuristically search for a mixed-precision strategy, and cannot balance efficiency and performance well.
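One simple mixed-precision baseline of the kind the abstract criticizes is a greedy allocation: start every layer at the lowest precision, then spend the remaining bit budget upgrading the most sensitive layers first. This is a hypothetical illustration of the search problem, not the paper's method; the `sensitivity` scores are assumed to come from some per-layer proxy such as quantization error:

```python
def allocate_bits(sensitivity, total_bits, choices=(2, 4, 8)):
    """Greedy sketch: everyone starts at min precision; the leftover budget
    goes to the most sensitive layers first."""
    bits = [min(choices)] * len(sensitivity)
    spent = sum(bits)
    order = sorted(range(len(sensitivity)), key=lambda i: -sensitivity[i])
    for i in order:
        for b in sorted(choices):
            if b > bits[i] and spent + (b - bits[i]) <= total_bits:
                spent += b - bits[i]
                bits[i] = b
    return bits

# Hypothetical per-layer sensitivity scores.
layer_bits = allocate_bits([0.9, 0.1, 0.5], total_bits=14)
```

Such heuristics are cheap but ignore interactions between layers, which is one reason they trade off poorly between efficiency and performance.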

This paper studies compression techniques for parallel in-memory sparse tensor algebra. We find that applying simple existing compression schemes can lead to performance loss in some cases. To resolve this issue, we introduce an optimized algorithm for processing compressed inputs that can improve both the space usage as well as the performance compared to uncompressed inputs. We implement the compression techniques on top of a suite of sparse matrix algorithms generated by taco, a compiler for sparse tensor algebra.
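One simple existing scheme for compressing sparse tensor values is to store each distinct nonzero once in a dictionary and reference it by a small integer id. The sketch below applies this to CSR sparse matrix-vector multiplication; it is an assumed example of the kind of scheme studied, not code generated by taco:

```python
def csr_spmv(row_ptr, col_idx, val_id, dictionary, x):
    """CSR SpMV where repeated nonzero values are stored once in a
    dictionary and referenced by integer ids (value compression)."""
    y = [0.0] * (len(row_ptr) - 1)
    for i in range(len(y)):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += dictionary[val_id[k]] * x[col_idx[k]]
    return y

# 2x3 matrix [[1.5, 0, 2.0], [0, 1.5, 0]]: the value 1.5 is stored once.
row_ptr = [0, 2, 3]
col_idx = [0, 2, 1]
val_id = [0, 1, 0]
dictionary = [1.5, 2.0]
```

The extra indirection through `dictionary` is the performance hazard the paper refers to: it saves space when values repeat, but the dependent load can slow down the inner loop unless the access pattern is optimized.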

This paper proposes an effective universal "on-the-fly" mechanism for stochastic codebook generation in lossy coding of Markov sources.

In this paper, we propose a neural implementation of a companded quantization scheme that allows training and implementing optimal scalar quantization in neural-network-based data compression systems. The advantage of companded quantization is that it realizes optimal non-linear quantization in a simpler form based on uniform quantization. In our work, we consider two different models of uniform quantization. Furthermore, to verify the effectiveness of the proposed approach, we conducted a series of experiments on natural grayscale images.
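A classical companded quantizer applies a nonlinear compressor, quantizes uniformly, then expands: with the mu-law compander this yields fine steps near zero and coarse steps at large amplitudes. A sketch of one such scheme (the paper's neural compander and its two uniform-quantization models are not reproduced here):

```python
import math

MU = 255.0  # standard mu-law constant

def mu_compress(x):
    """Compressor: map x in [-1, 1] nonlinearly onto [-1, 1]."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

def mu_expand(y):
    """Expander: exact inverse of the compressor."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)

def companded_quantize(x, levels=16):
    """Uniform mid-rise quantization in the companded domain, which acts as
    a non-uniform quantizer on x with fine steps near zero."""
    step = 2.0 / levels
    yq = (math.floor(mu_compress(x) / step) + 0.5) * step
    return mu_expand(yq)

def uniform_quantize(x, levels=16):
    step = 2.0 / levels
    return (math.floor(x / step) + 0.5) * step
```

For small-amplitude inputs the companded quantizer reconstructs more accurately than a plain uniform quantizer with the same number of levels, which is the simplification the abstract exploits: non-linear quantization built from a uniform one.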

The wavelet tree is a data structure that indexes a text over an integer alphabet for efficient rank and select queries. Using the Huffman encoding, it can be stored in zero-order entropy-compressed space. We present a highly engineered open source implementation of an efficient sequential construction algorithm that makes use of bit parallelism via vector instructions. On hardware featuring ultrawide registers of up to 512 bits, it outperforms the currently fastest known practical sequential construction algorithms by a factor of up to 2.5.
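The rank query in question counts occurrences of a symbol in a prefix of the text. A minimal pointer-based wavelet tree in Python shows the mechanics; the engineered implementation described above uses packed bit vectors and vector instructions instead of Python lists:

```python
class WaveletTree:
    """Minimal pointer-based wavelet tree over an integer alphabet."""

    def __init__(self, text, lo=None, hi=None):
        if lo is None:
            lo, hi = min(text), max(text)
        self.lo, self.hi = lo, hi
        if lo == hi or not text:
            self.bits = None  # leaf: all symbols in this node are equal
            return
        mid = (lo + hi) // 2
        self.bits = [c > mid for c in text]  # 0 = left half, 1 = right half
        self.left = WaveletTree([c for c in text if c <= mid], lo, mid)
        self.right = WaveletTree([c for c in text if c > mid], mid + 1, hi)

    def rank(self, c, i):
        """Occurrences of symbol c in text[:i] (c must occur in the text)."""
        if self.bits is None:
            return i
        mid = (self.lo + self.hi) // 2
        ones = sum(self.bits[:i])  # a real implementation uses O(1) bit-vector rank
        if c <= mid:
            return self.left.rank(c, i - ones)
        return self.right.rank(c, ones)

text = [3, 1, 4, 1, 5, 1, 2]
wt = WaveletTree(text)
```

Each level stores one bit per symbol, which is where the zero-order entropy-compressed representation comes from once the tree shape follows a Huffman code.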

It is known that a context-free grammar (CFG) that produces a single string can be derived from the compact directed acyclic word graph (CDAWG) for the same string. In this work, we show that the CFG derived from a CDAWG is deeply connected to the maximal repeat content of the string it produces and thus has O(m) rules, where m is the number of maximal repeats in the string. We then provide a generic algorithm based on this insight for constructing the CFG from the LCP-intervals of a string in O(n) time, where n is the length of the string.
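A maximal repeat is a substring that occurs at least twice and cannot be extended by one character to the left or right without losing an occurrence. The naive enumerator below checks this definition directly in quadratic space; it only illustrates the object counted by m, not the O(n) LCP-interval construction:

```python
def maximal_repeats(s):
    """Naive check of the definition: a maximal repeat occurs at least twice,
    and neither all left contexts nor all right contexts agree."""
    n = len(s)
    occs = {}
    for i in range(n):
        for j in range(i + 1, n + 1):
            occs.setdefault(s[i:j], []).append(i)
    reps = set()
    for sub, pos in occs.items():
        if len(pos) < 2:
            continue
        # None marks a text boundary, which counts as a distinct context.
        lefts = {s[p - 1] if p > 0 else None for p in pos}
        rights = {s[p + len(sub)] if p + len(sub) < n else None for p in pos}
        if len(lefts) > 1 and len(rights) > 1:
            reps.add(sub)
    return reps
```

For "banana" the maximal repeats are "a" and "ana": "an" occurs twice but is always followed by "a", so it is not right-maximal and induces no rule of its own.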
