The goal of grammar compression is to construct a small sized context free grammar which uniquely generates the input text data. Among grammar compression methods, RePair is known for its good practical compression performance. MR-RePair was recently proposed as an improvement to RePair for constructing small-sized context free grammar for repetitive text data. However, a compact encoding scheme has not been discussed for MR-RePair. We propose a practical encoding method for MR-RePair and show its effectiveness through comparative experiments.

- Categories:

- Read more about Towards Better Compressed Representations
- Log in to post comments

We introduce the problem of computing a parsing where each phrase is of length at most m and which minimizes the zeroth order entropy of parsing. Based on the recent theoretical results we devise a heuristic for this problem. The solution has straightforward application in succinct text representations and gives practical improvements. Moreover the proposed heuristic yields structure which size can be bounded both by |S|H_{m-1}(S) and by |S|/m(H_0(S) + ... + H_{m-1}(S)),where H_{k}(S) is the k-th order empirical entropy of S.

## poster_dcc.pdf

- Categories:

- Read more about Edge minimization in de Bruijn graphs
- 1 comment
- Log in to post comments

This paper introduces the de Bruijn graph edge minimization problem, which is related to the compression of de Bruijn graphs: find the order-k de Bruijn graph with minimum edge count among all orders. We describe an efficient algorithm that solves this problem. Since the edge minimization problem is connected to the BWT compression technique called "tunneling", the paper also describes a way to minimize the length of a tunneled BWT in such a way that useful properties for sequence analysis are preserved.

- Categories:

- Read more about Weighted Adaptive Huffman Coding
- 1 comment
- Log in to post comments

Huffman coding is known to be optimal in case the alphabet is known in advance, the set of codewords is fixed and each codeword consists of an integral number of bits. If one of these conditions is violated, optimality is not guaranteed.

In the {\it dynamic\/} variant of Huffman coding the encoder and decoder maintain identical copies of the model; at each position, the model consists of the frequencies of the elements processed so far.

- Categories:

- Read more about Non-Binary Robust Universal Variable Length Codes
- Log in to post comments

We extend the binary Fibonacci code to $d$-ary codes, with $d\ge 2$.

This is motivated by future technological developments in which the basic unit of storage will not be just a 2-valued bit, but possibly an element that is able to distinguish between $d$ different values.

The proposed codes are prefix-free, complete and more robust than Huffman codes. Experimental results illustrate that the compression efficiency of non-binary Fibonacci codes are very close to the savings achieved by the corresponding non-binary Huffman coding of the same order.

- Categories:

We extend the binary Fibonacci code to $d$-ary codes, with $d\ge 2$.

This is motivated by future technological developments in which the basic unit of storage will not be just a 2-valued bit, but possibly an element that is able to distinguish between $d$ different values.

The proposed codes are prefix-free, complete and more robust than Huffman codes. Experimental results illustrate that the compression efficiency of non-binary Fibonacci codes are very close to the savings achieved by the corresponding non-binary Huffman coding of the same order.

- Categories:

- Read more about Binary Representation and High Efficient Compression of 3D CNN Features for Action Recognition
- Log in to post comments

- Categories:

- Categories:

- Read more about Bitvectors with runs and the successor/predecessor problem
- 2 comments
- Log in to post comments

The successor and predecessor problem consists of obtaining the closest value in a set of integers, greater or smaller than a given value. This problem has interesting applications, like the intersection of inverted lists. It can be easily modeled by using a bitvector of size n and its operations rank and select. However, there is a practical approach [1], which keeps the best theoretical bounds, and allows to solve successor and predecessor more efficiently.

- Categories:

- Read more about LFZip: Lossy Compression of Multivariate Floating-Point Time Series Data via Improved Prediction
- Log in to post comments

Time series data compression is emerging as an important problem with the growth in IoT devices and sensors. Due to the presence of noise in these datasets, lossy compression can often provide significant compression gains without impacting the performance of downstream applications. In this work, we propose an error-bounded lossy compressor, LFZip, for multivariate floating-point time series data that provides guaranteed reconstruction up to user-specified maximum absolute error.

## slides.pdf

- Categories: