Sorry, you need to enable JavaScript to visit this website.

SortComp (Sort-and-Compress) - Towards a Universal Lossless Compression Scheme for Matrix and Tabular Data

Citation Author(s):
Xizhe Cheng, Sian-Jheng LIN, Jie SUN
Submitted by:
Xizhe Cheng
Last updated:
7 March 2022 - 8:49pm
Document Type:
Presentation Slides
Document Year:
2022
Event:
Presenters:
Xizhe Cheng
Paper Code:
211
Categories:
Keywords:
 

A universal scheme is proposed for the lossless compression of two-dimensional tables and matrices. Instead of standard row- or column-based compression, we propose to sort each column first and record both the sorted table and the corresponding permutation table of the sorting permutations. These two tables are then separately compressed. In this new scheme, both intra- and inter-column correlations can be efficiently captured, giving rise to improved compression ratio in particular when both column-wise and row-wise dependencies cooccur. This scheme reduces the problem of the compression of an arbitrary two-dimensional table to that of a ’permutation table’ together with a ’sorted table’, where the former is only dependent on the table dimension and the latter can be effectively compressed column-by-column using predictive methods. Based on this scheme, a new algorithm is proposed, SortComp (sort-and-compress). For correlated columns, we give an estimation of the asymptotic bit rate of the algorithm and compare it to column-oriented compression schemes. Numerical experiments on real-life csv datasets validate the advantages of SortComp compared to existing row- and column-oriented compression algorithms.

up
0 users have voted:

Comments

If I understood correctly, your goal is to aim for low zeroth order entropy when considering shuffling the columns and rows.
To get higher order compression, have you thought of other approaches in how to compress column-by-column like relative Lempel-Ziv?

Besides, since you cited three approaches similar to your work using reordering techniques,
I wonder how good your approach would compare against them.