Documents
Presentation Slides
SortComp (Sort-and-Compress) - Towards a Universal Lossless Compression Scheme for Matrix and Tabular Data
- Citation Author(s):
- Submitted by:
- Xizhe Cheng
- Last updated:
- 7 March 2022 - 8:49pm
- Document Type:
- Presentation Slides
- Document Year:
- 2022
- Event:
- Presenters:
- Xizhe Cheng
- Paper Code:
- 211
- Categories:
- Keywords:
- Log in to post comments
A universal scheme is proposed for the lossless compression of two-dimensional tables and matrices. Instead of standard row- or column-based compression, we propose to sort each column first and record both the sorted table and the corresponding permutation table of the sorting permutations. These two tables are then separately compressed. In this new scheme, both intra- and inter-column correlations can be efficiently captured, giving rise to improved compression ratio in particular when both column-wise and row-wise dependencies cooccur. This scheme reduces the problem of the compression of an arbitrary two-dimensional table to that of a ’permutation table’ together with a ’sorted table’, where the former is only dependent on the table dimension and the latter can be effectively compressed column-by-column using predictive methods. Based on this scheme, a new algorithm is proposed, SortComp (sort-and-compress). For correlated columns, we give an estimation of the asymptotic bit rate of the algorithm and compare it to column-oriented compression schemes. Numerical experiments on real-life csv datasets validate the advantages of SortComp compared to existing row- and column-oriented compression algorithms.
Comments
About reordering
If I understood correctly, your goal is to aim for low zeroth order entropy when considering shuffling the columns and rows.
To get higher order compression, have you thought of other approaches in how to compress column-by-column like relative Lempel-Ziv?
Besides, since you cited three approaches similar to your work using reordering techniques,
I wonder how good your approach would compare against them.