Compressing and Randomly Accessing Sequences

In this paper we consider the problem of storing sequences of symbols in
a compressed format while supporting random access to the symbols without
decompression. Although this problem is well studied for textual data,
the sequences we consider are not textual. We argue that traditional
compression methods used in the text algorithms community (such as
compressors targeting $k$-th order empirical entropy) do not perform as
well on such sequential data, and that simpler methods, such as
Huffman-coding the deltas between sequence elements, give better
compression. We discuss data structures that target such measures while
allowing random access to sequence elements.
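
To make the delta idea concrete, here is a minimal Python sketch. It is not the data structure from the paper. It assumes an integer sequence, keeps every `block`-th value as an absolute sample, and leaves the deltas as plain integers where a real structure would store Huffman codewords plus pointers into the bit stream. The names `entropy0`, `SampledDeltaSequence`, and `block` are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def entropy0(values):
    """Zero-order empirical entropy of `values`, in bits per element."""
    n = len(values)
    counts = Counter(values)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


class SampledDeltaSequence:
    """Delta-encoded sequence with absolute samples for random access.

    Assumption: deltas are kept as plain Python ints. A compressed
    version would Huffman-code them and store, per block, the bit
    offset at which that block's codewords start.
    """

    def __init__(self, seq, block=4):
        self.block = block
        self.n = len(seq)
        # Absolute value at the start of every block.
        self.samples = seq[::block]
        # deltas[i] = seq[i] - seq[i - 1]; deltas[0] is a placeholder.
        self.deltas = [0] + [b - a for a, b in zip(seq, seq[1:])]

    def access(self, i):
        """Return seq[i] by decoding at most block - 1 deltas."""
        if not 0 <= i < self.n:
            raise IndexError(i)
        b = i // self.block
        value = self.samples[b]
        for j in range(b * self.block + 1, i + 1):
            value += self.deltas[j]
        return value


seq = [100, 101, 103, 102, 104, 105, 107, 106, 108]
sds = SampledDeltaSequence(seq, block=4)
decoded = [sds.access(i) for i in range(len(seq))]
```

On this toy input the values are all distinct, so their zero-order entropy is high, while the deltas take only three distinct values; comparing `entropy0(seq)` with `entropy0(sds.deltas[1:])` shows the gap that a Huffman code on the deltas could exploit. The `block` parameter trades space (more samples) against access time (fewer deltas to sum).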

Comments

Submitted by Dominik Koeppl on 24 April 2020 - 6:44am

The problem dealt with in this poster seems to be the same as in https://sigport.org/documents/towards-better-compressed-representations,
but here you follow a delta-encoded approach.
Since you said that the theorem shown in the poster is unattractive because $S$ may be very large, I wonder how large $S$ actually is on the datasets used.
We know that $S \le n\sigma$, but it could be much smaller.
