Documents
Poster
Poster
Compressing and Randomly Accessing Sequences
- Citation Author(s):
- Submitted by:
- Rajeev Raman
- Last updated:
- 21 April 2020 - 10:31am
- Document Type:
- Poster
- Document Year:
- 2020
- Event:
- Presenters:
- Rajeev Raman
- Paper Code:
- DCC-197
- Categories:
- Keywords:
- Log in to post comments
In this paper we consider the problem of storing sequences of symbols in
a compressed format, while supporting random access to the symbols without
decompression. Although this is a well-studied problem when the data is
textual, the kind of sequences we look at are not textual, and we argue
that traditional compression methods used in the text algorithms community
(such as compressors targeting $k$-th order empirical entropy) do not
perform as well on these sequential data, and simpler methods such
as Huffman-coding the deltas between sequence elements give better
compression performance. We discuss data structures that allow
random access to sequence elements that target such measures.
Comments
The problem dealt in this
The problem dealt in this poster seems to be the same as https://sigport.org/documents/towards-better-compressed-representations,
but here you follow a delta-encoded approach.
Since you said that the theorem shown in the poster is unattractive because $S$ may be very large, I wonder how large $S$ actually is on the used datasets.
We know that $S <= nσ$, but it could be much smaller.