Approximating Optimal Bidirectional Macro Schemes.
- Citation Author(s):
- Submitted by:
- Ana Sofia Correia
- Last updated:
- 31 March 2020 - 4:25pm
- Document Type:
- Presentation Slides
- Log in to post comments
Lempel-Ziv is an easy-to-compute member of a wide family of so-called macro schemes; it restricts pointers to go in one direction only. Optimal bidirectional macro schemes are NP-complete to find, but they may provide much better compression on highly repetitive sequences. We consider the problem of approximating optimal bidirectional macro schemes. We describe a simulated annealing algorithm that usually converges quickly. Moreover, in some cases, we obtain bidirectional macro schemes that are provably a 2-approximation of the optimal. We test our algorithm on a number of artificial repetitive texts and verify that it is efficient in practice and outperforms Lempel-Ziv, sometimes by a wide margin.
0 users have voted:
Thanks for the video!
Thanks for the video!
I already wondered about how to efficiently compute such an approximation, but with the link/cut tree data structure, it seems feasible.
If I understood correctly, the gist is that in some cases, your compression scheme based on simulated annealing has a size of at most 2γ, where γ is the size of the smallest string attractor.
While watching your video, I've collected some small questions:
With what configuration do you start the simulated annealing? Do you start with the LZ77 phrases (in particular, the LZ77 scheme that introduces a new character at the end of each referencing phrase)?
If I interpreted the figure in Slide 23 correctly, the black-shaded characters are the endings of the phrases. However, are there any explicit characters?
How did you evaluate the 2-approximation? Do you compute the smallest bidirectional macro scheme or the smallest string attractor via brute force?
Some references that could be interesting for future evaluation:
LZRR (https://ieeexplore.ieee.org/document/8712715) and plcpcomp (https://doi.org/10.4230/LIPIcs.ESA.2019.41) are other bidirectional macro schemes with empirically well-received approximations.
In particular, the plcpcomp approach uses ideas similar to the link/cut tree data structure for merging or splitting phrases, but with the difference that there the focus was put on improving the decompression in external memory.
For the future work: If you propose tests with large files, I wonder how fast your algorithm is.
Simulated annealing and the merge operation on the link cut tree seems to be very slow.
Finally, a didactic hint: Instead of introducing LZ78 in detail (Slide 4), it would have been better to introduce string attractors (needed in Slide 24) and a common macro scheme (e.g. LZ77).