Abstract:

Lempel-Ziv is an easy-to-compute member of a wide family of so-called macro schemes; it restricts pointers to go in one direction only. Optimal bidirectional macro schemes are NP-complete to find, but they may provide much better compression on highly repetitive sequences. We consider the problem of approximating optimal bidirectional macro schemes. We describe a simulated annealing algorithm that usually converges quickly. Moreover, in some cases, we obtain bidirectional macro schemes that are provably a 2-approximation of the optimal. We test our algorithm on a number of artificial repetitive texts and verify that it is efficient in practice and outperforms Lempel-Ziv, sometimes by a wide margin.

### Paper Details

- Authors:
- Submitted On:
- 31 March 2020 - 4:25pm
- Short Link:
- Type:
- Presentation Slides
- Event:
- Session:
- Session 5

### Subscribe

[1] Luís M. S. Russo, Ana D. Correia, Gonzalo Navarro, Alexandre P. Francisco,
"Approximating Optimal Bidirectional Macro Schemes.",
IEEE SigPort,
2020. [Online]. Available: http://sigport.org/5083. Accessed: Jul. 07, 2020.

@article{5083-20,

url = {http://sigport.org/5083},

author = {Luís M. S. Russo; Ana D. Correia; Gonzalo Navarro; Alexandre P. Francisco },

publisher = {IEEE SigPort},

title = {Approximating Optimal Bidirectional Macro Schemes.},

year = {2020} }

url = {http://sigport.org/5083},

author = {Luís M. S. Russo; Ana D. Correia; Gonzalo Navarro; Alexandre P. Francisco },

publisher = {IEEE SigPort},

title = {Approximating Optimal Bidirectional Macro Schemes.},

year = {2020} }

TY - EJOUR

T1 - Approximating Optimal Bidirectional Macro Schemes.

AU - Luís M. S. Russo; Ana D. Correia; Gonzalo Navarro; Alexandre P. Francisco

PY - 2020

PB - IEEE SigPort

UR - http://sigport.org/5083

ER -

T1 - Approximating Optimal Bidirectional Macro Schemes.

AU - Luís M. S. Russo; Ana D. Correia; Gonzalo Navarro; Alexandre P. Francisco

PY - 2020

PB - IEEE SigPort

UR - http://sigport.org/5083

ER -

Luís M. S. Russo, Ana D. Correia, Gonzalo Navarro, Alexandre P. Francisco.
(2020).
Approximating Optimal Bidirectional Macro Schemes..
IEEE SigPort.
http://sigport.org/5083

Luís M. S. Russo, Ana D. Correia, Gonzalo Navarro, Alexandre P. Francisco,
2020.
Approximating Optimal Bidirectional Macro Schemes..
Available at:
http://sigport.org/5083.

Luís M. S. Russo, Ana D. Correia, Gonzalo Navarro, Alexandre P. Francisco.
(2020).
"Approximating Optimal Bidirectional Macro Schemes.."
Web.

1. Luís M. S. Russo, Ana D. Correia, Gonzalo Navarro, Alexandre P. Francisco.
Approximating Optimal Bidirectional Macro Schemes. [Internet].
IEEE SigPort; 2020.
Available from :
http://sigport.org/5083

## Comments

## Thanks for the video!

Thanks for the video!

I already wondered about how to efficiently compute such an approximation, but with the link/cut tree data structure, it seems feasible.

If I understood correctly, the gist is that in some cases, your compression scheme based on simulated annealing has a size of at most 2γ, where γ is the size of the smallest string attractor.

While watching your video, I've collected some small questions:

With what configuration do you start the simulated annealing? Do you start with the LZ77 phrases (in particular, the LZ77 scheme that introduces a new character at the end of each referencing phrase)?

If I interpreted the figure in Slide 23 correctly, the black-shaded characters are the endings of the phrases. However, are there any explicit characters?

How did you evaluate the 2-approximation? Do you compute the smallest bidirectional macro scheme or the smallest string attractor via brute force?

Some references that could be interesting for future evaluation:

LZRR (https://ieeexplore.ieee.org/document/8712715) and plcpcomp (https://doi.org/10.4230/LIPIcs.ESA.2019.41) are other bidirectional macro schemes with empirically well-received approximations.

In particular, the plcpcomp approach uses ideas similar to the link/cut tree data structure for merging or splitting phrases, but with the difference that there the focus was put on improving the decompression in external memory.

For the future work: If you propose tests with large files, I wonder how fast your algorithm is.

Simulated annealing and the merge operation on the link cut tree seems to be very slow.

Finally, a didactic hint: Instead of introducing LZ78 in detail (Slide 4), it would have been better to introduce string attractors (needed in Slide 24) and a common macro scheme (e.g. LZ77).