Practical Repetition-Aware Grammar Compression

The goal of grammar compression is to construct a small sized context free grammar which uniquely generates the input text data. Among grammar compression methods, RePair is known for its good practical compression performance. MR-RePair was recently proposed as an improvement to RePair for constructing small-sized context free grammar for repetitive text data. However, a compact encoding scheme has not been discussed for MR-RePair. We propose a practical encoding method for MR-RePair and show its effectiveness through comparative experiments. Moreover, we extend MR-RePair to run-length context free grammar and design a novel variant for it called RL-MR-RePair. We experimentally demonstrate that a compression scheme consisting of RL-MR-RePair and the proposed encoding method show good performance on real repetitive datasets.

main.pdf

main.pdf (511)

Thumbs Up

Comments

While reading your poster, I

Permalink Submitted by Dominik Koeppl on 12 April 2020 - 1:53am

While reading your poster, I've collected some small questions:

I have not found the description of the "packed gamma encoding" in the cited references.
Maybe the references used another terminology? It would be nice to have a clear definition of this encoding, as it seems a core idea of the poster.

Can you separate the space needed for storing POPPT and PGE? It would be interesting to have an idea of which needs more space.

In the experiments, it seems you use two variants of PGE, called pge6 and pge8. What is their difference, and what is 'ible' and 'ps'?

CITE

Documents

Poster

Practical Repetition-Aware Grammar Compression

main.pdf

Comments

While reading your poster, I

QUESTIONS?