Documents
Poster
Practical Repetition-Aware Grammar Compression
- Citation Author(s):
- Submitted by:
- Isamu Furuya
- Last updated:
- 25 March 2020 - 8:36pm
- Document Type:
- Poster
- Document Year:
- 2020
- Event:
- Categories:
- Keywords:
- Log in to post comments
The goal of grammar compression is to construct a small sized context free grammar which uniquely generates the input text data. Among grammar compression methods, RePair is known for its good practical compression performance. MR-RePair was recently proposed as an improvement to RePair for constructing small-sized context free grammar for repetitive text data. However, a compact encoding scheme has not been discussed for MR-RePair. We propose a practical encoding method for MR-RePair and show its effectiveness through comparative experiments. Moreover, we extend MR-RePair to run-length context free grammar and design a novel variant for it called RL-MR-RePair. We experimentally demonstrate that a compression scheme consisting of RL-MR-RePair and the proposed encoding method show good performance on real repetitive datasets.
Comments
While reading your poster, I
While reading your poster, I've collected some small questions:
I have not found the description of the "packed gamma encoding" in the cited references.
Maybe the references used another terminology? It would be nice to have a clear definition of this encoding, as it seems a core idea of the poster.
Can you separate the space needed for storing POPPT and PGE? It would be interesting to have an idea of which needs more space.
In the experiments, it seems you use two variants of PGE, called pge6 and pge8. What is their difference, and what is 'ible' and 'ps'?