Sorry, you need to enable JavaScript to visit this website.

Practical Repetition-Aware Grammar Compression

Primary tabs

Citation Author(s):
Submitted by:
Isamu Furuya
Last updated:
25 March 2020 - 8:36pm
Document Type:
Poster
Document Year:
2020
Event:

Abstract 

Abstract: 

The goal of grammar compression is to construct a small sized context free grammar which uniquely generates the input text data. Among grammar compression methods, RePair is known for its good practical compression performance. MR-RePair was recently proposed as an improvement to RePair for constructing small-sized context free grammar for repetitive text data. However, a compact encoding scheme has not been discussed for MR-RePair. We propose a practical encoding method for MR-RePair and show its effectiveness through comparative experiments. Moreover, we extend MR-RePair to run-length context free grammar and design a novel variant for it called RL-MR-RePair. We experimentally demonstrate that a compression scheme consisting of RL-MR-RePair and the proposed encoding method show good performance on real repetitive datasets.

up
0 users have voted:

Comments

While reading your poster, I've collected some small questions:

I have not found the description of the "packed gamma encoding" in the cited references.
Maybe the references used another terminology? It would be nice to have a clear definition of this encoding, as it seems a core idea of the poster.

Can you separate the space needed for storing POPPT and PGE? It would be interesting to have an idea of which needs more space.

In the experiments, it seems you use two variants of PGE, called pge6 and pge8. What is their difference, and what is 'ible' and 'ps'?

Dataset Files

main.pdf

(150)