Applying Practical Parallel Grammar Compression to Large-scale Data

Re-pair is a grammar-based compression algorithm that achieves high compression ratios on text, graphs, and trees. Although Re-pair runs in linear time, in practice it is slower than other general-purpose compression algorithms, which is an obstacle to applying it to large-scale data. In this paper, we present Parallel Re-pair, a practical implementation that enables parallel processing of Re-pair. Parallel Re-pair divides the input into blocks and runs Re-pair on each block in a separate thread. The threads share a dictionary, so the output is a single context-free grammar (CFG). This allows us to process the entire input text in a compressed state. We experimented with datasets of tens of gigabytes. Our experiments show that Parallel Re-pair is 7.9 to 10.4 times faster than sequential implementations on 32 processors.
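For readers unfamiliar with Re-pair, the basic idea can be sketched as follows: repeatedly replace the most frequent pair of adjacent symbols with a fresh nonterminal until no pair occurs twice. This is a minimal, sequential illustration only. It recounts all pairs on every pass, so it is not the linear-time algorithm, and it makes no attempt to reproduce the block division or shared dictionary of Parallel Re-pair. The names `repair` and `expand` and the `("N", id)` representation of nonterminals are assumptions made for this example, not taken from the paper.

```python
from collections import Counter


def repair(text):
    """Naive Re-pair sketch: returns (compressed sequence, grammar rules)."""
    seq = list(text)
    rules = {}    # nonterminal -> (left symbol, right symbol)
    next_id = 0
    while True:
        # Count adjacent pairs. Overlapping occurrences (e.g. in "aaa")
        # are counted naively; this affects only compression quality.
        pairs = Counter(zip(seq, seq[1:]))
        if not pairs:
            break
        pair, count = pairs.most_common(1)[0]
        if count < 2:
            break
        # Nonterminals are tuples, so they cannot collide with characters.
        nonterminal = ("N", next_id)
        next_id += 1
        rules[nonterminal] = pair
        # Replace non-overlapping occurrences of the pair, left to right.
        out = []
        i = 0
        while i < len(seq):
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(nonterminal)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
    return seq, rules


def expand(symbol, rules):
    """Recursively expand a symbol back into the text it derives."""
    if symbol in rules:
        left, right = rules[symbol]
        return expand(left, rules) + expand(right, rules)
    return symbol


if __name__ == "__main__":
    text = "abracadabra abracadabra"
    seq, rules = repair(text)
    restored = "".join(expand(s, rules) for s in seq)
    print(len(text), len(seq), len(rules), restored == text)
```

The returned sequence together with the rules forms a grammar that derives exactly the input, which is what makes it possible to work with the text in compressed form.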

