Sorry, you need to enable JavaScript to visit this website.

Applying Practical Parallel Grammar Compression to Large-scale Data

Citation Author(s):
Masaski Matsushita, Yasushi Inoguchi
Submitted by:
Masaki Matsushita
Last updated:
4 March 2022 - 12:03pm
Document Type:
Poster
Document Year:
2022
Event:
Presenters:
Masaki Matsushita
Paper Code:
176
Categories:
 

Re-pair is a grammar-based compression algorithm. It achieves higher compression rates for text, graph, and tree. While Re-pair is a linear-time algorithm, it is slower than other general compression algorithms in practice. This is an obstacle in applying Re-pair to large-scale data. In this paper, we present Parallel Re-pair, a practical implementation that enables parallel processing of Re-pair. In Parallel Re-pair, Re-pair is executed in multiple threads for the divided block. Each thread shares a dictionary and it can output a single CFG. This allows us to process the entire input text in a compressed state. We experimented with datasets of tens of gigabytes. Our experiments show that Parallel Re-Pair is 7.9 to 10.4 times faster than sequential ones on 32 processors.

up
0 users have voted: