Documents
Poster
Applying Practical Parallel Grammar Compression to Large-scale Data
- Citation Author(s):
- Submitted by:
- Masaki Matsushita
- Last updated:
- 4 March 2022 - 12:03pm
- Document Type:
- Poster
- Document Year:
- 2022
- Event:
- Presenters:
- Masaki Matsushita
- Paper Code:
- 176
- Categories:
- Log in to post comments
Re-pair is a grammar-based compression algorithm. It achieves higher compression rates for text, graph, and tree. While Re-pair is a linear-time algorithm, it is slower than other general compression algorithms in practice. This is an obstacle in applying Re-pair to large-scale data. In this paper, we present Parallel Re-pair, a practical implementation that enables parallel processing of Re-pair. In Parallel Re-pair, Re-pair is executed in multiple threads for the divided block. Each thread shares a dictionary and it can output a single CFG. This allows us to process the entire input text in a compressed state. We experimented with datasets of tens of gigabytes. Our experiments show that Parallel Re-Pair is 7.9 to 10.4 times faster than sequential ones on 32 processors.