Sorry, you need to enable JavaScript to visit this website.

Applying Practical Parallel Grammar Compression to Large-scale Data

Error message

  • The specified file temporary://fileMoS4lh could not be copied, because the destination directory is not properly configured. This may be caused by a problem with file or directory permissions. More information is available in the system log.
  • The specified file temporary://fileLi7SGm could not be copied, because the destination directory is not properly configured. This may be caused by a problem with file or directory permissions. More information is available in the system log.
  • The specified file temporary://fileqJrc9N could not be copied, because the destination directory is not properly configured. This may be caused by a problem with file or directory permissions. More information is available in the system log.
  • The specified file temporary://filecb6F3A could not be copied, because the destination directory is not properly configured. This may be caused by a problem with file or directory permissions. More information is available in the system log.
  • The specified file temporary://filerPuxtI could not be copied, because the destination directory is not properly configured. This may be caused by a problem with file or directory permissions. More information is available in the system log.
  • The specified file temporary://fileNGAjC4 could not be copied, because the destination directory is not properly configured. This may be caused by a problem with file or directory permissions. More information is available in the system log.
  • The specified file temporary://filecoFPqR could not be copied, because the destination directory is not properly configured. This may be caused by a problem with file or directory permissions. More information is available in the system log.
  • The specified file temporary://fileQyt5mQ could not be copied, because the destination directory is not properly configured. This may be caused by a problem with file or directory permissions. More information is available in the system log.
  • The specified file temporary://fileMlNgrg could not be copied, because the destination directory is not properly configured. This may be caused by a problem with file or directory permissions. More information is available in the system log.
Citation Author(s):
Masaski Matsushita, Yasushi Inoguchi
Submitted by:
Masaki Matsushita
Last updated:
4 March 2022 - 12:03pm
Document Type:
Poster
Document Year:
2022
Event:
Presenters:
Masaki Matsushita
Paper Code:
176
Categories:
 

Re-pair is a grammar-based compression algorithm. It achieves higher compression rates for text, graph, and tree. While Re-pair is a linear-time algorithm, it is slower than other general compression algorithms in practice. This is an obstacle in applying Re-pair to large-scale data. In this paper, we present Parallel Re-pair, a practical implementation that enables parallel processing of Re-pair. In Parallel Re-pair, Re-pair is executed in multiple threads for the divided block. Each thread shares a dictionary and it can output a single CFG. This allows us to process the entire input text in a compressed state. We experimented with datasets of tens of gigabytes. Our experiments show that Parallel Re-Pair is 7.9 to 10.4 times faster than sequential ones on 32 processors.

up
0 users have voted: