Sorry, you need to enable JavaScript to visit this website.

Converting RLBWT to LZ77 in smaller space

Citation Author(s):
Masaki Shigekuni, Tomohiro I
Submitted by:
Tomohiro I
Last updated:
1 March 2022 - 12:30pm
Document Type:
Presentation Slides
Event:
Categories:
Keywords:
 
up
1 user has voted: Dominik Koeppl

Comments

If I understood correctly, the basic idea is to compute the lengths of the factors without their references, for which it is not necessary to have the suffix array entries available. Instead, a bit vector with dynamic flipping operation, augmented with a rank support data structure is used to determine the factor lengths. Subsequently, we only need to compute those SA values that would have been requested, which we subsequently do in a second pass.
Although your experiments show, for each number of threads, a time measure for the first pass, the first pass is purely sequential, so no matter how many threads you use, the time should stay roughly the same? Regarding parallelism, I am thinking of a heuristic that first partitions the text into blocks of equal length and then runs the factorization for each part; this should also give a valid LZ77-like parsing that could be computed completely in parallel. Also, maybe we could reconstruct from that the correct LZ77 parsing, but I think that you need to compute the parsing from scratch for certain bad examples - nevertheless it would be an interesting heuristic to try out on real data.
I think that your algorithm also works when you do not include the extra characters in your phrases such that you can represent them in the LZSS-fashioned pairs of lengths and positions (π_h, λ_h)?