Documents
Presentation Slides
Presentation Slides
Converting RLBWT to LZ77 in smaller space
- Citation Author(s):
- Submitted by:
- Tomohiro I
- Last updated:
- 1 March 2022 - 12:30pm
- Document Type:
- Presentation Slides
- Event:
- Categories:
- Keywords:
Comments
About parallelism
If I understood correctly, the basic idea is to compute the lengths of the factors without their references, for which it is not necessary to have the suffix array entries available. Instead, a bit vector with dynamic flipping operation, augmented with a rank support data structure is used to determine the factor lengths. Subsequently, we only need to compute those SA values that would have been requested, which we subsequently do in a second pass.
Although your experiments show, for each number of threads, a time measure for the first pass, the first pass is purely sequential, so no matter how many threads you use, the time should stay roughly the same? Regarding parallelism, I am thinking of a heuristic that first partitions the text into blocks of equal length and then runs the factorization for each part; this should also give a valid LZ77-like parsing that could be computed completely in parallel. Also, maybe we could reconstruct from that the correct LZ77 parsing, but I think that you need to compute the parsing from scratch for certain bad examples - nevertheless it would be an interesting heuristic to try out on real data.
I think that your algorithm also works when you do not include the extra characters in your phrases such that you can represent them in the LZSS-fashioned pairs of lengths and positions (π_h, λ_h)?