RLBWT Tricks

RLBWT_Tricks_Slides__DCC_22__.pdf

Thumbs Up

Comments

Evaluation

Permalink Submitted by Dominik Koeppl on 6 March 2022 - 4:43am

It is interesting to see that RLCSA is faster than RLBWT when considering only count queries. I thought that they should be basically the same, even in terms of space if we omit the suffix array samplings, which are needed for locate, but not for the counting queries you measured. It would be great if you can give any insights about the differences of the measured implementations.

Also, in the last plot, I do not think you introduced the solutions 'sparse bv' and 'wt_fbb'. I think that these two are different flavors in how to represent the LF-mapping with the RLBWT, probably by using a wavelet tree?

Re: Evaluation

Permalink Submitted by Nathaniel Brown on 21 March 2022 - 10:00pm

Hello Dominik,

I've updated the slides (but not the video, it was already a bit short of time) to include details of the other methods in the plot. Indeed, 'sparse bv' is the RLBWT component of the r-index using compressed sparse bitvectors (https://github.com/nicolaprezza/r-index), and wt_fbb is a wavelet tree using fixed block boosting to represent the RLBWT (https://github.com/dominikkempa/faster-minuter). Of course, these are expanded on in the arxived paper, and a v2 will be released shortly which also includes further results comparing these methods.

Regarding configurations, I made sure to verify after your comment that we do indeed omit suffix array sampling and measure only the BWT component of RLCSA, and the r-index component maintains default parameters with respect to marking bits of the compressed sparse bitvectors. It is an interesting question of why RLCSA might defy expectation, so I'll be sure to expand further on the differences of the measured implementations in the updated version as well, which might allow others to speculate on why this is the case.

CITE

Documents

Poster

RLBWT_Tricks_Slides__DCC_22__.pdf

Comments

Evaluation

Re: Evaluation

QUESTIONS?