It is interesting to see that RLCSA is faster than RLBWT when considering only count queries. I thought that they should be basically the same, even in terms of space if we omit the suffix array samplings, which are needed for locate, but not for the counting queries you measured. It would be great if you can give any insights about the differences of the measured implementations.
Also, in the last plot, I do not think you introduced the solutions 'sparse bv' and 'wt_fbb'. I think that these two are different flavors in how to represent the LF-mapping with the RLBWT, probably by using a wavelet tree?
I've updated the slides (but not the video, it was already a bit short of time) to include details of the other methods in the plot. Indeed, 'sparse bv' is the RLBWT component of the r-index using compressed sparse bitvectors (https://github.com/nicolaprezza/r-index), and wt_fbb is a wavelet tree using fixed block boosting to represent the RLBWT (https://github.com/dominikkempa/faster-minuter). Of course, these are expanded on in the arxived paper, and a v2 will be released shortly which also includes further results comparing these methods.
Regarding configurations, I made sure to verify after your comment that we do indeed omit suffix array sampling and measure only the BWT component of RLCSA, and the r-index component maintains default parameters with respect to marking bits of the compressed sparse bitvectors. It is an interesting question of why RLCSA might defy expectation, so I'll be sure to expand further on the differences of the measured implementations in the updated version as well, which might allow others to speculate on why this is the case.
TY - DATA
T1 - RLBWT Tricks
AU - Nathaniel K. Brown; Travis Gagie; Massimiliano Rossi
PY - 2022
PB - IEEE Signal Processing Society SigPort
UR - https://sigport.org/documents/rlbwt-tricks
ER -
1. Nathaniel K. Brown, Travis Gagie, Massimiliano Rossi.
RLBWT Tricks [Internet].
IEEE Signal Processing Society SigPort; 2022.
Available from :
https://sigport.org/documents/rlbwt-tricks
Comments
Evaluation
It is interesting to see that RLCSA is faster than RLBWT when considering only count queries. I thought that they should be basically the same, even in terms of space if we omit the suffix array samplings, which are needed for locate, but not for the counting queries you measured. It would be great if you can give any insights about the differences of the measured implementations.
Also, in the last plot, I do not think you introduced the solutions 'sparse bv' and 'wt_fbb'. I think that these two are different flavors in how to represent the LF-mapping with the RLBWT, probably by using a wavelet tree?
Re: Evaluation
Hello Dominik,
I've updated the slides (but not the video, it was already a bit short of time) to include details of the other methods in the plot. Indeed, 'sparse bv' is the RLBWT component of the r-index using compressed sparse bitvectors (https://github.com/nicolaprezza/r-index), and wt_fbb is a wavelet tree using fixed block boosting to represent the RLBWT (https://github.com/dominikkempa/faster-minuter). Of course, these are expanded on in the arxived paper, and a v2 will be released shortly which also includes further results comparing these methods.
Regarding configurations, I made sure to verify after your comment that we do indeed omit suffix array sampling and measure only the BWT component of RLCSA, and the r-index component maintains default parameters with respect to marking bits of the compressed sparse bitvectors. It is an interesting question of why RLCSA might defy expectation, so I'll be sure to expand further on the differences of the measured implementations in the updated version as well, which might allow others to speculate on why this is the case.