On dynamic bitvector implementations
- Citation Author(s):
- Submitted by:
- Saska Doenges
- Last updated:
- 3 March 2022 - 8:11am
- Document Type:
- Presentation Slides
- Document Year:
- Saska Dönges
- Paper Code:
- Log in to post comments
Bitvectors that support rank and select queries are the workhorses of succinct data structures, implementations of which are now widespread, for example, in bioinformatics software. To date, however, most bitvector implementations are static, thus forcing more complex data structures built from them to be static too. In this paper we explore dynamic bitvectors, which, in addition to rank and select queries, also support update operations, specifically: insert, remove, and modify. We first provide several practical optimizations to the recent B-tree based bitvectors of Prezza (Proc. SEA 2017), including the use of buffers at leaves to speed update operations at the cost of a small overhead to query times. We then consider a common use case of succinct data structures, where queries and updates come in separate batches, and examine the efficacy of query support data structures that are fast to construct and speed rank and select queries, but become out of date when update operations are made. Finally, we explore several methods for leaf compression.
Some stats about your experimental setting
Your `find` function on Slide 9 looks pretty intriguing.
I would welcome any reference that helps me to understand why it works in detail.
Does it only work with a branching factor fixed at compile time?
In your shown evaluation on Slide 12, I wonder what was the chosen buffer size precisely - a 32-bit integer per leaf node? How many elements can a leaf store maximally in your experiments?
Have you conducted an experiment with different buffer sizes?
On Slide 17, this vertical dot-dashed green line looks strange to me. So you gain at roughly 10^7 bits a tremendous performance boost - is there an explanation for that?
Do you think that AVX-512 would allow for larger leaf and buffer sizes while retaining or even improving the speed?
Hi. Thanks for your questions
Hi. Thanks for your questions.
The branchless binary search implementation only works with a power of 2 branching factor set at compile time. And it does "sacrifice" the sign bit. The code on the slide is slightly abbreviated for clarity and space savings. The actual implementation can be seen here: https://github.com/saskeli/bit_vector/blob/main/bit_vector/internal/bran... .
The results on slide 12 are based on a buffer size of 8. So a total of 8 * 32 bits are allocated for buffering in each leaf. The maximum number of elements per leaf in that experiment is 2^14. I clearly forgot to add those number in the slides. I have run fairly large grid searches to determine good branching factors, leaf sizes and buffer sizes. Buffer sizes of 8 to 16 work well but have a performance trade-off, smaller or larger buffer sizes don't seem to be very useful in comparison. A leaf size of 2^14 works well (not significantly slower to update than smaller leaf sizes). Larger leaf sizes slow down updates, while smaller sizes slow down queries slightly. Similar story for branching factors, 64 elements are quick to update and fast to search. Larger branching factors get prohibitively expensive to maintain during updates.
The strange behavior visible on slide 17 is due to the RLE-leaves being slightly too resistant to splitting at small data structure sizes, leading to abysmal performance for RLE-leaves containing "high complexity" parts of the bit vector. This gets fixed immediately as the split that separates the high complexity area into a new uncompressed leaf bit vector happens. We plan to fix this by limiting the number of runs a RLE-leaf can contain instead of just limiting the number of space the leaf can occupy. Note that the test case is engineered for testing RLE-leaves, so good performance should be expected.
I don't exactly know what to expect of AVX-512 except that the inclusion of native wide-word population counts should speed up rank queries compared to the current AVX2 implementation. The overhead of initializing 512 bit pipelines may be a problem, and updates don't benefit from AVX-512. I want to try doing a "binary search:ish" implementation for calculating select at a leaf level, that could benefit from AVX-512 (not usable with AVX2), to see if select for buffered leaves could be sped up slightly.