Bitvectors with runs and the successor/predecessor problem

Primary tabs

Citation Author(s):
Submitted by:
Last updated:
20 March 2020 - 3:47pm
Document Type:
Presentation Slides
Document Year:
Event:
Presenters Name:

Abstract

Abstract:

The successor and predecessor problem consists of obtaining the closest value in a set of integers, greater or smaller than a given value. This problem has interesting applications, like the intersection of inverted lists. It can be easily modeled by using a bitvector of size n and its operations rank and select. However, there is a practical approach [1], which keeps the best theoretical bounds, and allows to solve successor and predecessor more efficiently. Based on that technique, we designed a novel compact data structure for bitvectors with k runs that achieves access, rank, and successor/predecessor in O(1) time by consuming space O(sqrt(kn)) bits. In practice, it obtains a compression ratio of 0.04%-26.33% when the runs are larger than 100, and becomes the fastest technique, which considers compressibility, in successor/predecessor queries. Besides, we present a recursive variant of our structure, which tends to O(k) bits and takes O(log (n/k)) time.

zombit-vector-presentation.pdf

Thank's for your well-presented talk! Having watched it, I'd like to ask the following two questions:

Slide 8: $k$ is the number of runs in your bit vector, $m$ the minimum min(#1, #0) of the number of ones or the number of zeros, but what is $b$ (in space for hybrid)? As far as I understood, hybrid selects between the oz-vector and the sd-array representation?

Slide 11: Each block $X_i$ of the partitioning has a fixed length $\beta$.
Does it make sense to maximize the Z and O type blocks and use a (hopefully very sparse and hence compressible) bit vector marking the beginning positions of the $X_i$'s?

Hi Dominik! Thanks for your

Hi Dominik! Thanks for your questions. I hope you have enjoyed the talk!

With respect to the first question, hybrid splits the bitvector into blocks of a fixed size $b$. The hybrid selects between three kind of representations i) sparse, when the number of $m$ is quite small, ii) runs, there are different areas with the same consecutive values, and iii) a plain representation. Sparse blocks are represented by storing the location of the 1-bit or 0-bit within the block and blocks with runs store the lengths of each run. Different compressions for those parts could be applied, like sd-array or oz-vector, but they can suppose slower response times.

The second question is quite interesting, I think it is another possibility , but I was looking for a way to solve successor/predecessor in O(1) time. By adding that sparse bitmap, I think the constant time is lost because of the rank in sparse bitvectors. However, it is a good option if you look for a better compression ratio.

If you have any doubt or need some additional information, do not hesitate to contact me.

Dataset Files

zombit-vector-presentation.pdf

(147)