Sorry, you need to enable JavaScript to visit this website.

facebooktwittermailshare

Bitvectors with runs and the successor/predecessor problem

Abstract: 

The successor and predecessor problem consists of obtaining the closest value in a set of integers, greater or smaller than a given value. This problem has interesting applications, like the intersection of inverted lists. It can be easily modeled by using a bitvector of size n and its operations rank and select. However, there is a practical approach [1], which keeps the best theoretical bounds, and allows to solve successor and predecessor more efficiently. Based on that technique, we designed a novel compact data structure for bitvectors with k runs that achieves access, rank, and successor/predecessor in O(1) time by consuming space O(sqrt(kn)) bits. In practice, it obtains a compression ratio of 0.04%-26.33% when the runs are larger than 100, and becomes the fastest technique, which considers compressibility, in successor/predecessor queries. Besides, we present a recursive variant of our structure, which tends to O(k) bits and takes O(log (n/k)) time.

up
1 user has voted: Dominik Koeppl

Comments

Thank's for your well-presented talk! Having watched it, I'd like to ask the following two questions:

Slide 8: $k$ is the number of runs in your bit vector, $m$ the minimum min(#1, #0) of the number of ones or the number of zeros, but what is $b$ (in space for hybrid)? As far as I understood, hybrid selects between the oz-vector and the sd-array representation?

Slide 11: Each block $X_i$ of the partitioning has a fixed length $\beta$.
Does it make sense to maximize the Z and O type blocks and use a (hopefully very sparse and hence compressible) bit vector marking the beginning positions of the $X_i$'s?

Hi Dominik! Thanks for your questions. I hope you have enjoyed the talk!

With respect to the first question, hybrid splits the bitvector into blocks of a fixed size $b$. The hybrid selects between three kind of representations i) sparse, when the number of $m$ is quite small, ii) runs, there are different areas with the same consecutive values, and iii) a plain representation. Sparse blocks are represented by storing the location of the 1-bit or 0-bit within the block and blocks with runs store the lengths of each run. Different compressions for those parts could be applied, like sd-array or oz-vector, but they can suppose slower response times.

The second question is quite interesting, I think it is another possibility , but I was looking for a way to solve successor/predecessor in O(1) time. By adding that sparse bitmap, I think the constant time is lost because of the rank in sparse bitvectors. However, it is a good option if you look for a better compression ratio.

If you have any doubt or need some additional information, do not hesitate to contact me.

Paper Details

Authors:
Submitted On:
20 March 2020 - 3:47pm
Short Link:
Type:
Presentation Slides
Event:
Presenter's Name:
Adrian Gomez Brandon
Session:
Session 4

Document Files

zombit-vector-presentation.pdf

(90)

Keywords

Additional Categories

Subscribe

[1] , "Bitvectors with runs and the successor/predecessor problem", IEEE SigPort, 2020. [Online]. Available: http://sigport.org/5009. Accessed: Sep. 26, 2020.
@article{5009-20,
url = {http://sigport.org/5009},
author = { },
publisher = {IEEE SigPort},
title = {Bitvectors with runs and the successor/predecessor problem},
year = {2020} }
TY - EJOUR
T1 - Bitvectors with runs and the successor/predecessor problem
AU -
PY - 2020
PB - IEEE SigPort
UR - http://sigport.org/5009
ER -
. (2020). Bitvectors with runs and the successor/predecessor problem. IEEE SigPort. http://sigport.org/5009
, 2020. Bitvectors with runs and the successor/predecessor problem. Available at: http://sigport.org/5009.
. (2020). "Bitvectors with runs and the successor/predecessor problem." Web.
1. . Bitvectors with runs and the successor/predecessor problem [Internet]. IEEE SigPort; 2020. Available from : http://sigport.org/5009