- Read more about Graphs can be succinctly indexed for pattern matching in O(E^2 + V^{2.5}) time
- Log in to post comments
For the first time we provide a \emph{succinct} pattern matching index for \emph{arbitrary} graphs that can be built \emph{in polynomial time}, while improving both space and query time bounds from [SODA 2021].
- Categories:
- Read more about Linear-time minimization of Wheeler DFAs
- 2 comments
- Log in to post comments
Wheeler DFAs (WDFAs) are a sub-class of finite-state automata which is playing an important role in the emerging field of \emph{compressed data structures}: as opposed to general automata, WDFAs can be stored in just $\log\sigma + O(1)$ bits per edge, $\sigma$ being the alphabet's size, and support optimal-time pattern matching queries on the substring closure of the language they recognize. An important step to achieve further compression is minimization.
- Categories:

- Read more about SortComp (Sort-and-Compress) - Towards a Universal Lossless Compression Scheme for Matrix and Tabular Data
- 1 comment
- Log in to post comments
A universal scheme is proposed for the lossless compression of two-dimensional tables and matrices. Instead of standard row- or column-based compression, we propose to sort each column first and record both the sorted table and the corresponding permutation table of the sorting permutations. These two tables are then separately compressed. In this new scheme, both intra- and inter-column correlations can be efficiently captured, giving rise to improved compression ratio in particular when both column-wise and row-wise dependencies cooccur.
- Categories:

- Read more about Converting RLBWT to LZ77 in smaller space
- 1 comment
- Log in to post comments
- Categories:

- Read more about Compact Representation of Spatial Hierarchies and Topological Relationships
- Log in to post comments
The topological model for spatial objects identifies common boundaries between regions, explicitly storing adjacency relations, which not only improves the efficiency of topology-related queries, but also provides advantages such as avoiding data duplication and facilitating data consistency. Recently, a compact representation of the topological model based on planar graph embeddings was proposed.
- Categories:

- Read more about On Random Editing in LZ-End
- 1 comment
- Log in to post comments
LZ-End is a variant of the LZ77 compression algorithm which allows random access to the compressed data. In this paper, we show how the random-access capability of LZ-End allows random edits to the compressed data, which is the first algorithm to randomly edit strings compressed by a Lempel-Ziv algorithm.
- Categories:

- Read more about Approximating Optimal Bidirectional Macro Schemes.
- 1 comment
- Log in to post comments
Lempel-Ziv is an easy-to-compute member of a wide family of so-called macro schemes; it restricts pointers to go in one direction only. Optimal bidirectional macro schemes are NP-complete to find, but they may provide much better compression on highly repetitive sequences. We consider the problem of approximating optimal bidirectional macro schemes. We describe a simulated annealing algorithm that usually converges quickly. Moreover, in some cases, we obtain bidirectional macro schemes that are provably a 2-approximation of the optimal.
- Categories:

- Read more about Grammar compression with probabilistic context-free grammar
- 2 comments
- Log in to post comments
We propose a new approach for universal lossless text compression, based on grammar compression. In the literature, a target string T has been compressed as a context-free grammar G in Chomsky normal form satisfying L(G) = T. Such a grammar is often called a straight-line program (SLP). In this work, we consider a probabilistic grammar G that generates T, but not necessarily as a unique element of L(G). In order to recover the original text T unambiguously, we keep both the grammar G and the derivation tree of T from the start symbol in G, in compressed form.
dcc_poster.pdf

- Categories:

- Read more about Towards Better Compressed Representations
- Log in to post comments
We introduce the problem of computing a parsing where each phrase is of length at most m and which minimizes the zeroth order entropy of parsing. Based on the recent theoretical results we devise a heuristic for this problem. The solution has straightforward application in succinct text representations and gives practical improvements. Moreover the proposed heuristic yields structure which size can be bounded both by |S|H_{m-1}(S) and by |S|/m(H_0(S) + ... + H_{m-1}(S)),where H_{k}(S) is the k-th order empirical entropy of S.
poster_dcc.pdf

- Categories:

- Read more about Edge minimization in de Bruijn graphs
- 1 comment
- Log in to post comments
This paper introduces the de Bruijn graph edge minimization problem, which is related to the compression of de Bruijn graphs: find the order-k de Bruijn graph with minimum edge count among all orders. We describe an efficient algorithm that solves this problem. Since the edge minimization problem is connected to the BWT compression technique called "tunneling", the paper also describes a way to minimize the length of a tunneled BWT in such a way that useful properties for sequence analysis are preserved.
- Categories: