- Read more about Compressing Multisets with Large Alphabets
- Log in to post comments
Current methods which compress multisets at an optimal rate have computational complexity that scales linearly with alphabet size, making them too slow to be practical in many real-world settings. We show how to convert a compression algorithm for sequences into one for multisets, in exchange for an additional complexity term that is quasi-linear in sequence length. This allows us to compress multisets of independent and identically distributed symbols at an optimal rate, with computational complexity decoupled from the alphabet size.
- Categories:
- Read more about Computing Matching Statistics on Repetitive Texts
- 2 comments
- Log in to post comments
- Categories:
- Read more about Graphs can be succinctly indexed for pattern matching in O(E^2 + V^{2.5}) time
- Log in to post comments
For the first time we provide a \emph{succinct} pattern matching index for \emph{arbitrary} graphs that can be built \emph{in polynomial time}, while improving both space and query time bounds from [SODA 2021].
- Categories:
- Read more about Linear-time minimization of Wheeler DFAs
- 2 comments
- Log in to post comments
Wheeler DFAs (WDFAs) are a sub-class of finite-state automata which is playing an important role in the emerging field of \emph{compressed data structures}: as opposed to general automata, WDFAs can be stored in just $\log\sigma + O(1)$ bits per edge, $\sigma$ being the alphabet's size, and support optimal-time pattern matching queries on the substring closure of the language they recognize. An important step to achieve further compression is minimization.
- Categories:
- Read more about On different variants of the Burrows-Wheeler-Transform of string collections
- Log in to post comments
The extended Burrows-Wheeler-Transform (eBWT), introduced by Mantaci et al. [Theor. Comput. Sci., 2007], is a generalization of the Burrows-Wheeler-Transform (BWT) to multisets of strings. While the original BWT is based on the lexicographic order, the eBWT uses the omega-order, which differs from the lexicographic order in important ways. A number of tools are available that compute the BWT of string collections; however, the data structures they generate in most cases differ from the one originally defined, as well as from each other.
- Categories:
- Read more about SortComp (Sort-and-Compress) - Towards a Universal Lossless Compression Scheme for Matrix and Tabular Data
- 1 comment
- Log in to post comments
A universal scheme is proposed for the lossless compression of two-dimensional tables and matrices. Instead of standard row- or column-based compression, we propose to sort each column first and record both the sorted table and the corresponding permutation table of the sorting permutations. These two tables are then separately compressed. In this new scheme, both intra- and inter-column correlations can be efficiently captured, giving rise to improved compression ratio in particular when both column-wise and row-wise dependencies cooccur.
- Categories:
3-D point clouds rendering solid representations of scenes or objects often carry a tremendous amount of points, compulsorily requesting high-efficiency compression for storage and transmission. In this paper, we propose a novel p-Laplacian embedding graph dictionary learning algorithm for 3-D point cloud attribute compression. The proposed method integrates the underlying graph topology to the learned graph dictionary capitalizing on p-Laplacian eigenfunctions and leads to parsimonious representations of 3-D point clouds.
- Categories:
- Categories:
- Read more about Speeding up compact planar graphs by using shallower trees
- 3 comments
- Log in to post comments
- Categories: