

- Read more about Applying Practical Parallel Grammar Compression to Large-scale Data
- Log in to post comments
Re-pair is a grammar-based compression algorithm. It achieves higher compression rates for text, graph, and tree. While Re-pair is a linear-time algorithm, it is slower than other general compression algorithms in practice. This is an obstacle in applying Re-pair to large-scale data. In this paper, we present Parallel Re-pair, a practical implementation that enables parallel processing of Re-pair. In Parallel Re-pair, Re-pair is executed in multiple threads for the divided block. Each thread shares a dictionary and it can output a single CFG.
- Categories:

- Read more about Chunk Content is not Enough: Chunk-Context Aware Resemblance Detection for Deduplication Delta Compression
- Log in to post comments
With the growing popularity of cloud storage, identifying and removing duplicate data across users is getting more critical for service providers. Thus, many researchers have attracted attention for data resemblance to detect redundancy among similar data. It uses feature extraction to detect data chunks with high similarity first, and then treat them as candidates for removing redundancy.
- Categories:

- Read more about HOLZ: High-Order Entropy Encoding of Lempel-Ziv Factor Distances
- Log in to post comments
We propose a new representation of the offsets of the Lempel-Ziv (LZ) factorization
based on the co-lexicographic order of the text's prefixes.
The selected offsets tend to approach the k-th order empirical entropy.
Our evaluations show that this choice is superior to
the rightmost and bit-optimal LZ parsings on datasets with small high-order entropy.
- Categories:

- Read more about Computing Lexicographic Parsings
- 2 comments
- Log in to post comments
We give memory-friendly algorithms computing the compression schemes lexparse in linear time.
- Categories:

- Read more about PHONI: Streamed Matching Statistics with Multi-Genome References
- Log in to post comments
Computing the matching statistics of patterns with respect to a text is a fundamental task in bioinformatics, but a formidable one when the text is a highly compressed genomic database. Bannai et al. gave an efficient solution for this case, which Rossi et al. recently implemented, but it uses two passes over the patterns and buffers a pointer for each character during the first pass. In this paper, we simplify their solution and make it streaming, at the cost of slowing it down slightly.
dcc21phoni.s.pdf

- Categories:

- Read more about A Disk-Based Index for Trajectories with an In-Memory Compressed Cache
- Log in to post comments
We present a representation of trajectories moving through the space without any constraint. It combines an in-memory cached index based on compact data structures and a classic disk-based strategy. The first structure allows some loss of precision that is refined with the second component. This approach reduces the number of accesses to disk. Comparing it with a classical index like the MVR-tree, this structure obtains competitive times in queries like time slice and knn, and sharply outperforms it in time interval queries.
dcc2021.pdf

- Categories:

- Read more about Adaptive Stream-based Entropy Coding
- Log in to post comments
Fast data streams are applied by various applications such as multimedia, communication and sensory devices. The amount of data is getting larger and the transfer speed is also getting higher. To address this kind of fast applications, a high performance stream-based data compression mechanism is demanded.
- Categories:

- Read more about Compact Representation of Graphs with Small Bandwidth and Treedepth
- Log in to post comments
We consider the problem of compact representation of graphs with small bandwidth as well as graphs with small treedepth. These parameters capture structural properties of graphs that come in useful in certain applications.
- Categories:

- Read more about Compressed Quadratization of Higher Order Binary Optimization Problems
- Log in to post comments
Recent hardware advances in quantum and quantum-inspired annealers promise substantial speedup for solving NP-hard combinatorial optimization problems compared to general-purpose computers. These special-purpose hardware are built for solving hard instances of Quadratic Unconstrained Binary Optimization (QUBO) problems. In terms of number of variables and precision of these hardware are usually resource-constrained and they work either in Ising space $\ising$ or in Boolean space $\bool$. Many naturally occurring problem instances are higher-order in nature.
- Categories: