As today's scientific simulations on high-performance computing systems produce extremely large amounts of data, reliable data compression techniques are becoming increasingly important. In scientific applications, downstream quantities derived from the original or primary data (PD) are crucial for post-analysis, so these Quantities of Interest (QoI) must be preserved during compression. While autoencoders (AEs) have recently been used to compress scientific datasets, there is a clear lack in the literature w.r.t.

MONI (Rossi et al., 2022) can store a pangenomic dataset T in small space and later, given a pattern P, quickly find the maximal exact matches (MEMs) of P with respect to T. In this paper we consider its one-pass version (Boucher et al., 2021), whose query times, in our experiments, are dominated by longest common extension (LCE) queries. We show how a small modification lets us avoid most of these queries, significantly speeding up MONI in practice while only slightly increasing its size.
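
In the standard definition, an LCE query on a string asks for the length of the longest common prefix of two of its suffixes. MONI answers such queries on a compressed representation of T; the naive character-by-character scan below is only a sketch of what a single query computes, and the function name `lce` is hypothetical.

```python
def lce(text: str, i: int, j: int) -> int:
    """Naive LCE query: length of the longest common prefix of text[i:] and text[j:]."""
    n = len(text)
    length = 0
    # Compare characters pairwise until a mismatch or the end of the string.
    while i + length < n and j + length < n and text[i + length] == text[j + length]:
        length += 1
    return length


if __name__ == "__main__":
    t = "ACGTACGTTACG"
    print(lce(t, 0, 4))  # suffixes "ACGTACGT..." and "ACGTTACG" share "ACGT"
```

Each naive query can cost time linear in the answer, which is one reason why avoiding most LCE queries, as the abstract describes, can pay off.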

Matching statistics were introduced to solve the approximate string matching problem, a recurring subroutine in bioinformatics applications. In 2010, Ohlebusch et al. [SPIRE 2010] proposed a time- and space-efficient algorithm for computing matching statistics that relies on some components of a compressed suffix tree, notably the longest common prefix (LCP) array.
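
For reference, the matching statistics of a pattern P with respect to a text T record, for each position i of P, the length of the longest prefix of P[i:] that occurs somewhere in T. The quadratic brute-force sketch below only illustrates that definition; it is not the suffix-tree-based algorithm of Ohlebusch et al., and the function name is hypothetical.

```python
def matching_statistics(pattern: str, text: str) -> list[int]:
    """Brute force: ms[i] = length of the longest prefix of pattern[i:] occurring in text."""
    ms = []
    for i in range(len(pattern)):
        length = 0
        # If a prefix of length L+1 occurs in text, so does the prefix of length L,
        # so we can extend one character at a time until the match fails.
        while i + length < len(pattern) and pattern[i:i + length + 1] in text:
            length += 1
        ms.append(length)
    return ms


if __name__ == "__main__":
    print(matching_statistics("TACG", "GATTACA"))
```

Here "TAC" and "AC" occur in "GATTACA" but "TACG" and "ACG" do not, so the first two entries are 3 and 2.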

Algorithms for deriving Huffman codes and the recently developed algorithm for compiling PIFO trees to trees of fixed shape [1] are similar, but work with different underlying algebraic operations. In this paper, we exploit the monadic structure of prefix codes to create a generalized Huffman algorithm that has these two applications as special cases.
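
As a baseline, the textbook Huffman algorithm repeatedly merges the two lightest subtrees and prefixes their codewords with 0 and 1. The sketch below shows only this classic special case, not the generalized monadic algorithm described in the abstract; the function name and the frequency-dictionary interface are assumptions made for illustration.

```python
import heapq
from itertools import count


def huffman_code(freqs: dict[str, int]) -> dict[str, str]:
    """Classic Huffman construction: returns a prefix-free codeword for each symbol."""
    if not freqs:
        return {}
    if len(freqs) == 1:
        # A single symbol still needs a non-empty codeword.
        return {sym: "0" for sym in freqs}
    tiebreak = count()  # unique counter so the heap never compares the dicts
    heap = [(w, next(tiebreak), {sym: ""}) for sym, w in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two lightest subtrees under a new internal node.
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, next(tiebreak), merged))
    return heap[0][2]


if __name__ == "__main__":
    print(huffman_code({"a": 5, "b": 2, "c": 1, "d": 1}))
```

The "combine two weights by addition" step is the part whose underlying algebraic operation differs between Huffman coding and the PIFO-tree application mentioned above.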

In this paper, we use the biological domain knowledge incorporated into stochastic models
for ab initio RNA secondary-structure prediction to improve the state of the art in joint
compression of RNA sequence and structure data (Liu et al., BMC Bioinformatics, 2008).
Moreover, we show that, conversely, compression ratio can serve as a cheap and robust
proxy for comparing the prediction quality of different stochastic models, which may help
guide the search for better RNA structure prediction models.
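
The link between compression and model quality rests on the fact that an ideal entropy coder (for example, arithmetic coding) driven by a probabilistic model spends about -log2 p bits per symbol, so a model that assigns higher probability to the data yields a shorter encoding. The sketch below illustrates this with two hypothetical order-0 models over dot-bracket structure symbols; the stochastic models used in the paper are far richer, and the model probabilities here are invented for illustration only.

```python
import math


def ideal_code_length_bits(symbols: str, model: dict[str, float]) -> float:
    """Sum of -log2 p(symbol): the size an ideal entropy coder using `model` approaches."""
    return sum(-math.log2(model[s]) for s in symbols)


if __name__ == "__main__":
    structure = "((((((..))))))"  # toy dot-bracket secondary structure
    # Hypothetical order-0 models (illustration only, not from the paper).
    uniform = {"(": 1 / 3, ")": 1 / 3, ".": 1 / 3}
    bracket_heavy = {"(": 0.4, ")": 0.4, ".": 0.2}
    print(ideal_code_length_bits(structure, uniform))
    print(ideal_code_length_bits(structure, bracket_heavy))
```

The model that better matches the bracket-heavy structure produces the shorter code, which is the sense in which compressed size can rank competing models.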

Linear computation coding is concerned with the compression of multidimensional linear functions, i.e., with reducing the computational effort of multiplying an arbitrary vector by an arbitrary but known constant matrix.
This paper advances the state of the art, which is based on a discrete matching pursuit (DMP) algorithm, by means of a step-wise optimal search.
While this search offers significant performance gains over DMP, it is computationally infeasible for large matrices and high accuracy.
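
To make the DMP baseline concrete, the sketch below greedily approximates one target column as a short sum of codebook columns scaled by signed powers of two, so that each term can be realized with a bit shift and an addition instead of a multiplication. The codebook, the exponent range, and the squared-error criterion are assumptions chosen for illustration; this is a generic greedy matching pursuit, not the paper's step-wise optimal search or a specific published DMP variant.

```python
def discrete_matching_pursuit(target, columns, num_terms, exponents=range(-4, 5)):
    """Greedily approximate `target` by a sum of coef * columns[j], with coef = +/-2**e.

    Returns the chosen (column index, coefficient) terms and the final residual.
    """
    residual = [float(t) for t in target]
    terms = []
    for _ in range(num_terms):
        best = None
        best_err = sum(r * r for r in residual)
        # Exhaustively try every column, exponent, and sign for the next term.
        for j, col in enumerate(columns):
            for e in exponents:
                for sign in (1.0, -1.0):
                    coef = sign * 2.0 ** e
                    err = sum((r - coef * c) ** 2 for r, c in zip(residual, col))
                    if err < best_err:
                        best_err, best = err, (j, coef)
        if best is None:  # no admissible term reduces the residual any further
            break
        j, coef = best
        residual = [r - coef * c for r, c in zip(residual, columns[j])]
        terms.append((j, coef))
    return terms, residual


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical codebook: unit vectors
    print(discrete_matching_pursuit([3.0, -0.5], identity, num_terms=3))
```

Because each step only considers the locally best term, the result can be suboptimal, which is the gap a step-wise optimal search targets at a higher computational cost.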

Compressed sensing aims to retrieve sparse signals from very few samples. It relies on dedicated reconstruction algorithms and well-chosen measurement matrices. Combined with network coding, which traditionally operates over finite fields, it leverages the benefits of both techniques. However, compressed sensing has been investigated primarily over the real field. F2OMP is one of the few recovery algorithms that reconstruct signals over finite fields. However, its practical use is limited because its performance depends mainly on binary matrices for signal recovery.
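
To illustrate the recovery problem over the binary field (not F2OMP itself), the sketch below finds the sparsest binary vector x with A x = y (mod 2) by exhaustive search over supports of increasing size. Over GF(2) every nonzero entry equals 1, so a support fully determines x. The function names and the example matrix are hypothetical, and the search is exponential in the sparsity, which is why practical recovery algorithms are needed.

```python
from itertools import combinations


def mat_vec_gf2(A, x):
    """Matrix-vector product over GF(2): AND for multiplication, XOR (sum mod 2) for addition."""
    return [sum(a & b for a, b in zip(row, x)) % 2 for row in A]


def recover_sparse_gf2(A, y, max_sparsity):
    """Return the first (sparsest) binary x with A x = y over GF(2), or None if none exists."""
    n = len(A[0])
    for k in range(max_sparsity + 1):
        for support in combinations(range(n), k):
            x = [0] * n
            for i in support:
                x[i] = 1
            if mat_vec_gf2(A, x) == list(y):
                return x
    return None


if __name__ == "__main__":
    # Hypothetical 3x4 binary measurement matrix with distinct nonzero columns.
    A = [[1, 0, 1, 1],
         [0, 1, 1, 1],
         [1, 1, 0, 1]]
    y = mat_vec_gf2(A, [0, 0, 1, 0])  # measure a 1-sparse signal
    print(recover_sparse_gf2(A, y, max_sparsity=1))
```

Since the columns of this example matrix are distinct and nonzero, any two of them are linearly independent over GF(2), so every 1-sparse signal is recovered uniquely.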
