
The extended Burrows-Wheeler Transform (eBWT), introduced by Mantaci et al. [Theor. Comput. Sci., 2007], generalizes the Burrows-Wheeler Transform (BWT) to multisets of strings. While the original BWT is based on the lexicographic order, the eBWT uses the omega-order, which differs from the lexicographic order in important ways. A number of tools compute the BWT of string collections; however, the data structures they generate in most cases differ both from the originally defined eBWT and from each other.
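As a minimal sketch (not any of the tools mentioned above), the eBWT of a multiset can be computed directly from the definition: omega-sort all conjugates (rotations) of all strings and take the last character of each. The omega-order compares infinite repetitions u^ω and v^ω; by the Fine and Wilf theorem, comparing the first len(u)+len(v) characters is enough to decide.

```python
from functools import cmp_to_key

def omega_cmp(u, v):
    # Compare u^omega and v^omega lexicographically. If the two infinite
    # words agree on the first len(u)+len(v) characters, they are equal.
    n = len(u) + len(v)
    uu = (u * n)[:n]
    vv = (v * n)[:n]
    return (uu > vv) - (uu < vv)

def ebwt(strings):
    # All conjugates (rotations) of every string in the multiset.
    conjugates = [s[i:] + s[:i] for s in strings for i in range(len(s))]
    # Omega-sort, then concatenate the last character of each conjugate.
    conjugates.sort(key=cmp_to_key(omega_cmp))
    return "".join(c[-1] for c in conjugates)
```

The difference between the two orders is visible on short examples: lexicographically "ab" < "aba", but in the omega-order (aba)^ω = abaabaaba… precedes (ab)^ω = ababab…, since they first differ at the fourth character.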


A universal scheme is proposed for the lossless compression of two-dimensional tables and matrices. Instead of standard row- or column-based compression, we propose to sort each column first and record both the sorted table and the corresponding table of sorting permutations. These two tables are then compressed separately. In this new scheme, both intra- and inter-column correlations can be captured efficiently, improving the compression ratio in particular when column-wise and row-wise dependencies co-occur.
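A toy round-trip of this idea can be written with stdlib codecs; the function names and the choice of zlib/JSON serialization here are illustrative assumptions, not the paper's implementation.

```python
import json
import zlib

def encode(table):
    # table: list of rows, all of equal length.
    cols = list(zip(*table))
    sorted_cols, perms = [], []
    for col in cols:
        # order[r] = original row index of the r-th smallest value.
        order = sorted(range(len(col)), key=lambda i: col[i])
        perms.append(order)
        sorted_cols.append([col[i] for i in order])
    # Compress the sorted table and the permutation table separately.
    blob_data = zlib.compress(json.dumps(sorted_cols).encode())
    blob_perm = zlib.compress(json.dumps(perms).encode())
    return blob_data, blob_perm

def decode(blob_data, blob_perm):
    sorted_cols = json.loads(zlib.decompress(blob_data))
    perms = json.loads(zlib.decompress(blob_perm))
    cols = []
    for col, order in zip(sorted_cols, perms):
        restored = [None] * len(col)
        for rank, i in enumerate(order):
            restored[i] = col[rank]  # undo the per-column sort
        cols.append(restored)
    return [list(row) for row in zip(*cols)]
```

Sorted columns are monotone and therefore highly compressible; the scheme pays for this with the permutation table, which itself compresses well when rows are correlated across columns.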


3-D point clouds that render solid representations of scenes or objects often contain a tremendous number of points, necessitating highly efficient compression for storage and transmission. In this paper, we propose a novel p-Laplacian embedding graph dictionary learning algorithm for 3-D point cloud attribute compression. The proposed method integrates the underlying graph topology into the learned graph dictionary, capitalizing on p-Laplacian eigenfunctions, and leads to parsimonious representations of 3-D point clouds.


We propose a new structured pruning framework for compressing Deep Neural Networks (DNNs) with skip-connections, based on measuring the statistical dependency of hidden layers and predicted outputs. The dependence measure, defined by the energy statistics of hidden layers, serves as a model-free measure of information between the feature maps and the output of the network. The estimated dependence measure is subsequently used to prune a collection of redundant and uninformative layers. Model-freeness of our measure
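One well-known energy-statistics dependence measure is sample distance covariance (Székely et al.); as a hedged sketch of the kind of statistic the abstract describes (not necessarily the exact estimator used in the paper), layers could be ranked by the distance covariance between their feature maps and the network outputs, pruning the least dependent ones.

```python
import numpy as np

def _centered_dist(x):
    # Pairwise Euclidean distance matrix of the samples, double-centered.
    d = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    return d - d.mean(axis=0, keepdims=True) - d.mean(axis=1, keepdims=True) + d.mean()

def distance_covariance(x, y):
    # Sample distance covariance: an energy-statistics dependence measure
    # that is zero (in population) iff x and y are independent.
    a, b = _centered_dist(x), _centered_dist(y)
    return np.sqrt(max((a * b).mean(), 0.0))
```

Because the statistic only needs samples of features and outputs, it is model-free: no assumption about the functional form linking a hidden layer to the prediction is required.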


Genomic sequencing data contain three different data fields: read names, quality values, and nucleotide sequences. In this work, a variety of entropy encoders and compression algorithms were benchmarked, separately for each data field, in terms of compression/decompression rates and times, both on raw data from FASTQ files (implemented in the Fastq analysis script) and on uncompressed MPEG-G descriptor symbols decoded from MPEG-G bitstreams (implemented in the symbols analysis script).
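The per-field setup can be illustrated with stdlib codecs; this sketch is an assumption of mine about the workflow, not the analysis scripts from the work, and `gzip`/`bz2`/`lzma` stand in for the full set of benchmarked compressors.

```python
import bz2
import gzip
import lzma

def split_fastq(text):
    # A FASTQ record spans four lines: @name, sequence, '+', qualities.
    names, seqs, quals = [], [], []
    lines = text.strip().split("\n")
    for i in range(0, len(lines), 4):
        names.append(lines[i])      # read name
        seqs.append(lines[i + 1])   # nucleotide sequence
        quals.append(lines[i + 3])  # quality values
    return names, seqs, quals

def ratios(field_lines):
    # Compression ratio (raw size / compressed size) per codec
    # for one data field's stream.
    raw = "\n".join(field_lines).encode()
    return {codec.__name__: len(raw) / len(codec.compress(raw))
            for codec in (gzip, bz2, lzma)}
```

Splitting the fields first matters because each stream has very different statistics: names are highly structured, sequences draw from a 4-letter alphabet, and quality strings are the least compressible.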


With the widespread application of next-generation sequencing technologies, the volume of sequencing data has become comparable to that of big-data domains. The compression of sequencing reads (nucleotide sequences, quality values, read names), in both raw and aligned data, is a way to alleviate the bandwidth, transfer, and storage requirements of genomics pipelines. ISO/IEC MPEG-G standardizes the compressed representation (i.e. storage and streaming) of structured, indexed sets of genomic sequencing data, for both raw and aligned data.

