- Read more about Fast and Compact Set Intersection through Recursive Universe Partitioning
- 4 comments
- Log in to post comments

We present a data structure that encodes a sorted integer sequence in small space allowing, at the same time, fast intersection operations. The data layout is carefully designed to exploit word-level parallelism and SIMD instructions, hence providing good practical performance. The core algorithmic idea is that of recursive partitioning the universe of representation: a markedly different paradigm than the widespread strategy of partitioning the sequence based on its length.

- Categories:

- Read more about Compressing and Randomly Accessing Sequences
- 1 comment
- Log in to post comments

In this paper we consider the problem of storing sequences of symbols in

a compressed format, while supporting random access to the symbols without

decompression. Although this is a well-studied problem when the data is

textual, the kind of sequences we look at are not textual, and we argue

that traditional compression methods used in the text algorithms community

(such as compressors targeting $k$-th order empirical entropy) do not

perform as well on these sequential data, and simpler methods such

- Categories:

- Read more about Decompressing Lempel-Ziv compressed text
- Log in to post comments

We consider the problem of decompressing the Lempel--Ziv 77 representation of a string $S$ of length $n$ using a working space as close as possible to the size $z$ of the input. The folklore solution for the problem runs in $O(n)$ time but requires random access to the whole decompressed text. Another folklore solution is to convert LZ77 into a grammar of size $O(z\log(n/z))$ and then stream $S$ in linear time. In this paper, we show that $O(n)$ time and $O(z)$ working space can be achieved for constant-size alphabets.

- Categories:

- Read more about Approximating Optimal Bidirectional Macro Schemes.
- 1 comment
- Log in to post comments

Lempel-Ziv is an easy-to-compute member of a wide family of so-called macro schemes; it restricts pointers to go in one direction only. Optimal bidirectional macro schemes are NP-complete to find, but they may provide much better compression on highly repetitive sequences. We consider the problem of approximating optimal bidirectional macro schemes. We describe a simulated annealing algorithm that usually converges quickly. Moreover, in some cases, we obtain bidirectional macro schemes that are provably a 2-approximation of the optimal.

- Categories:

- Read more about Grammar compression with probabilistic context-free grammar
- 2 comments
- Log in to post comments

We propose a new approach for universal lossless text compression, based on grammar compression. In the literature, a target string T has been compressed as a context-free grammar G in Chomsky normal form satisfying L(G) = T. Such a grammar is often called a straight-line program (SLP). In this work, we consider a probabilistic grammar G that generates T, but not necessarily as a unique element of L(G). In order to recover the original text T unambiguously, we keep both the grammar G and the derivation tree of T from the start symbol in G, in compressed form.

## dcc_poster.pdf

- Categories:

- Read more about Reverse Multi-Delimiter Compression Codes
- Log in to post comments

An enhanced version of a recently introduced family of variable length binary codes with multiple pattern delimiters is presented and discussed. These codes are complete, universal, synchronizable, they have monotonic indexing and allow a standard search in compressed ﬁles. Comparing the compression rate on natural language texts demonstrates that introduced codes appear to be much superior to other known codes with similar properties. A fast byte-aligned decoding algorithm is constructed, which operates much faster than the one for Fibonacci codes.

- Categories:

- Read more about An Accurate Evaluation of MSD Log-likelihood and its Application in Human Action Recognition
- Log in to post comments

In this paper, we examine the problem of modeling overdispersed frequency vectors that are naturally generated by several machine learning and computer vision applications.

- Categories:

- Read more about An Accurate Evaluation of MSD Log-likelihood and its Application in Human Action Recognition
- Log in to post comments

In this paper, we examine the problem of modeling overdispersed frequency vectors that are naturally generated by several machine learning and computer vision applications.

## MSD_Mesh.pdf

- Categories:

In this research, we aim to propose a data preprocessing framework particularly for financial sector to generate the rating data as input to the collaborative system. First, clustering technique is applied to cluster all users based on their demographic information which might be able to differentiate the customers’ background. Then, for each customer group, the importance of demographic characteristics which are highly associated with financial products purchasing are analyzed by the proposed fuzzy integral technique.

- Categories:

- Read more about GlobalSIP 2018 Keynote: Tensors and Probability: An Intriguing Union (N. Sidiropoulos, N. Kargas, X. Fu)
- Log in to post comments

We reveal an interesting link between tensors and multivariate statistics. The rank of a multivariate probability tensor can be interpreted as a nonlinear measure of statistical dependence of the associated random variables. Rank equals one when the random variables are independent, and complete statistical dependence corresponds to full rank; but we show that rank as low as two can already model strong statistical dependence.

- Categories: