GENEFORMER: LEARNED GENE COMPRESSION USING TRANSFORMER-BASED CONTEXT MODELING

Advances in gene sequencing technology have sparked explosive growth in gene data, making the storage of that data an important problem. Researchers have recently begun to investigate deep learning-based gene data compression, which outperforms traditional general-purpose methods. In this paper, we propose a transformer-based gene compression method named GeneFormer. Specifically, we first introduce a modified transformer encoder with a latent array to eliminate dependencies within the nucleotide sequence. We then design a multi-level grouping method that both accelerates and improves the compression process. Experimental results on real-world datasets show that our method achieves a significantly better compression ratio than the state-of-the-art method, and that its decoding speed is significantly faster than that of all existing learning-based gene compression methods.
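
As a rough illustration of why context modeling matters for compression ratio: in learned compression generally, an entropy coder spends about -log2(p) bits on a symbol that the model predicted with probability p, so sharper predictions mean fewer bits. The sketch below uses a simple adaptive order-k count model as a stand-in for the transformer. It is not GeneFormer's architecture, and the names `bits_per_base`, `ALPHABET`, and `k` are illustrative assumptions rather than anything taken from the paper.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"  # assumed nucleotide alphabet for this sketch


def bits_per_base(seq, k=2):
    """Ideal code length (bits per base) under an adaptive order-k context model.

    A count-based model stands in for the learned context model: it predicts
    the next base from the previous k bases, and the cost of each base is
    -log2 of the probability the model assigned to it.
    """
    # Every context starts with one pseudo-count per base (add-one smoothing),
    # so the first prediction in any context is uniform: 2 bits per base.
    counts = defaultdict(lambda: {b: 1 for b in ALPHABET})
    total_bits = 0.0
    for i, base in enumerate(seq):
        context = seq[max(0, i - k):i]
        ctx_counts = counts[context]
        p = ctx_counts[base] / sum(ctx_counts.values())
        total_bits += -math.log2(p)
        # Update after coding, so a decoder could mirror the same model state.
        ctx_counts[base] += 1
    return total_bits / len(seq)


if __name__ == "__main__":
    # A highly repetitive sequence is predicted well, so it costs far less
    # than the 2 bits per base of a fixed-length code.
    print(bits_per_base("ACGT" * 200))
```

A stronger context model lowers the same quantity further, which is the role the transformer plays in a method of this kind.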
