DZip: improved general-purpose lossless compression based on novel neural network modeling

Citation Author(s):
Mohit Goyal, Kedar Tatwawadi, Shubham Chandak, Idoia Ochoa
Submitted by:
Mohit Goyal
Last updated:
1 March 2021 - 10:16am
Document Type:
Video Presentation
Document Year:
2021
Presenters Name:
Mohit Goyal
Paper Code:
Paper ID 149
Comments

Thanks for the presentation!

I'd like to ask whether you know where your computational bottleneck is.
Since you use the PyTorch library, written in Python, do you think you could obtain a performance boost by switching to a different AI library?

Did you really measure *a few seconds* per MB for the gzip compression? In that case it would be interesting to know on which hardware you conducted the experiments. On common hardware storing the data on SSDs, gzip should process multiple MBs per second.

Hi Dominik,

Thanks for your interest in our work! The main bottleneck, I believe, is the gradient back-propagation step that updates the model parameters, followed by the forward pass through the model that computes the probabilities. Switching to a different AI library could certainly yield some speedup. For example, NNCP (Bellard et al. 2020) implements a similar algorithm on CPUs with low-level optimizations and is around 4x slower than DZip. A dedicated library for GPU-based compression should give a sizable speedup (though not more than 10x, I believe). Another possibility is to use different model architectures that are simpler or can be parallelized efficiently.
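To illustrate the per-symbol loop the reply describes (forward pass to get probabilities, then a gradient update), here is a toy sketch. This is not DZip's actual model: it replaces the neural network with a single softmax over learned logits and only sums the ideal arithmetic-code length (-log2 p bits per symbol) instead of running a real coder, but it shows why every symbol costs one forward pass plus one backward pass.

```python
import numpy as np

def adaptive_codelength(data, n_symbols, lr=0.1):
    """Toy adaptive compressor core: for each symbol, a forward pass
    produces probabilities, the symbol is (notionally) arithmetic-coded
    using -log2 p bits, and a gradient step updates the parameters."""
    logits = np.zeros(n_symbols)          # model parameters
    total_bits = 0.0
    for s in data:
        # forward pass: softmax over the logits
        p = np.exp(logits - logits.max())
        p /= p.sum()
        total_bits += -np.log2(p[s])      # ideal arithmetic-code length
        # backward pass: cross-entropy gradient, plain SGD update
        grad = p.copy()
        grad[s] -= 1.0
        logits -= lr * grad
    return total_bits

# Skewed data ends up below the uniform log2(4) = 2 bits/symbol baseline.
data = [0] * 900 + [1] * 50 + [2] * 30 + [3] * 20
bits = adaptive_codelength(data, 4)
```

Because both passes must run once per symbol and cannot be batched across time (each prediction depends on the previous update), this loop is inherently sequential, which is why simpler or more parallelizable architectures help.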

Regarding the speed of gzip, the precise figure I remember was a few seconds per MB (it might be faster on SSDs). In any case, the number was intended to give a rough range for the compression speeds of the general class of traditional compressors (gzip, BSC, 7zip and zpaq). For the experiments we used HDDs (rather than SSDs) with Intel Xeon Gold 6146 CPUs. We also observed faster and better compression with BSC (roughly 0.1 seconds per MB), which is a stronger baseline among the traditional methods. The main message we wanted to convey is that traditional methods are about 2-3 orders of magnitude faster than NN-based compressors.
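For anyone who wants to sanity-check the gzip figure on their own machine, a minimal in-memory measurement (which deliberately excludes disk I/O, the HDD-vs-SSD factor raised above) might look like this; the 4 MB payload size and compression level 6 (the gzip default) are arbitrary choices for the sketch:

```python
import gzip
import os
import time

# Rough in-memory gzip throughput check, independent of disk speed.
payload = os.urandom(4 * 1024 * 1024)  # 4 MB of incompressible data
start = time.perf_counter()
compressed = gzip.compress(payload, compresslevel=6)
elapsed = time.perf_counter() - start
mb_per_s = len(payload) / (1024 * 1024) / elapsed
```

Note that throughput depends heavily on the input's compressibility and the chosen level, so numbers from a snippet like this are only indicative.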

Let me know if you have more questions.

Dataset Files

Presentation Slides
