Thanks for the presentation!
I'd like to ask whether you know where your computational bottleneck is.
Since you use the PyTorch library, which is written in Python, do you think you could obtain a performance boost by switching to a different AI library?
Did you really measure *a few seconds* per MB for the gzip compression? In that case it would be interesting to know on which hardware you conducted the experiments. On common hardware storing the data on SSDs, gzip should process multiple MBs per second.
Thanks for your interest in our work! I suppose the major bottleneck is the gradient back-propagation step for updating the model parameters, followed by the forward pass through the model for computing the probabilities. I think we can definitely achieve some improvement in speed by switching to a different AI library. For example, NNCP (Bellard et al. 2020) implemented a similar algorithm on CPUs with low-level optimization and is around 4x slower than DZip. A dedicated library for GPU-based compression should definitely yield quite a bit of improvement in speed (though not more than ~10x faster, I believe). Another possibility is to use different model architectures which are simpler or can be parallelized more efficiently.
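For context, the split between the forward and backward passes can be profiled directly. Below is a minimal PyTorch sketch with a toy stand-in model (not our actual architecture) that times the two phases separately:

```python
import time
import torch
import torch.nn as nn

# Toy stand-in model (hypothetical; the real predictor differs).
model = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 256))
loss_fn = nn.CrossEntropyLoss()
x = torch.randn(512, 64)
y = torch.randint(0, 256, (512,))

def time_step(n=50):
    """Average forward and backward wall-clock times over n iterations."""
    fwd_total, bwd_total = 0.0, 0.0
    for _ in range(n):
        t0 = time.perf_counter()
        logits = model(x)           # forward pass: compute probabilities
        loss = loss_fn(logits, y)
        t1 = time.perf_counter()
        loss.backward()             # backward pass: gradient back-propagation
        model.zero_grad()
        t2 = time.perf_counter()
        fwd_total += t1 - t0
        bwd_total += t2 - t1
    return fwd_total / n, bwd_total / n

fwd, bwd = time_step()
print(f"forward: {fwd * 1e3:.2f} ms, backward: {bwd * 1e3:.2f} ms")
```

On CPU, the backward pass typically dominates, which is consistent with back-propagation being the main bottleneck.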
Regarding the speed of gzip, what we measured was indeed a few seconds per MB (it might be faster on SSDs). In any case, the number was intended to give a rough range for the compression speeds of the general class of traditional compression methods (gzip, BSC, 7zip, and zpaq). For our experiments, we used HDDs (instead of SSDs) with Intel Xeon Gold 6146 CPUs. Moreover, we observed faster and better compression with BSC (roughly 0.1 seconds per MB), which is a stronger baseline among the traditional methods. The message we wanted to deliver is that traditional methods are about 2-3 orders of magnitude faster than NN-based compressors.
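As a rough sanity check, gzip throughput is easy to measure with stdlib Python. The sketch below uses a synthetic, highly repetitive payload, so the absolute MB/s will differ from what we saw on our datasets and hardware:

```python
import gzip
import time

MB = 1 << 20
# Synthetic, compressible payload: 5 MB of repeated text.
line = b"the quick brown fox jumps over the lazy dog\n"
data = (line * (5 * MB // len(line) + 1))[:5 * MB]

t0 = time.perf_counter()
compressed = gzip.compress(data, compresslevel=6)
elapsed = time.perf_counter() - t0

throughput = len(data) / MB / elapsed
ratio = len(data) / len(compressed)
print(f"{throughput:.1f} MB/s, compression ratio {ratio:.1f}x")
```

Throughput depends heavily on the compression level, the compressibility of the input, and whether I/O (HDD vs. SSD) or CPU is the limiting factor.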
Let me know if you have more questions.