Weighted Adaptive Huffman Coding
- Submitted by: Dana Shapira
- Last updated: 24 April 2020 - 2:45pm
- Document Type: Poster
- Document Year: 2020
Huffman coding is known to be optimal when the alphabet is known in advance, the set of codewords is fixed, and each codeword consists of an integral number of bits. If any of these conditions is violated, optimality is not guaranteed.
In the {\it dynamic\/} variant of Huffman coding, the encoder and decoder maintain identical copies of the model; at each position, the model consists of the frequencies of the elements processed so far.
After each processed element $\sigma$, the model is updated by incrementing the frequency of $\sigma$ by 1, while the other frequencies remain the same.
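As a minimal sketch of this update rule (the function name and the plain frequency dictionary are our own simplification; an actual dynamic Huffman coder would also restructure the code tree after every increment):

```python
from collections import Counter

def backward_model(text):
    """Yield, for each position, the backward looking model:
    the frequencies of the elements processed so far."""
    freq = Counter()
    for sigma in text:
        yield dict(freq)   # model shared by encoder and decoder
        freq[sigma] += 1   # increment sigma's frequency by 1
```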
An enhanced dynamic Huffman coding named {\it forward looking\/} coding \cite{KSS} starts with the full frequencies of the entire text, as in the static variant, and then progressively decreases them. For this method, after each processed element $\sigma$, the model is altered by {\it decrementing\/} the frequency of $\sigma$ by 1, while the other frequencies remain the same.
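For contrast, the forward looking update under the same simplification (tree maintenance again omitted):

```python
from collections import Counter

def forward_model(text):
    """Start from the full frequencies of the entire text, as in the
    static variant, and decrement each symbol's count once it is coded."""
    freq = Counter(text)   # full frequencies, assumed known in advance
    for sigma in text:
        yield dict(freq)   # model used for the current symbol
        freq[sigma] -= 1   # decrement sigma's frequency by 1
```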
Forward looking Huffman coding has been shown to always be better than static Huffman coding by at least $m-1$ bits.
A hybrid method exploiting both the classical backward and the new forward approaches is proposed in \cite{FKS} and has been shown to always be at least as good as forward looking Huffman coding.
If the model is learned adaptively, as in traditional backward looking coding, no description of the model is needed, since the encoder and the decoder update it in synchronization.
However, in the other mentioned versions, the details of the chosen model on which the method relies are needed for decoding and should be adjoined to the compressed file, for example as a header.
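For the forward looking variant, for instance, the header only has to convey the initial frequency table. A minimal sketch of such a header, assuming a byte alphabet and a hypothetical layout (alphabet size, then one (symbol, count) pair per symbol):

```python
import struct

def write_model_header(freq, out):
    """Write the initial model (symbol -> frequency) as a file header.
    Layout (hypothetical): 16-bit alphabet size, then one
    (8-bit symbol, 32-bit count) pair per distinct symbol."""
    out.write(struct.pack(">H", len(freq)))
    for symbol, count in sorted(freq.items()):
        out.write(struct.pack(">BI", symbol, count))
```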
The contribution of this work is threefold. First, we define a new generic coding method, which we call {\it weighted\/} coding, that encompasses all the mentioned variants (static, forward and backward) as special cases. Second, a new special case called {\it positional\/} coding is suggested and shown to always be at least as good as forward looking coding. Third, we present empirical results showing practical improvements of the proposed method, even when the encoded file includes the model description.
It is important to stress that all these methods can in fact be applied to any adaptive coding technique, in particular to arithmetic coding or PPM.
Comments
While reading your poster, I've collected some questions:
What is $m$ in the introduction?
Which data set is the plot based on? Does the relation to static Huffman change for different inputs?
The most interesting question is what data needs to be stored in the header so that the decoder can restore the original data.
For $k = 0$, the forward Huffman, it is clear that you only need the frequencies of all characters. But for $k > 0$, you need to know,
for every character $c$ of your alphabet, when to expect the next occurrence of $c$. Storing this naively seems to make the header quite large.