Luna Ergonomics has been working in the linguistic domain for quite some time. As part of that work, we are releasing the Static Huffman Codes for Indian languages here.
As this is the work of Luna Ergonomics, please reproduce it only with permission.
Know more about Huffman Encoding.
Short messages may be compressed to make their transmission more efficient. There are several mechanisms to achieve compression, and Huffman Encoding is one of them. There are two types of Huffman Encoding: Dynamic and Static.
Dynamic Huffman coding is an entropy encoding algorithm used for lossless data compression. It encodes each source symbol using a variable-length code table that is derived from the estimated probability of occurrence of each possible symbol value, so that frequent symbols receive shorter codes than rare ones.
The variable-length table (Huffman Tree) is generated dynamically, based on the frequency of symbols in the text. It provides good compression for big files. However, the variable-length table (Huffman Tree) has to be sent along with the encoded data, because it is required during decompression. The size of the table depends on the number of unique symbols in the text. For short messages, it has been observed that the compressed data together with the table can be larger than the original data, which defeats the purpose of compression.
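The overhead described above can be illustrated with a minimal Python sketch. It builds a Huffman code table from the symbol frequencies of a single message and compares the encoded payload plus table against the raw size. The function names, the 16 bits per raw symbol and the 8 bits per stored code length are assumptions chosen for illustration only; an actual implementation may serialize the table differently.

```python
import heapq
from collections import Counter
from itertools import count


def build_huffman_codes(text):
    """Return {symbol: bit string} derived from symbol frequencies in text."""
    freq = Counter(text)
    if not freq:
        return {}
    if len(freq) == 1:
        # A single distinct symbol still needs a one-bit code.
        return {next(iter(freq)): "0"}
    tie = count()  # tie-breaker so the heap never has to compare dicts
    heap = [(n, next(tie), {sym: ""}) for sym, n in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two least frequent subtrees, prefixing their codes.
        n0, _, left = heapq.heappop(heap)
        n1, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (n0 + n1, next(tie), merged))
    return heap[0][2]


def dynamic_cost_bits(text, bits_per_symbol=16, bits_per_length=8):
    """Return (payload bits, table bits) under an assumed table layout."""
    codes = build_huffman_codes(text)
    payload = sum(len(codes[ch]) for ch in text)
    # Assumed layout: each table entry stores the symbol and its code length.
    table = len(codes) * (bits_per_symbol + bits_per_length)
    return payload, table


message = "नमस्ते दुनिया"  # a short sample message
payload, table = dynamic_cost_bits(message)
raw = len(message) * 16  # assuming 16 bits per uncompressed character
print("raw:", raw, "payload:", payload, "table:", table,
      "payload + table:", payload + table)
```

For a message this short, nearly every symbol is unique, so under these assumptions the table alone takes more bits than the uncompressed text.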
Static Huffman encoding is a variant of Huffman encoding in which the variable-length table (Static Huffman Table) is created offline and is made available to both the compression and decompression modules, so it does not have to be sent with each message.
It provides good compression, provided that the Static Huffman table is properly generated. The compression achieved depends on how well the symbol frequencies assumed by the table match those of the actual text.
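As a rough sketch of how a shared static table is used, the Python example below encodes and decodes a message with a fixed, prefix-free table that both sides already hold. The table here is a hypothetical toy alphabet invented for illustration; it is not one of the Static Huffman Codes released by Luna Ergonomics, and a real table would cover the characters of an Indian language.

```python
# Hypothetical static table for a toy alphabet (illustration only).
# The codes are prefix-free: no code is the prefix of another.
STATIC_TABLE = {
    " ": "00",
    "a": "01",
    "n": "100",
    "m": "101",
    "s": "110",
    "t": "1110",
    "e": "1111",
}
DECODE_TABLE = {code: sym for sym, code in STATIC_TABLE.items()}


def encode(text):
    """Encode text as a bit string; raises KeyError for unknown symbols."""
    return "".join(STATIC_TABLE[ch] for ch in text)


def decode(bits):
    """Decode a bit string produced by encode() using the same table."""
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in DECODE_TABLE:
            out.append(DECODE_TABLE[current])
            current = ""
    if current:
        raise ValueError("trailing bits do not form a complete code")
    return "".join(out)


message = "namaste"
bits = encode(message)
assert decode(bits) == message
print(len(message) * 8, "bits raw ->", len(bits), "bits encoded")
```

Because the table is fixed ahead of time, only the encoded bits travel with the message. The trade-off is that a symbol missing from the table cannot be encoded without some escape mechanism, which this sketch does not attempt.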
Generating a static Huffman table is a difficult task, as it requires thorough knowledge of the language and of the frequency of the words used in it. A different static Huffman table has to be generated for each language. If a mobile device has to support 10 languages, it has to store all 10 static Huffman tables, which adds a significant memory requirement on small embedded devices. Based on discussions with many solution providers, static Huffman compression is part of their proprietary solutions, and they have seen significant improvement, with capacity increasing by more than 200 characters per message. This solution is recommended as a good compression technique.

