Compressing files

2014/10 – data structure

Another interesting subject is compression: using the Huffman algorithm to compress files.

1. Make a frequency table: roughly 1 KB of counters, one for each of the 256 possible byte values, recording how often every letter and symbol in the loaded data appears.

2. Make a leaf list: a table containing only the nodes that actually occur (the letters), ordered by frequency.

3. Make the Huffman tree: this data structure consumes the leaf list, merging its nodes into a tree. Only the leaf nodes carry symbol data, and for every pair of siblings the left child's frequency is less than (or equal to) the right's.

4. Once we have the Huffman tree, we can build an encode table and use it to rewrite the content we loaded at first. Each letter no longer costs a fixed byte: frequent letters get short codes, down to a single bit, while rare ones get longer codes. By writing those 0s and 1s to the output, plus the frequency table needed for decoding later, we can shrink the file. For small files the added frequency table can actually make the output bigger, but for larger files it pays off. A sketch of all four steps follows this list.
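To make the four steps concrete, here is a minimal Python sketch. The function names (`build_tree`, `make_codes`) and the use of `heapq` as the frequency-ordered leaf list are my own illustration choices, not code from the original project; a real compressor would also pack the bit string into actual bytes and prepend the frequency table.

```python
import heapq
from collections import Counter

def build_tree(data: bytes):
    """Steps 1-3: frequency table -> leaf list -> Huffman tree."""
    freq = Counter(data)                                  # step 1: count every byte
    # Step 2: the leaf list, kept ordered by frequency via a min-heap.
    # The running index breaks ties so tuples never compare the nodes themselves.
    heap = [(f, i, (sym, None, None)) for i, (sym, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    nxt = len(heap)
    # Step 3: repeatedly merge the two least-frequent nodes; the smaller
    # one becomes the left child, matching the "left <= right" rule above.
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, nxt, (None, left, right)))
        nxt += 1
    return heap[0][2], freq

def make_codes(node, prefix="", table=None):
    """Step 4: walk the tree; the path of 0s and 1s to a leaf is its code."""
    if table is None:
        table = {}
    sym, left, right = node
    if left is None:                                      # leaf node
        table[sym] = prefix or "0"                        # single-symbol edge case
    else:
        make_codes(left, prefix + "0", table)
        make_codes(right, prefix + "1", table)
    return table

data = b"this is an example of a huffman tree"
tree, freq = build_tree(data)
codes = make_codes(tree)
bits = "".join(codes[b] for b in data)
print(len(data) * 8, "bits in,", len(bits), "bits out")   # the compressed payload
```

On a tiny input like this, the stored frequency table would of course wipe out the gain, exactly as noted above.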

To extract (decompress) the file:

1. Read the frequency table first; from it we rebuild the leaf list, and hence the same Huffman tree.

2. Read the content bit by bit, walking the Huffman tree to decode each symbol, which gives us the uncompressed result. A sketch follows below.
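Continuing the sketch above: assuming the decoder has rebuilt `tree` from the stored frequency table (running steps 2 and 3 on the same counts yields the same tree, provided ties break identically), decoding is just a walk from the root.

```python
def decode(bits: str, tree) -> bytes:
    """Follow each bit down the tree; every leaf reached emits one symbol."""
    out = bytearray()
    node = tree
    for bit in bits:
        _, left, right = node
        node = left if bit == "0" else right              # 0 = left, 1 = right
        if node[1] is None:                               # reached a leaf
            out.append(node[0])
            node = tree                                   # restart at the root
    return bytes(out)

assert decode(bits, tree) == data                         # round-trip check
```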


An interesting topic for this month.

