A greedy algorithm always makes the choice that looks best at the moment. It makes a locally optimal choice in the hope that this choice will lead to a globally optimal solution.
Greedy algorithms do not result in optimal solutions always but for many problems they do. A problem can be solved by Greedy Algorithm if it exhibits optimal substructure. A problem exhibits optimal substructure if an optimal solution to the problem contains within it optimal solutions to subproblems. Optimal substructure is a necessary property of both Greedy and Dynamic Programming classes of problems.
But then when should we apply Dynamic Programming and when Greedy?
Turns out Dynamic Programming has another important condition: Overlapping Subproblems. It simply means that the space of subproblems must be “small” in the sense that a recursive algorithm for the problem solves the same subproblems over and over, rather than always generating new subproblems. When a problem doesn’t have overlapping subproblems, we might want to solve it by greedy technique.
Huffman code is used to compress the file. For example: Consider the word “Hello”.
The bit representation of “Hello is “: 01101000 01100101 01101100 01101100 01101111
It means, we will need 40 bits to store/transfer “hello”.
Huffman was a student at MIT when he discovered that its cheap to transfer/store when we already know the frequency of the characters in the given text and then use less bit to represent more common characters. In our case, “l” occurs twice while “e”, “h”, “o” occurs once in the given text. Frequency table would like below:
Now, we need to build Huffman tree using the frequency distribution provided.
How do we do that?
a. Make a node of each character and respective frequency. In the code shown below, TreeNode class is the node.
b. Store each node in the priority queue. Priority queue, because it will always give us nodes sorted by frequency. Precisely why TreeNode implements Comparable as it will give us a compareTo function which will be used by priority queue to give us nodes sorted on the frequency.
c. Now, take two smallest nodes, and make it left and right tree of a new TreeNode and again push it in priority queue. Do this, until just one element remains in Priority Queue.
d. Now, the only TreeNode remaining in Priority Queue is our Huffman Tree. Assign bits to each branch.
Huffman Tree looks like this now:
Here’s the java code to do the same.
Output of the code is:
Encoded String is 1011100110
You can see that ‘hello’ is represented as 01101000 01100101 01101100 01101100 01101111 which takes 40 bits. Now the same string is represented in 10 bits.
Another implementation of huffman can be found here.