Lean Implementation for Huffmann Coding Algorithm, as a course Project for IISc, Proof and Programs 2025.
The algorithm is based on the Huffman coding algorithm, which is a greedy algorithm used for lossless data compression. The algorithm works by assigning variable-length codes to input characters, with shorter codes assigned to more frequent characters. The steps of the algorithm are as follows:
- Count the frequency of each character in the input string.
- Create a priority queue (or min-heap) of nodes, where each node represents a character and its frequency.
- While there is more than one node in the queue:
- Remove the two nodes with the lowest frequency from the queue.
- Create a new internal node with these two nodes as children and a frequency equal to the sum of their frequencies.
- Add the new node back to the queue.
- The remaining node in the queue is the root of the Huffman tree.
- Traverse the tree to assign binary codes to each character, where left edges represent 0 and right edges represent 1.
- Encode the input string using the assigned codes.
- (Optionally) Decode the encoded string using the Huffman tree.
The implementation is done in Lean4, and this repository contains the following -
- Implementation of the Huffman coding algorithm and binary tree, which we call
HfmnTree
. - Proof of Correctness of the algorithm, which includes:
- Proving the Optimality of the Algorithm.
Thanks to Professor Siddhartha Gadgil, IISc for guidance throughout the project.