Huffman coding is a widely used technique for data compression, and it has many applications in areas such as data storage, communication, and multimedia. The basic idea behind Huffman coding is to use shorter codes for more frequently occurring characters and longer codes for less frequently occurring characters. This approach reduces the overall size of the message without losing any information.
Objective
The Huffman coding problem involves the following steps:
- Count the frequency of each character in the input message.
- Create a priority queue of nodes, each containing a character and its frequency.
- Construct the Huffman tree by repeatedly merging the two nodes with the lowest frequency into a new node until there is only one node left in the queue.
- Traverse the Huffman tree to generate the Huffman codes for each character.
- Encode the input message using the Huffman codes to produce the compressed binary code.
- Decode the compressed binary code using the Huffman tree to produce the original message.
The program should be able to take an input message, compress it using the Huffman coding technique, and output the compressed binary code and the tree structure needed to decompress the message. It should also be able to take a compressed binary code and the tree structure and output the original message. The program should be able to handle any input message and produce an accurate output without losing any data. The objective of the Huffman coding problem is to develop a program that can achieve these goals efficiently and effectively.
import heapq
from collections import defaultdict
def huffman_encoding(data):
# Step 1: Count the frequency of each character in the data
frequency = defaultdict(int)
for char in data:
frequency[char] += 1
# Step 2: Create a priority queue of nodes, each containing a character and its frequency
nodes = []
for char, freq in frequency.items():
nodes.append((freq, char, None, None)) # (frequency, character, left_child, right_child)
heapq.heapify(nodes)
# Step 3: Construct the Huffman tree
while len(nodes) > 1:
left_freq, left_char, left_node, _ = heapq.heappop(nodes)
right_freq, right_char, _, right_node = heapq.heappop(nodes)
merged_node = (left_freq + right_freq, None, left_node, right_node)
heapq.heappush(nodes, merged_node)
# Step 4: Traverse the tree to generate the Huffman codes for each character
root = nodes[0]
codes = {}
def traverse(node, code):
if node[1] is not None:
codes[node[1]] = code
else:
traverse(node[2], code + '0')
traverse(node[3], code + '1')
traverse(root, '')
# Step 5: Encode the data using the Huffman codes
encoded_data = ''
for char in data:
encoded_data += codes[char]
return encoded_data, root, codes
def huffman_decoding(data, tree_root):
# Step 1: Traverse the tree to decode the Huffman-encoded data
decoded_data = ''
node = tree_root
for bit in data:
if bit == '0':
node = node[2]
else:
node = node[3]
if node[1] is not None:
decoded_data += node[1]
node = tree_root
return decoded_data
# Example usage
data = "this is an example of Huffman encoding and decoding"
encoded_data, tree_root, codes = huffman_encoding(data)
print("Encoded data:", encoded_data)
print("Huffman codes:")
for char, code in codes.items():
print(f"{char}: {code}")
decoded_data = huffman_decoding(encoded_data, tree_root)
print("Decoded data:", decoded_data)