This project is a command-line tool for compressing and decompressing files using the Huffman coding algorithm. It is designed for educational purposes (e.g., the Introduction to C module at HSLU) and demonstrates key concepts such as file handling, frequency analysis, tree structures, and bit manipulation in C.
- Binary File Support: Handles any file type, not just text files
- Progress Visualization: Real-time progress bar during compression/decompression
- File Extension Management: Automatic .pda extension handling
- Smart Warnings: Detects and warns about inefficient compression scenarios
- File Handling: Reads input files and generates compressed output files
- Frequency Analysis: Analyzes character frequencies to build optimal Huffman codes
- Huffman Tree Construction: Uses a priority queue to assign shorter codes to frequent characters
- Encoding & Decoding: Compresses and decompresses files using the Huffman algorithm
- Error Handling: Detects and reports invalid input, file errors, and inefficiencies for small files
- Command Line Interface: Simple CLI for compression and decompression
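
The frequency analysis and tree construction steps listed above can be sketched as follows. This is a minimal illustration, not the project's actual code: the function names `count_frequencies` and `huffman_code_lengths` are hypothetical, and for brevity the sketch finds the two lightest nodes with a linear scan rather than the priority queue the tool is described as using. It only computes the code length of each byte value, which is enough to see that frequent bytes receive shorter codes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SYMBOLS 256
#define MAX_NODES (2 * SYMBOLS - 1)

/* Count how often each byte value occurs in buf. */
void count_frequencies(const unsigned char *buf, size_t len, uint64_t freq[SYMBOLS])
{
    assert(buf != NULL || len == 0);
    for (int i = 0; i < SYMBOLS; i++) freq[i] = 0;
    for (size_t i = 0; i < len; i++) freq[buf[i]]++;
}

/* Index of the lightest node that has no parent yet (linear scan). */
static int pick_min(const uint64_t *weight, const int *parent, int count)
{
    int best = -1;
    for (int i = 0; i < count; i++) {
        if (parent[i] != -1 || weight[i] == 0) continue;
        if (best == -1 || weight[i] < weight[best]) best = i;
    }
    return best;
}

/* Fill length[] with the Huffman code length of each byte value.
 * Returns the number of distinct byte values that occur. */
int huffman_code_lengths(const uint64_t freq[SYMBOLS], uint8_t length[SYMBOLS])
{
    uint64_t weight[MAX_NODES];
    int parent[MAX_NODES];
    int count = SYMBOLS; /* nodes in use: leaves first, then internal nodes */
    int used = 0;

    for (int i = 0; i < MAX_NODES; i++) { weight[i] = 0; parent[i] = -1; }
    for (int i = 0; i < SYMBOLS; i++) {
        weight[i] = freq[i];
        length[i] = 0;
        if (freq[i] > 0) used++;
    }

    /* Merge the two lightest roots until a single root remains. */
    for (int roots = used; roots > 1; roots--) {
        int a = pick_min(weight, parent, count);
        parent[a] = count;
        int b = pick_min(weight, parent, count);
        parent[b] = count;
        weight[count] = weight[a] + weight[b];
        count++;
    }

    /* The code length of a symbol is the depth of its leaf. */
    for (int i = 0; i < SYMBOLS; i++) {
        if (freq[i] == 0) continue;
        int depth = 0;
        for (int n = i; parent[n] != -1; n = parent[n]) depth++;
        length[i] = (uint8_t)(used == 1 ? 1 : depth); /* lone symbol still needs 1 bit */
    }
    return used;
}
```

For the input `aaaabbc`, the sketch assigns a 1-bit code to `a` and 2-bit codes to `b` and `c`.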
The tool now provides visual feedback during operation:
- Progress bar showing compression/decompression status
- Updates every 16KB of processed data
- Compression ratio reporting after completion
- Warning messages for inefficient compression scenarios
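
A progress bar that refreshes every 16 KB and a ratio report could look roughly like the sketch below. The names (`format_progress`, `should_update`, `space_saved_percent`), the bar width, and the definition of the ratio as "percent of space saved" are assumptions made for illustration; the tool's real output format may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define PROGRESS_STEP (16 * 1024) /* refresh interval in bytes */
#define BAR_WIDTH 40

/* Render "[####----] NN%" into out; returns the number of filled cells. */
int format_progress(char *out, size_t out_size,
                    unsigned long long done, unsigned long long total)
{
    assert(out != NULL && out_size >= BAR_WIDTH + 16);
    int filled = total == 0 ? BAR_WIDTH : (int)(done * BAR_WIDTH / total);
    int percent = total == 0 ? 100 : (int)(done * 100 / total);
    if (filled > BAR_WIDTH) filled = BAR_WIDTH;
    if (percent > 100) percent = 100;

    char bar[BAR_WIDTH + 1];
    for (int i = 0; i < BAR_WIDTH; i++) bar[i] = i < filled ? '#' : '-';
    bar[BAR_WIDTH] = '\0';
    snprintf(out, out_size, "[%s] %3d%%", bar, percent);
    return filled;
}

/* True when another 16 KB boundary was crossed or the work is finished. */
int should_update(unsigned long long previous, unsigned long long done,
                  unsigned long long total)
{
    return done == total || previous / PROGRESS_STEP != done / PROGRESS_STEP;
}

/* Space saved by compression, in percent of the original size. */
double space_saved_percent(unsigned long long original, unsigned long long compressed)
{
    if (original == 0) return 0.0;
    return 100.0 * (1.0 - (double)compressed / (double)original);
}
```

A caller would typically print the formatted bar preceded by `\r` and call `fflush(stdout)` so the bar redraws in place on one terminal line.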
The effectiveness of compression varies significantly depending on the file type.

Typically compress well:
- Text files (.txt)
- CSV files (.csv)
- Log files (.log)
- Source code files (.c, .h, etc.)
- Raw/uncompressed image files (.bmp)
- Raw data files

Typically compress poorly:
- Word documents (.docx, .doc)
- PDF files (.pdf)
- Compressed images (.jpg, .png, .gif)
- Audio/video files (.mp3, .mp4, .avi)
- Archive files (.zip, .rar, .7z)
- Executables (.exe, .dll)
The tool will now warn you when attempting to compress already-compressed file types, as these typically won't benefit from additional Huffman compression.
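
One simple way to implement such a warning is an extension lookup. The sketch below is an assumption about how this could work, not the tool's actual logic: the function name `is_poor_candidate` is hypothetical, the extension table is copied from the file types listed above, and the comparison is case-insensitive by choice.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Case-insensitive equality of two NUL-terminated strings. */
static int equals_ignore_case(const char *a, const char *b)
{
    while (*a && *b) {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b)) return 0;
        a++;
        b++;
    }
    return *a == *b; /* equal only if both strings ended together */
}

/* Returns 1 if the file extension suggests Huffman coding will gain little. */
int is_poor_candidate(const char *path)
{
    /* Hypothetical table, taken from the list in this README. */
    static const char *const known[] = {
        ".docx", ".doc", ".pdf", ".jpg", ".png", ".gif", ".mp3",
        ".mp4", ".avi", ".zip", ".rar", ".7z", ".exe", ".dll"
    };
    assert(path != NULL);

    const char *dot = strrchr(path, '.'); /* last dot marks the extension */
    if (dot == NULL) return 0;
    for (size_t i = 0; i < sizeof known / sizeof known[0]; i++) {
        if (equals_ignore_case(dot, known[i])) return 1;
    }
    return 0;
}
```

An extension check is only a heuristic; a renamed file would slip through, so a real tool might also compare the estimated output size with the input size.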
Linux / macOS:
- Make sure `gcc` is installed.
- Build with the Makefile: `make`
- The program will be built as `huffman`.
Windows:
- Install MinGW or TDM-GCC and ensure `gcc` is in your PATH.
- Open a Bash shell (e.g., Git Bash, MSYS2).
- Run the build script: `./build_windows.sh`
- The program will be built as `huffman.exe`.
Compress:

    ./huffman -c <input_file> <output_file>

The compressed file will automatically get the .pda extension.

Decompress:

    ./huffman -d <input_file> <output_file>

The input file must have the .pda extension for decompression.
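
The extension rules described for compression and decompression could be implemented along these lines. This is a sketch under assumptions: the helper names `has_pda_extension` and `with_pda_extension` are hypothetical, and the check is case-sensitive, which may not match the tool's behavior.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define PDA_EXT ".pda"

/* True if name ends in ".pda" and has at least one character before it. */
int has_pda_extension(const char *name)
{
    assert(name != NULL);
    size_t n = strlen(name);
    size_t e = strlen(PDA_EXT);
    return n > e && strcmp(name + n - e, PDA_EXT) == 0;
}

/* Copy name into out, appending ".pda" unless it is already present.
 * Returns 0 on success or -1 if out is too small. */
int with_pda_extension(const char *name, char *out, size_t out_size)
{
    assert(name != NULL && out != NULL);
    int written = snprintf(out, out_size, "%s%s", name,
                           has_pda_extension(name) ? "" : PDA_EXT);
    return (written >= 0 && (size_t)written < out_size) ? 0 : -1;
}
```

With these helpers, compression would pass the output name through `with_pda_extension`, and decompression would reject any input for which `has_pda_extension` returns 0.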
Example:

    # Compress (will create example.pda)
    ./huffman -c example.txt example
    # Decompress (will restore original file)
    ./huffman -d example.pda example.txt

- For very small files (<100 bytes), the compressed file may be larger than the original due to Huffman tree overhead
- The maximum supported file size is 1GB
- The tool is cross-platform and works on Linux, macOS, and Windows (32/64 bit)
- Compression ratio depends heavily on file content and type
- Progress bar provides real-time feedback during operation
- Automatic warnings for inefficient compression scenarios
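
The first note, about small files growing after compression, follows from simple arithmetic: the output holds the encoded bits plus whatever header is needed to rebuild the tree. The sketch below estimates that size. The function names are hypothetical, and the header size is passed in as a parameter because the actual .pda file layout is not described here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SYMBOLS 256

/* Estimated output size: header plus the encoded payload rounded up to bytes.
 * freq[i] is the count of byte value i, length[i] its code length in bits. */
uint64_t estimate_compressed_bytes(const uint64_t freq[SYMBOLS],
                                   const uint8_t length[SYMBOLS],
                                   uint64_t header_bytes)
{
    assert(freq != NULL && length != NULL);
    uint64_t bits = 0;
    for (int i = 0; i < SYMBOLS; i++) bits += freq[i] * length[i];
    return header_bytes + (bits + 7) / 8;
}

/* Compression only pays off when the estimate is below the original size. */
int compression_is_worthwhile(uint64_t original_bytes, uint64_t estimated_bytes)
{
    return estimated_bytes < original_bytes;
}
```

For example, the 7-byte input `aaaabbc` with code lengths 1, 2, and 2 needs 4·1 + 2·2 + 1·2 = 10 bits, or 2 bytes of payload. With an assumed 256-byte header the output would be 258 bytes, far larger than the input.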
- Binary Support: Handle any file type
- Visual Feedback: Show progress during operation
- File Handling: Read and write files
- Frequency Analysis: Analyze byte frequencies
- Huffman Tree Construction: Build the Huffman tree
- Encoding Process: Encode and decode files
- Error Handling: Robust error handling
- CLI: Command-line interface only
    src/
    ├── core/                 # Core Huffman algorithm
    │   ├── huffman.c
    │   └── huffman.h
    ├── io/                   # Input/Output
    │   ├── bit_io.c
    │   ├── bit_io.h
    │   ├── file_io.c
    │   └── file_io.h
    ├── compression/          # Compression specific
    │   ├── encode.c
    │   ├── encode.h
    │   ├── decode.c
    │   └── decode.h
    ├── utils/                # Utility functions
    │   ├── frequency.c
    │   ├── frequency.h
    │   ├── file_extension.c  # Extension handling
    │   ├── file_extension.h
    │   ├── progress_bar.c    # Progress visualization
    │   └── progress_bar.h
    └── main.c                # Entry point
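
Huffman codes are not byte-aligned, so a module such as `io/bit_io` has to pack individual bits into bytes. The sketch below shows one common approach, writing bits most-significant-first into a memory buffer. The `bit_writer` type and its functions are hypothetical and are not taken from the project's `bit_io.h`; the bit order is likewise an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint8_t *data;    /* destination buffer */
    size_t capacity;  /* buffer size in bytes */
    size_t byte_pos;  /* index of the byte currently being filled */
    int bit_count;    /* bits already used in that byte, 0..7 */
} bit_writer;

void bit_writer_init(bit_writer *w, uint8_t *buffer, size_t capacity)
{
    assert(w != NULL && buffer != NULL);
    w->data = buffer;
    w->capacity = capacity;
    w->byte_pos = 0;
    w->bit_count = 0;
}

/* Append the lowest `count` bits of `code`, most significant bit first.
 * Returns 0 on success or -1 if the buffer is full. */
int bit_writer_put(bit_writer *w, uint32_t code, int count)
{
    assert(w != NULL && count >= 0 && count <= 32);
    for (int i = count - 1; i >= 0; i--) {
        if (w->byte_pos >= w->capacity) return -1;
        if (w->bit_count == 0) w->data[w->byte_pos] = 0; /* start a fresh byte */
        if ((code >> i) & 1u) {
            w->data[w->byte_pos] |= (uint8_t)(0x80u >> w->bit_count);
        }
        if (++w->bit_count == 8) {
            w->bit_count = 0;
            w->byte_pos++;
        }
    }
    return 0;
}

/* Bytes holding data, counting a partially filled last byte. */
size_t bit_writer_size(const bit_writer *w)
{
    assert(w != NULL);
    return w->byte_pos + (w->bit_count > 0 ? 1 : 0);
}
```

Writing the codes `0`, `10`, and `11` in sequence yields the bit string `01011`, which is stored as the single byte `0x58` with three zero padding bits. A decoder therefore needs to know how many bits are valid, for instance from a stored symbol count.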