P4wnda/file-compression-tool

Huffman File Compression Tool

Overview

This project is a command-line tool for compressing and decompressing files using the Huffman coding algorithm. It is designed for educational purposes (e.g., the Introduction to C module at HSLU) and demonstrates key concepts such as file handling, frequency analysis, tree structures, and bit manipulation in C.

Features

  • Binary File Support: Handles any file type, not just text files
  • Progress Visualization: Real-time progress bar during compression/decompression
  • File Extension Management: Automatic .pda extension handling
  • Smart Warnings: Detects and warns about inefficient compression scenarios
  • File Handling: Reads input files and generates compressed output files
  • Frequency Analysis: Counts byte frequencies to build optimal Huffman codes
  • Huffman Tree Construction: Uses a priority queue to assign shorter codes to more frequent bytes
  • Encoding & Decoding: Compresses and decompresses files using the Huffman algorithm
  • Error Handling: Detects and reports invalid input, file errors, and inefficiencies for small files
  • Command Line Interface: Simple CLI for compression and decompression
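
The frequency-analysis step listed above can be sketched in a few lines. This is a minimal illustration, not the project's actual code: the function name count_byte_frequencies and the 256-entry table layout are assumptions chosen for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not taken from the repository): count how often
 * each byte value occurs in a buffer. Working on bytes rather than
 * characters is what lets the same code handle binary files.
 * counts must point to 256 entries; they are reset before counting. */
void count_byte_frequencies(const unsigned char *data, size_t len,
                            uint64_t counts[256])
{
    for (int i = 0; i < 256; i++)
        counts[i] = 0;
    for (size_t i = 0; i < len; i++)
        counts[data[i]]++;
}
```

For the 11-byte input "abracadabra", the table would hold 5 for 'a', 2 each for 'b' and 'r', and 1 each for 'c' and 'd'.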

Visual Feedback

The tool provides visual feedback during operation:

  • Progress bar showing compression/decompression status
  • Progress updates every 16 KB of processed data
  • Compression ratio reporting after completion
  • Warning messages for inefficient compression scenarios
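
A progress bar of this kind could be rendered roughly as follows. The sketch assumes a '#'/'-' bar style and the names format_progress, should_redraw, and PROGRESS_STEP; none of these come from the repository, only the 16 KB update interval does.

```c
#include <stddef.h>
#include <stdio.h>

/* Redraw interval in bytes; the 16 KB figure is from the list above. */
#define PROGRESS_STEP (16u * 1024u)

/* Hypothetical formatter: writes a bar such as "[#####-----]  50%"
 * into out and returns snprintf's result. width is clamped to 1..64. */
int format_progress(char *out, size_t out_size,
                    unsigned long long done, unsigned long long total,
                    int width)
{
    char bar[65];
    if (width < 1)  width = 1;
    if (width > 64) width = 64;
    if (done > total) done = total;

    /* An empty file counts as complete to avoid dividing by zero. */
    int filled  = total ? (int)(done * (unsigned long long)width / total) : width;
    int percent = total ? (int)(done * 100 / total) : 100;

    for (int i = 0; i < width; i++)
        bar[i] = (i < filled) ? '#' : '-';
    bar[width] = '\0';
    return snprintf(out, out_size, "[%s] %3d%%", bar, percent);
}

/* Redraw only after another PROGRESS_STEP bytes, or at the very end. */
int should_redraw(unsigned long long done, unsigned long long last_drawn,
                  unsigned long long total)
{
    return done == total || done - last_drawn >= PROGRESS_STEP;
}
```

Throttling the redraw keeps terminal output from dominating the run time on large inputs; a caller would typically print the formatted string preceded by a carriage return so the bar overwrites itself.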

Compression Effectiveness

The effectiveness of compression varies significantly depending on the file type:

Effective Compression (Recommended):

  • Text files (.txt)
  • CSV files (.csv)
  • Log files (.log)
  • Source code files (.c, .h, etc.)
  • Raw/uncompressed image files (.bmp)
  • Raw data files

Poor Compression (Not Recommended):

  • Word documents (.docx, .doc)
  • PDF files (.pdf)
  • Compressed images (.jpg, .png, .gif)
  • Audio/video files (.mp3, .mp4, .avi)
  • Archive files (.zip, .rar, .7z)
  • Executables (.exe, .dll)

The tool warns you when you try to compress an already-compressed file type, because such files typically gain little or nothing from additional Huffman compression.
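
One simple way to produce such a warning is to compare the file extension against the "Poor Compression" list above. How the tool actually detects these cases is not documented here, so the sketch below is an assumption; is_poor_candidate and equals_ignore_case are hypothetical names.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Case-insensitive string equality using only standard C. */
static int equals_ignore_case(const char *a, const char *b)
{
    while (*a && *b) {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == *b;
}

/* Hypothetical check: returns 1 if the path ends in one of the
 * extensions from the "Poor Compression" list, 0 otherwise. */
int is_poor_candidate(const char *path)
{
    static const char *const exts[] = {
        ".docx", ".doc", ".pdf", ".jpg", ".png", ".gif", ".mp3",
        ".mp4", ".avi", ".zip", ".rar", ".7z", ".exe", ".dll"
    };
    const char *dot = strrchr(path, '.');

    /* No dot, or the last dot belongs to a directory name. */
    if (!dot || strchr(dot, '/') || strchr(dot, '\\'))
        return 0;

    for (size_t i = 0; i < sizeof exts / sizeof exts[0]; i++)
        if (equals_ignore_case(dot, exts[i]))
            return 1;
    return 0;
}
```

An extension check is cheap but only a heuristic: a renamed file slips through, so a tool could also compare the input and output sizes after compression.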

Build Instructions

Linux / macOS

  1. Make sure gcc is installed.
  2. Build with the Makefile:
    make
  3. The program will be built as huffman.

Windows

  1. Install MinGW or TDM-GCC and ensure gcc is in your PATH.
  2. Open a Bash shell (e.g., Git Bash, MSYS2).
  3. Run the build script:
    ./build_windows.sh
  4. The program will be built as huffman.exe.

Usage

Compress:

./huffman -c <input_file> <output_file>

The .pda extension is appended to the output file name automatically.

Decompress:

./huffman -d <input_file> <output_file>

The input file must have the .pda extension for decompression.
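
The extension rules for both modes can be sketched as two small helpers. The names has_pda_extension and with_pda_extension are hypothetical, and the case-sensitive comparison is an assumption; the real tool may behave differently.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define PDA_EXT ".pda"

/* Returns 1 if path ends in ".pda" and has a non-empty base name.
 * The comparison is case-sensitive (an assumption of this sketch). */
int has_pda_extension(const char *path)
{
    size_t len = strlen(path);
    size_t ext = strlen(PDA_EXT);
    return len > ext && strcmp(path + len - ext, PDA_EXT) == 0;
}

/* Copies path into out, appending ".pda" unless it is already there.
 * Returns 0 on success, -1 if out is too small for the result. */
int with_pda_extension(const char *path, char *out, size_t out_size)
{
    const char *suffix = has_pda_extension(path) ? "" : PDA_EXT;
    int n = snprintf(out, out_size, "%s%s", path, suffix);
    return (n >= 0 && (size_t)n < out_size) ? 0 : -1;
}
```

In compress mode a caller would pass the user's output name through with_pda_extension; in decompress mode it would reject any input for which has_pda_extension returns 0.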

Example:

# Compress (will create example.pda)
./huffman -c example.txt example

# Decompress (will restore original file)
./huffman -d example.pda example.txt

Notes

  • For very small files (<100 bytes), the compressed file may be larger than the original because the stored Huffman tree adds overhead
  • The maximum supported file size is 1 GB
  • The tool is cross-platform and works on Linux, macOS, and Windows (32/64 bit)
  • Compression ratio depends heavily on file content and type
  • The progress bar provides real-time feedback during operation
  • The tool warns automatically about inefficient compression scenarios
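
The small-file note is easy to see once the result is expressed as space saved. The README does not define how the tool computes its reported ratio, so the formula below (percentage of the original size saved) and the name space_saved_percent are assumptions for illustration.

```c
/* Hypothetical metric: percentage of the original size that was saved.
 * Positive means the output is smaller; negative means it grew. */
double space_saved_percent(unsigned long long original,
                           unsigned long long compressed)
{
    if (original == 0)
        return 0.0; /* nothing to save on an empty file */
    return 100.0 * (1.0 - (double)compressed / (double)original);
}
```

Under this definition, a 200-byte file that shrinks to 100 bytes saves 50%, while an 80-byte file that grows to 120 bytes yields a negative value, which is the case the first note warns about.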

Project Objectives

  1. Binary Support: Handle any file type
  2. Visual Feedback: Show progress during operation
  3. File Handling: Read and write files
  4. Frequency Analysis: Analyze byte frequencies
  5. Huffman Tree Construction: Build the Huffman tree
  6. Encoding Process: Encode and decode files
  7. Error Handling: Robust error handling
  8. CLI: Command-line interface only
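
Objective 5 can be illustrated by computing only the code length of each byte, which is the depth of its leaf in the Huffman tree. The sketch below is not the project's implementation: it finds the two lightest nodes with a linear scan instead of a priority queue to stay short, and the name huffman_code_lengths is hypothetical.

```c
#include <stdint.h>

#define SYMBOLS 256

/* Hypothetical sketch: fill lengths[s] with the Huffman code length of
 * byte s (0 if the byte never occurs) and return the number of distinct
 * bytes. The two lightest nodes are found by scanning, not with a heap. */
int huffman_code_lengths(const uint64_t freq[SYMBOLS], uint8_t lengths[SYMBOLS])
{
    uint64_t weight[2 * SYMBOLS]; /* leaves first, then internal nodes */
    int parent[2 * SYMBOLS];      /* parent index, -1 for a root */
    int active[2 * SYMBOLS];      /* 1 while a node still awaits merging */
    int leaf[SYMBOLS];            /* node index per byte, -1 if unused */
    int nodes = 0;

    for (int s = 0; s < SYMBOLS; s++) {
        lengths[s] = 0;
        leaf[s] = -1;
        if (freq[s] > 0) {
            weight[nodes] = freq[s];
            parent[nodes] = -1;
            active[nodes] = 1;
            leaf[s] = nodes++;
        }
    }
    int used = nodes;      /* number of distinct bytes */
    int remaining = nodes; /* number of roots left to merge */

    while (remaining > 1) {
        int a = -1, b = -1; /* a = lightest, b = second lightest */
        for (int i = 0; i < nodes; i++) {
            if (!active[i])
                continue;
            if (a < 0 || weight[i] < weight[a]) {
                b = a;
                a = i;
            } else if (b < 0 || weight[i] < weight[b]) {
                b = i;
            }
        }
        /* Merge the two lightest roots under a new internal node. */
        weight[nodes] = weight[a] + weight[b];
        parent[nodes] = -1;
        active[nodes] = 1;
        parent[a] = parent[b] = nodes;
        active[a] = active[b] = 0;
        nodes++;
        remaining--;
    }

    for (int s = 0; s < SYMBOLS; s++) {
        if (leaf[s] < 0)
            continue;
        int depth = 0;
        for (int i = leaf[s]; parent[i] >= 0; i = parent[i])
            depth++;
        /* A file with a single distinct byte still needs 1 bit per byte. */
        lengths[s] = (uint8_t)(depth > 0 ? depth : 1);
    }
    return used;
}
```

For the frequencies of "abracadabra" (a=5, b=2, r=2, c=1, d=1), this sketch assigns 'a' a 1-bit code and the other four bytes 3-bit codes, for 23 payload bits instead of the 88 bits of the raw input. A real encoder also has to store the tree or the code lengths, which is the overhead that hurts very small files.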

Folder Structure

src/
├── core/                  # Core Huffman algorithm
│   ├── huffman.c
│   └── huffman.h
├── io/                    # Input/Output
│   ├── bit_io.c
│   ├── bit_io.h
│   ├── file_io.c
│   └── file_io.h
├── compression/           # Compression specific
│   ├── encode.c
│   ├── encode.h
│   ├── decode.c
│   └── decode.h
├── utils/                 # Utility functions
│   ├── frequency.c
│   ├── frequency.h
│   ├── file_extension.c   # Extension handling
│   ├── file_extension.h
│   ├── progress_bar.c     # Progress visualization
│   └── progress_bar.h
└── main.c                 # Entry point
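
Because Huffman codes have variable bit lengths, a bit-level I/O layer such as io/bit_io.c has to pack them into whole bytes. The contents of that file are not shown here, so the following is only a sketch of the general technique; BitWriter, bitwriter_put, and bitwriter_bytes are hypothetical names, and the most-significant-bit-first order is an assumption.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit writer over a caller-provided buffer. */
typedef struct {
    uint8_t *buf;     /* output buffer */
    size_t   cap;     /* buffer capacity in bytes */
    size_t   bit_pos; /* number of bits written so far */
} BitWriter;

/* Append the lowest nbits bits of code (nbits <= 32), most significant
 * bit first. Returns 0 on success, -1 if the buffer is full. */
int bitwriter_put(BitWriter *w, uint32_t code, int nbits)
{
    for (int i = nbits - 1; i >= 0; i--) {
        size_t byte = w->bit_pos / 8;
        if (byte >= w->cap)
            return -1;
        if (w->bit_pos % 8 == 0)
            w->buf[byte] = 0; /* start every new byte from zero */
        if ((code >> i) & 1u)
            w->buf[byte] |= (uint8_t)(0x80u >> (w->bit_pos % 8));
        w->bit_pos++;
    }
    return 0;
}

/* Number of bytes touched so far; the last one may be zero-padded. */
size_t bitwriter_bytes(const BitWriter *w)
{
    return (w->bit_pos + 7) / 8;
}
```

Writing the 3-bit code 101 followed by the 2-bit code 01 produces the single byte 0xA8 (binary 10101000), with the final three bits as padding. A decoder needs to know how many bits are valid, for example by storing the original file length in the header.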
