🧠 Deepfake Detection System

A state-of-the-art deepfake detection system built with PyTorch and EfficientNet-B0, featuring a user-friendly web interface for real-time image and video analysis.

⚙️ Created By

👨‍💻 T RAHUL SINGH
🧑‍💻 Mallikarjun Macherla
🧑‍💻 Sainath

🌟 Features

Deep Learning Model: EfficientNet-B0 architecture fine-tuned for deepfake detection
Multi-format Support: Analyze both images (.jpg, .jpeg, .png) and videos (.mp4, .mov)
Web Interface: Interactive Gradio-based web application for easy testing
Real-time Analysis: Processes the first frame of each video for quick deepfake detection
Training Pipeline: Complete PyTorch Lightning training infrastructure
Model Export: Support for PyTorch (.pt) and ONNX format exports
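
The supported formats listed above imply a routing step before inference: decide whether a file is an image or a video. The following is a minimal sketch of that step, not the project's actual code. The helper name `media_kind` and the two extension sets are assumptions copied from the feature list.

```python
from pathlib import Path

# Extensions copied from the feature list above; the real app may accept more.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}
VIDEO_EXTENSIONS = {".mp4", ".mov"}


def media_kind(path):
    """Return "image", "video", or None for an unsupported file (hypothetical helper)."""
    suffix = Path(path).suffix.lower()  # compare case-insensitively
    if suffix in IMAGE_EXTENSIONS:
        return "image"
    if suffix in VIDEO_EXTENSIONS:
        return "video"
    return None
```

Returning `None` for unknown extensions lets the caller show a clear error instead of passing an unsupported file to the model.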

🚀 Quick Start

Prerequisites

Python 3.8 or higher
CUDA-compatible GPU (optional, but recommended for training)

Installation

Clone the repository:

git clone https://github.com/TRahulsingh/DeepfakeDetector.git
cd DeepfakeDetector

Install dependencies:
```
pip install -r requirements.txt
```
Download a pre-trained model (or train your own):
- Place your model file as models/best_model-v3.pt

Usage

🖥️ Web Application

Launch the interactive web interface:

python web-app.py

The web app will open in your browser where you can:

Drag and drop images or videos
View real-time predictions with confidence scores
See preview of analyzed content

🔍 Command Line Classification

Classify individual images:

python classify.py --image path/to/your/image.jpg

🎥 Video Analysis

Process videos frame by frame:

python inference/video_inference.py --video path/to/your/video.mp4
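
Frame-by-frame processing yields one prediction per frame, which then has to be reduced to a single verdict for the video. How `video_inference.py` does this is not stated here. The sketch below assumes each frame produces a probability of being fake and simply averages them; the function name, the 0.5 threshold, and the averaging rule are all illustrative assumptions.

```python
def aggregate_frame_scores(fake_probs, threshold=0.5):
    """Reduce per-frame fake probabilities to one (label, mean_score) pair.

    Hypothetical helper: averaging is one common choice, not necessarily
    what the project's video script does.
    """
    if not fake_probs:
        raise ValueError("no frames were scored")
    mean_score = sum(fake_probs) / len(fake_probs)
    label = "Fake" if mean_score >= threshold else "Real"
    return label, mean_score


# Example: three frames, two of them scored as likely fake.
label, score = aggregate_frame_scores([0.9, 0.8, 0.1])
```

Averaging is easy to reason about but can hide a short manipulated segment inside a long genuine video; taking the maximum or a high percentile is a possible alternative if that matters for your use case.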

📂 Supported Datasets

This deepfake detection system supports various popular deepfake datasets. Below are the recommended datasets for training and evaluation:

🎬 Video-based Datasets

FaceForensics++

Description: One of the most comprehensive deepfake datasets with 4 manipulation methods
Size: ~1,000 original videos, ~4,000 manipulated videos
Manipulations: Deepfakes, Face2Face, FaceSwap, NeuralTextures
Quality: Raw, c23 (light compression), c40 (heavy compression)
Download: GitHub Repository
Usage: Excellent for training robust models across different manipulation types

Celeb-DF (v2)

Description: High-quality celebrity deepfake dataset
Size: 590 real videos, 5,639 deepfake videos
Quality: High-resolution with improved visual quality
Download: Official Website
Usage: Great for testing model performance on high-quality deepfakes

DFDC (Deepfake Detection Challenge)

Description: Facebook's large-scale deepfake detection dataset
Size: ~100,000 videos (real and fake)
Diversity: Multiple actors, ethnicities, and ages
Download: Kaggle Competition
Usage: Large-scale training and benchmarking

DFD (Google's Deepfake Detection Dataset)

Description: Google/Jigsaw deepfake dataset
Size: ~3,000 deepfake videos
Quality: High-quality with various compression levels
Download: FaceForensics++ repository
Usage: Additional training data for model robustness

🖼️ Image-based Datasets

140k Real and Fake Faces

Description: Large collection of real and AI-generated face images
Size: ~140,000 images
Source: StyleGAN-generated faces vs real faces
Download: Kaggle Dataset
Usage: Perfect for image-based deepfake detection training

CelebA-HQ

Description: High-quality celebrity face dataset
Size: 30,000 high-resolution images
Quality: 1024×1024 resolution
Download: GitHub Repository
Usage: Real face examples for training

🔧 Dataset Preparation

Option 1: Download Pre-processed Datasets

Download your chosen dataset from the links above
Extract to the data/ folder
Organize as shown in the training section below

Option 2: Use Dataset Preparation Tools

Use our built-in tools to prepare datasets:

# Split video dataset into frames
python tools/split_video_dataset.py --input_dir raw_videos --output_dir data

# Split dataset into train/validation
python tools/split_train_val.py --input_dir data --train_ratio 0.8

# General dataset splitting
python tools/split_dataset.py --input_dir your_dataset --output_dir data
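
To make the effect of `--train_ratio 0.8` concrete, here is a minimal sketch of a reproducible train/validation split. It illustrates the idea only and is not the implementation of `tools/split_train_val.py`; the function name and the fixed seed are assumptions.

```python
import random


def split_train_val(items, train_ratio=0.8, seed=42):
    """Shuffle items reproducibly and split them into (train, val) lists."""
    if not 0.0 < train_ratio < 1.0:
        raise ValueError("train_ratio must be between 0 and 1")
    shuffled = list(items)                  # leave the caller's list untouched
    random.Random(seed).shuffle(shuffled)   # seeded so the split is repeatable
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


files = [f"frame_{i:03d}.jpg" for i in range(10)]
train_files, val_files = split_train_val(files)
```

For video-derived frames, splitting by source video rather than by individual frame avoids near-duplicate frames landing in both sets.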

📋 Dataset Recommendations

For Beginners: Start with 140k Real and Fake Faces (image-based, easy to work with)
For Research: Use FaceForensics++ (comprehensive, multiple manipulation types)
For Production: Combine DFDC + Celeb-DF (large scale, diverse)
For High-Quality Testing: Use Celeb-DF v2 (challenging, high-quality deepfakes)

⚠️ Dataset Usage Notes

Ethical Use: These datasets are for research purposes only
Legal Compliance: Ensure compliance with dataset licenses and terms of use
Privacy: Respect privacy rights of individuals in the datasets
Citation: Properly cite the original dataset papers when publishing research

🏋️ Training

Dataset Structure

Organize your training data in the data folder as follows:

data/
├── train/
│   ├── real/
│   │   ├── image1.jpg
│   │   └── image2.jpg
│   └── fake/
│       ├── fake1.jpg
│       └── fake2.jpg
└── validation/
    ├── real/
    └── fake/
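
Before training, it can help to confirm that the folders match this layout and to see how balanced the classes are. The following sketch counts images per split and label; the helper name and the accepted suffixes are assumptions, and it is not part of the repository.

```python
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def count_images(data_root):
    """Return {(split, label): image_count} for the expected folder layout."""
    counts = {}
    for split in ("train", "validation"):
        for label in ("real", "fake"):
            folder = Path(data_root) / split / label
            if not folder.is_dir():
                raise FileNotFoundError(f"missing folder: {folder}")
            # Count only files with a known image suffix, ignoring case.
            counts[(split, label)] = sum(
                1 for p in folder.iterdir() if p.suffix.lower() in IMAGE_SUFFIXES
            )
    return counts
```

Calling `count_images("data")` from the project root would then report, for example, how many real and fake training images are present.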

Configuration

Update config.yaml with your dataset paths:

train_paths:
  - data/train
val_paths:
  - data/validation
lr: 0.0001
batch_size: 4
num_epochs: 10
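
A quick sanity check on these settings can catch typos before a long training run. The sketch below mirrors the configuration as a Python dict so that it needs no third-party parser (in practice the file would typically be read with PyYAML's `yaml.safe_load`). The `validate_config` helper and its rules are assumptions, not part of the project.

```python
# The same settings as the YAML above, written as a dict for illustration.
config = {
    "train_paths": ["data/train"],
    "val_paths": ["data/validation"],
    "lr": 0.0001,
    "batch_size": 4,
    "num_epochs": 10,
}


def validate_config(cfg):
    """Raise ValueError if a training setting is missing or implausible."""
    for key in ("train_paths", "val_paths"):
        paths = cfg.get(key)
        if not isinstance(paths, list) or not paths:
            raise ValueError(f"{key} must be a non-empty list of folders")
    if not (isinstance(cfg.get("lr"), float) and cfg["lr"] > 0):
        raise ValueError("lr must be a positive float")
    for key in ("batch_size", "num_epochs"):
        value = cfg.get(key)
        if not (isinstance(value, int) and value >= 1):
            raise ValueError(f"{key} must be an integer >= 1")
    return cfg


validate_config(config)
```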

Start Training

python main_trainer.py

or, if your checkout includes it (model_trainer.py is not listed in the project structure):

python model_trainer.py

The training will:

Use PyTorch Lightning for efficient training
Save best model based on validation loss
Log metrics to TensorBoard
Apply early stopping to prevent overfitting
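
In PyTorch Lightning, best-model saving and early stopping are typically provided by the `ModelCheckpoint` and `EarlyStopping` callbacks; how this project configures them is not shown here. The plain-Python sketch below only illustrates the underlying logic of tracking the best validation loss and stopping after a run of non-improving epochs. The patience value of 3 and the function name are assumptions.

```python
def run_with_early_stopping(val_losses, patience=3):
    """Return (best_epoch, stop_epoch) for a sequence of validation losses.

    Plain-Python illustration of the idea, not Lightning's implementation.
    """
    best_loss = float("inf")
    best_epoch = None
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss = loss
            best_epoch = epoch  # a checkpoint would be saved at this point
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return best_epoch, epoch  # stop early
    return best_epoch, len(val_losses) - 1


best_epoch, stop_epoch = run_with_early_stopping(
    [0.70, 0.55, 0.50, 0.52, 0.51, 0.53, 0.49]
)
```

In this example the loss stops improving after epoch 2, so training halts at epoch 5 and the later, lower loss at epoch 6 is never reached. A larger patience trades longer training for fewer premature stops.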

Monitor Training

View training progress with TensorBoard:

tensorboard --logdir lightning_logs

📁 Project Structure

├── web-app.py               # Main web application
├── main_trainer.py          # Primary training script
├── classify.py              # Image classification utility
├── realeval.py              # Real-world evaluation script
├── config.yaml              # Training configuration
├── requirements.txt         # Python dependencies
├── README.md                # Project documentation
├── LICENSE                  # MIT License
├── .gitignore               # Git ignore rules
├── data/                    # Dataset storage (not tracked by git)
│   ├── train/               # Training data
│   └── validation/          # Validation data
├── datasets/
│   └── hybrid_loader.py     # Custom dataset loader
├── lightning_modules/
│   └── detector.py          # PyTorch Lightning module
├── models/
│   └── best_model-v3.pt     # Trained model weights
├── tools/                   # Dataset preparation utilities
│   ├── split_dataset.py
│   ├── split_train_val.py
│   └── split_video_dataset.py
└── inference/
    ├── export_onnx.py       # ONNX export
    └── video_inference.py   # Video processing

🛠️ Model Architecture

Backbone: EfficientNet-B0 (pre-trained on ImageNet)
Classifier: Custom 2-class classifier with dropout (0.4)
Input Size: 224x224 RGB images
Output: Binary classification (Real/Fake) with confidence scores
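
A 2-class classifier head usually emits two raw scores (logits), and a softmax turns them into the confidence score shown to the user. The sketch below demonstrates that conversion in plain Python. The class order in `LABELS` and the function name are assumptions; the project may order its classes differently.

```python
import math

LABELS = ("Real", "Fake")  # class order is an assumption


def logits_to_prediction(logits):
    """Convert two raw scores into (label, confidence) using a softmax."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]  # subtract max for stability
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs[best]


label, confidence = logits_to_prediction([0.2, 2.2])
```

With logits of 0.2 and 2.2 the second class wins, and its confidence equals the sigmoid of the difference between the two logits.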

📊 Performance

The model achieves:

High accuracy on diverse deepfake datasets
Real-time inference capabilities
Robust performance on compressed/low-quality media
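
The claims above are qualitative. One way to put numbers on them for your own labeled validation set is to compute accuracy, precision, and recall for the fake class. The following is a generic sketch; the function name and the label strings are illustrative, and it is not taken from `realeval.py`.

```python
def binary_metrics(y_true, y_pred, positive="fake"):
    """Compute accuracy, precision, and recall for the positive ("fake") class."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty label lists of equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    return {
        "accuracy": correct / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


metrics = binary_metrics(
    ["fake", "fake", "real", "real"],
    ["fake", "real", "real", "fake"],
)
```

Reporting precision and recall alongside accuracy matters when real and fake samples are imbalanced, as they are in several of the datasets listed earlier.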

🔧 Advanced Usage

Export to ONNX

Convert PyTorch model to ONNX format:

python inference/export_onnx.py

Batch Evaluation

Process multiple files programmatically:

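import importlib

# "web-app" contains a hyphen, so `from web-app import ...` is a syntax error;
# load the module by name instead. Importing runs the module's top-level code,
# so this assumes web-app.py does not launch the UI on import.
web_app = importlib.import_module("web-app")
predict_file = web_app.predict_file

image_paths = ["path/to/your/image.jpg"]  # replace with your own files

results = []
for file_path in image_paths:
    prediction, confidence, preview = predict_file(file_path)
    results.append({
        'file': file_path,
        'prediction': prediction,
        'confidence': confidence
    })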
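
Once a `results` list like the one above exists, a short summary makes a batch easier to read. This sketch counts predictions per label and averages their confidence. The helper name and the example label strings ("Real", "Fake") are assumptions about what `predict_file` returns.

```python
from collections import Counter


def summarize(results):
    """Count predictions per label and average the confidence for each label."""
    counts = Counter(r["prediction"] for r in results)
    summary = {}
    for label, n in counts.items():
        total_conf = sum(
            r["confidence"] for r in results if r["prediction"] == label
        )
        summary[label] = {"count": n, "mean_confidence": total_conf / n}
    return summary


# Illustrative data only; real entries come from the batch loop.
example = [
    {"file": "a.jpg", "prediction": "Fake", "confidence": 0.9},
    {"file": "b.jpg", "prediction": "Fake", "confidence": 0.7},
    {"file": "c.jpg", "prediction": "Real", "confidence": 0.8},
]
summary = summarize(example)
```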

🤝 Contributing

Fork the repository
Create a feature branch (git checkout -b feature/amazing-feature)
Commit your changes (git commit -m 'Add amazing feature')
Push to the branch (git push origin feature/amazing-feature)
Open a Pull Request

🙏 Acknowledgments

EfficientNet architecture by Google Research
PyTorch Lightning for training infrastructure
Gradio for web interface framework
The research community for deepfake detection advances

📄 License

This project is licensed under the MIT License.

⭐ Star this repository if you found it helpful!
