Project: Mathematical Foundations Individual Project
Topic: Theoretical and Empirical Analysis of Importance Sampling in Convex and Deep Learning Settings
This repository investigates the effectiveness of Importance Sampling (IS) as a variance reduction technique in stochastic optimization, with a focus on robustness to data heterogeneity and label noise. The project comprises both theoretical analysis and comprehensive empirical validation across convex optimization and deep neural network training.
- Does importance sampling accelerate convergence in the presence of data heterogeneity?
- How robust is IS-based SGD to label noise compared to uniform sampling?
- Can proxy models provide effective importance scores for deep learning training?
Repository structure:

```
MF-individual-project/
├── README.md                   # This file
├── MF_project_Csongor.pdf      # Full project report
├── Convex-IS-notebook.ipynb    # Interactive notebook for convex experiments
│
├── Convex-Noise/               # Convex optimization experiments
│   ├── README.md               # Convex experiments documentation
│   ├── IS-noise.py             # Main experimental script
│   └── Res-Convex-noisy-*/     # Results directories
│       ├── scaled_{ratio}/     # Data scaling experiments
│       └── flip-{type}/        # Noise injection strategies
│
├── DL-correlation/             # Deep learning correlation analysis
│   ├── README.md               # Correlation experiments documentation
│   └── IS_01_corelations.py    # Multi-model correlation study
│
└── DL-noise/                   # Deep learning noise robustness
    ├── README.md               # DL experiments documentation
    ├── is.py                   # Main training pipeline
    ├── plot.py                 # Visualization utilities
    └── r-b128/                 # Results storage
        └── res.txt             # Experimental logs
```

- Problem: Binary classification with squared hinge loss
- Dataset: Synthetic data with heterogeneous feature norms
- Methods: Uniform SGD vs. IS-SGD with norm-based sampling
- Metrics: Parameter distance, objective value, test error
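The convex setup above can be sketched as follows. This is an illustrative NumPy implementation, not the code in `IS-noise.py`: the data generator, step size, and function names are assumptions. It contrasts the key IS-SGD ingredients named in the bullets: norm-proportional sampling probabilities and the `1/(n p_i)` reweighting that keeps the stochastic gradient unbiased.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic binary classification with heterogeneous feature norms
# (illustrative generator: half the examples are scaled up 10x).
n, d = 200, 5
X = rng.normal(size=(n, d))
X[: n // 2] *= 10.0
w_true = rng.normal(size=d)
y = np.sign(X @ w_true)

def sample_grad(w, i):
    """Gradient of the squared hinge loss max(0, 1 - y_i <x_i, w>)^2 at example i."""
    margin = y[i] * (X[i] @ w)
    if margin >= 1.0:
        return np.zeros_like(w)
    return -2.0 * (1.0 - margin) * y[i] * X[i]

# Norm-based importance sampling: p_i proportional to ||x_i||.
norms = np.linalg.norm(X, axis=1)
p = norms / norms.sum()

def is_sgd(steps=2000, lr=1e-3):
    w = np.zeros(d)
    for _ in range(steps):
        i = rng.choice(n, p=p)
        # Reweight by 1/(n * p_i) so the update is an unbiased
        # estimate of the full gradient despite non-uniform sampling.
        w -= lr * sample_grad(w, i) / (n * p[i])
    return w

w_hat = is_sgd()
obj = np.mean(np.maximum(0.0, 1.0 - y * (X @ w_hat)) ** 2)
```

Setting `p` to the uniform distribution recovers plain SGD, which is how the two methods are compared on the metrics listed above.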
- Task: Image classification with label noise
- Architecture: VGG-19 with batch normalization
- Proxy Models: ResNet-20, MobileNetV2, ShuffleNetV2
- Strategies:
- Baseline (random sampling)
- Consensus-high (high loss across proxies)
- Ambiguous (high variance across proxies)
- Noise Levels: 0%, 2%, 5%, 10%, 25% label corruption
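The three selection strategies can be expressed compactly once per-example proxy losses are available. The sketch below uses randomly generated losses and hypothetical function names; in the project, the loss matrix would come from forward passes of the ResNet-20, MobileNetV2, and ShuffleNetV2 proxies.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical per-example losses from three proxy models
# (rows = proxies, columns = training examples).
proxy_losses = rng.gamma(shape=2.0, scale=1.0, size=(3, 1000))

def consensus_high(losses, k):
    """Indices of the k examples with the highest mean loss across proxies."""
    score = losses.mean(axis=0)
    return np.argsort(score)[-k:]

def ambiguous(losses, k):
    """Indices of the k examples the proxies disagree on most (highest variance)."""
    score = losses.var(axis=0)
    return np.argsort(score)[-k:]

def baseline(n, k):
    """Uniform random selection, matching the random-sampling baseline."""
    return rng.choice(n, size=k, replace=False)

hard = consensus_high(proxy_losses, k=100)
disputed = ambiguous(proxy_losses, k=100)
```

Note that the two non-baseline strategies can pick different examples: a sample may have uniformly high loss (consensus-high) without the proxies disagreeing about it (ambiguous), which is exactly the distinction probed under increasing label noise.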
```
cd Convex-Noise
python IS-noise.py
```

See `Convex-Noise/README.md` for configuration options.
```
cd DL-correlation
python IS_01_corelations.py
```

Analyzes cross-model score correlations on CIFAR-10/100.
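A minimal sketch of the kind of analysis this script performs, assuming per-example scores from two models on the same data; the synthetic score vectors and variable names here are illustrative, not the script's actual output. Spearman rank correlation is a natural choice because importance sampling depends only on the relative ordering of scores, not their scale.

```python
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(2)

# Hypothetical per-example loss scores from two models on the same examples:
# model B's scores are a noisy monotone perturbation of model A's.
scores_a = rng.gamma(2.0, 1.0, size=500)
scores_b = scores_a + 0.5 * rng.normal(size=500)

# Rank correlation: how well do the two models agree on which
# examples are "important"?
rho, pval = spearmanr(scores_a, scores_b)
```

A high `rho` would suggest that a cheap proxy model ranks examples similarly to the target model, which is the premise behind proxy-based importance scores.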
```
cd DL-noise
python is.py
python plot.py  # Generate visualizations
```

See `DL-noise/README.md` for hyperparameter settings.
Core Libraries:
- `numpy` - Numerical computing
- `matplotlib`, `seaborn` - Visualization
- `scipy` - Statistical analysis
- `pandas` - Data manipulation
Deep Learning:
- `torch`, `torchvision` - PyTorch framework
- `tqdm` - Progress bars
Pre-trained Models: Loaded via `torch.hub` from `chenyaofo/pytorch-cifar-models`
```
# Create virtual environment
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate

# Install dependencies
pip install numpy matplotlib seaborn scipy pandas
pip install torch torchvision tqdm
```

All experiments use fixed random seeds for reproducibility. Results in the deep learning experiments are averaged over 3 independent runs.
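The seed-and-average protocol can be sketched as below. The helper names and the stand-in `run_experiment` are hypothetical; the PyTorch experiments would additionally call `torch.manual_seed(seed)`, which is omitted here to keep the sketch dependency-light.

```python
import random
import numpy as np

def set_seed(seed: int) -> None:
    """Fix the stdlib and NumPy random generators for reproducibility."""
    random.seed(seed)
    np.random.seed(seed)

def run_experiment(seed: int) -> float:
    """Stand-in for one training run; returns a hypothetical final metric."""
    set_seed(seed)
    return float(np.random.normal(loc=0.9, scale=0.01))

# Average the metric over 3 independent seeded runs, as in the DL experiments.
results = [run_experiment(seed) for seed in (0, 1, 2)]
mean_result = float(np.mean(results))
```

Re-running `run_experiment(0)` returns the same value every time, which is the property the fixed seeds are meant to guarantee.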
Csongor Horváth - Mathematical Foundations Individual Course Project (2025-2026)
AI assistance (model: Gemini 3) was used in the creation of the experimental code and documentation.
If using this code, please reference:
Horváth Cs. Importance Sampling for Robust Machine Learning. Mathematical Foundations Individual Project, 2026.
This project is for academic purposes.