
Importance Sampling for Robust Machine Learning

Project: Mathematical Foundations Individual Project
Topic: Theoretical and Empirical Analysis of Importance Sampling in Convex and Deep Learning Settings

Overview

This repository investigates the effectiveness of Importance Sampling (IS) as a variance reduction technique in stochastic optimization, with a focus on robustness to data heterogeneity and label noise. The project comprises both theoretical analysis and comprehensive empirical validation across convex optimization and deep neural network training.

Key Research Questions

  1. Does importance sampling accelerate convergence in the presence of data heterogeneity?
  2. How robust is IS-based SGD to label noise compared to uniform sampling?
  3. Can proxy models provide effective importance scores for deep learning training?
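To ground question 1, the core mechanism can be sketched in a few lines: importance sampling draws example `i` with probability `p_i` and reweights its gradient by `1/(n p_i)`, which keeps the stochastic gradient an unbiased estimate of the full-batch gradient while concentrating samples on influential examples. The toy least-squares problem, the norm-proportional choice of `p`, and all variable names below are illustrative assumptions, not the repository's actual code:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy least-squares problem with heterogeneous example norms:
# f(w) = (1/n) * sum_i (x_i @ w - y_i)^2
n, d = 200, 5
X = rng.normal(size=(n, d)) * rng.uniform(0.1, 10.0, size=(n, 1))
y = rng.normal(size=n)
w = np.zeros(d)

def grad_i(w, i):
    """Gradient of the i-th example's squared error."""
    return 2.0 * (X[i] @ w - y[i]) * X[i]

# Norm-proportional sampling distribution (a common IS choice).
p = np.linalg.norm(X, axis=1)
p /= p.sum()

def is_grad(w):
    """Draw i ~ p and reweight by 1/(n * p_i) so the estimator
    stays unbiased for the full-batch gradient."""
    i = rng.choice(n, p=p)
    return grad_i(w, i) / (n * p[i])

# Monte Carlo average of the IS estimator vs. the exact full gradient.
full = np.mean([grad_i(w, i) for i in range(n)], axis=0)
est = np.mean([is_grad(w) for _ in range(20000)], axis=0)
```

The point of the reweighting is that `E[is_grad(w)] = sum_i p_i * grad_i(w, i) / (n p_i) = full` exactly, regardless of `p`; only the variance of the estimator depends on the sampling distribution, which is what IS exploits.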

Repository Structure

```
MF-individual-project/
├── README.md                    # This file
├── MF_project_Csongor.pdf       # Full project report
├── Convex-IS-notebook.ipynb     # Interactive notebook for convex experiments
├── Convex-Noise/                # Convex optimization experiments
│   ├── README.md                # Convex experiments documentation
│   ├── IS-noise.py              # Main experimental script
│   └── Res-Convex-noisy-*/      # Results directories
│       ├── scaled_{ratio}/      # Data scaling experiments
│       └── flip-{type}/         # Noise injection strategies
├── DL-correlation/              # Deep learning correlation analysis
│   ├── README.md                # Correlation experiments documentation
│   └── IS_01_corelations.py     # Multi-model correlation study
└── DL-noise/                    # Deep learning noise robustness
    ├── README.md                # DL experiments documentation
    ├── is.py                    # Main training pipeline
    ├── plot.py                  # Visualization utilities
    └── r-b128/                  # Results storage
        └── res.txt              # Experimental logs
```

Experimental Setup

Convex Setting (Regularized SVM)

  • Problem: Binary classification with squared hinge loss
  • Dataset: Synthetic data with heterogeneous feature norms
  • Methods: Uniform SGD vs. IS-SGD with norm-based sampling
  • Metrics: Parameter distance, objective value, test error
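A minimal sketch of this setting, under assumed hyperparameters (the repository's actual configuration lives in `Convex-Noise/IS-noise.py`): SGD on a regularized squared hinge loss, comparing uniform sampling against norm-proportional sampling with the unbiased `1/(n p_i)` correction. All names and constants here are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic binary classification with heterogeneous feature norms.
n, d, lam = 500, 10, 0.01
scales = rng.uniform(0.5, 5.0, size=n)           # widely varying example norms
X = rng.normal(size=(n, d)) * scales[:, None]
w_true = rng.normal(size=d)
y = np.sign(X @ w_true)

def objective(w):
    """Regularized squared hinge loss:
    (1/n) sum_i max(0, 1 - y_i x_i.w)^2 + (lam/2) ||w||^2"""
    margins = np.maximum(0.0, 1.0 - y * (X @ w))
    return np.mean(margins ** 2) + 0.5 * lam * w @ w

def sgd(p, steps=3000, lr=1e-3):
    """SGD with sampling distribution p; gradients reweighted by 1/(n p_i)."""
    w = np.zeros(d)
    for _ in range(steps):
        i = rng.choice(n, p=p)
        margin = 1.0 - y[i] * (X[i] @ w)
        g = -2.0 * max(margin, 0.0) * y[i] * X[i]   # per-example loss gradient
        w -= lr * (g / (n * p[i]) + lam * w)        # unbiased IS correction
    return w

uniform = np.full(n, 1.0 / n)
norms = np.linalg.norm(X, axis=1)
norm_based = norms / norms.sum()

w_unif = sgd(uniform)
w_is = sgd(norm_based)
```

Note that for `p = uniform`, the correction `1/(n p_i)` reduces to 1, recovering plain SGD; the two runs differ only in the sampling distribution, which isolates the effect being measured.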

Deep Learning Setting (CIFAR-10)

  • Task: Image classification with label noise
  • Architecture: VGG-19 with batch normalization
  • Proxy Models: ResNet-20, MobileNetV2, ShuffleNetV2
  • Strategies:
    • Baseline (random sampling)
    • Consensus-high (high loss across proxies)
    • Ambiguous (high variance across proxies)
  • Noise Levels: 0%, 2%, 5%, 10%, 25% label corruption
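The two proxy-based strategies above reduce to simple statistics over a matrix of per-example losses from the proxy models: consensus-high scores by the mean loss across proxies, ambiguous scores by the variance. The tiny loss matrix and helper below are made-up illustrations of that idea, not values or code from the repository:

```python
import numpy as np

# Hypothetical per-example losses from 3 proxy models on 6 training examples
# (rows = proxies, columns = examples).
proxy_losses = np.array([
    [0.1, 2.5, 0.2, 1.9, 0.1, 3.0],
    [0.2, 2.7, 0.1, 0.3, 0.2, 2.8],
    [0.1, 2.4, 0.3, 2.1, 0.1, 3.1],
])

# Consensus-high: examples every proxy agrees are hard (high mean loss).
consensus_score = proxy_losses.mean(axis=0)

# Ambiguous: examples the proxies disagree on (high variance across proxies).
ambiguity_score = proxy_losses.var(axis=0)

def to_distribution(scores, eps=1e-8):
    """Turn nonnegative scores into a sampling distribution over examples."""
    s = scores + eps
    return s / s.sum()

p_consensus = to_distribution(consensus_score)
p_ambiguous = to_distribution(ambiguity_score)
```

In this toy matrix, example 5 has uniformly high loss (consensus-hard), while example 3 splits the proxies (ambiguous), so the two strategies up-weight different examples, which is exactly the distinction the noise-robustness experiments probe.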

Usage

Convex Experiments

```
cd Convex-Noise
python IS-noise.py
```

See Convex-Noise/README.md for configuration options.

Correlation Analysis

```
cd DL-correlation
python IS_01_corelations.py
```

Analyzes cross-model score correlations on CIFAR-10/100.

Deep Learning Training

```
cd DL-noise
python is.py
python plot.py   # Generate visualizations
```

See DL-noise/README.md for hyperparameter settings.

Dependencies

Core Libraries:

  • numpy - Numerical computing
  • matplotlib, seaborn - Visualization
  • scipy - Statistical analysis
  • pandas - Data manipulation

Deep Learning:

  • torch, torchvision - PyTorch framework
  • tqdm - Progress bars

Pre-trained Models: Models loaded via torch.hub from chenyaofo/pytorch-cifar-models

Installation

```
# Create virtual environment
python -m venv venv
source venv/bin/activate   # Windows: venv\Scripts\activate

# Install dependencies
pip install numpy matplotlib seaborn scipy pandas
pip install torch torchvision tqdm
```

Reproducibility

All experiments use fixed random seeds for reproducibility. Results are averaged over 3 independent runs in deep learning experiments.
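The exact seeding lives in the experiment scripts; the pattern is simply that a fixed seed makes every run draw the same random stream. A stdlib-only illustration (the function name is hypothetical):

```python
import random

def seeded_run(seed):
    """Re-running with the same seed reproduces the same sample stream."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(5)]

run_a = seeded_run(42)
run_b = seeded_run(42)
# The PyTorch experiments do the analogous thing with random.seed,
# numpy.random.seed, and torch.manual_seed at the start of each run.
```

With this in place, `run_a == run_b` holds, while different seeds give independent runs for averaging.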

Author

Csongor Horváth - Mathematical Foundations Individual Course Project (2025-2026)

AI Disclosure

AI assistance (Gemini 3) was used in creating the experimental code and documentation.

Citation

If you use this code, please cite:

Horváth Cs. Importance Sampling for Robust Machine Learning. Mathematical Foundations Individual Project, 2026.

License

This project is for academic purposes.
