# Cold Start Benchmarking

Performance benchmarking tools for measuring and comparing cold start times across different code changes.

## Quick Start

```bash
# Run benchmark on current branch
uv run pytest tests/test_performance/test_cold_start.py

# Compare two branches
./scripts/benchmark_cold_start.sh main my-feature-branch

# Compare two existing result files
uv run python scripts/compare_benchmarks.py \
  benchmark_results/cold_start_baseline.json \
  benchmark_results/cold_start_latest.json
```

## What Gets Measured

- **Import times**: `import runpod`, `import runpod.serverless`, `import runpod.endpoint`
- **Module counts**: total modules loaded and runpod-specific modules
- **Lazy loading status**: whether paramiko and the SSH CLI are eagerly or lazily loaded
- **Statistics**: min, max, mean, median across 10 iterations per measurement
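Each timing runs in a brand-new Python subprocess so nothing is pre-imported. A minimal sketch of that approach (the snippet and helper name below are illustrative, not the test's actual code):

```python
import subprocess
import sys

# Time an import inside a fresh interpreter so sys.modules starts empty.
SNIPPET = (
    "import time; t0 = time.perf_counter(); "
    "import runpod; "
    "print((time.perf_counter() - t0) * 1000)"
)

def measure_import_ms() -> float:
    result = subprocess.run(
        [sys.executable, "-c", SNIPPET],
        capture_output=True, text=True, check=True,
    )
    return float(result.stdout.strip())

print(f"import runpod: {measure_import_ms():.2f}ms")
```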

## Tools

### 1. `test_cold_start.py`

Core benchmark test that measures import performance in fresh Python subprocesses.

```bash
# Run as a pytest test
uv run pytest tests/test_performance/test_cold_start.py -v

# Run as a standalone script
uv run python tests/test_performance/test_cold_start.py

# Results saved to:
# - benchmark_results/cold_start_<timestamp>.json
# - benchmark_results/cold_start_latest.json (always the latest run)
```

**Output example:**

```
Running cold start benchmarks...
------------------------------------------------------------
Measuring 'import runpod'...
  Mean: 273.29ms
Measuring 'import runpod.serverless'...
  Mean: 332.18ms
Counting loaded modules...
  Total modules: 582
  Runpod modules: 46
Checking if paramiko is eagerly loaded...
  Paramiko loaded: False
```

### 2. `benchmark_cold_start.sh`

Automated benchmark runner that handles git branch switching, dependency installation, and result collection.

```bash
# Run on current branch (no git operations)
./scripts/benchmark_cold_start.sh

# Run on a specific branch
./scripts/benchmark_cold_start.sh main

# Compare two branches (runs both, then compares)
./scripts/benchmark_cold_start.sh main feature/lazy-loading
```

**Features:**

- Automatic stash/unstash of uncommitted changes
- Dependency installation per branch
- Safe branch switching with restoration
- Timestamped result files
- Automatic comparison when two branches are given

**Safety:**

- Stashes uncommitted changes before switching branches
- Restores the original branch after completion
- Handles errors gracefully

### 3. `compare_benchmarks.py`

Analyzes and visualizes differences between two benchmark runs with colored terminal output.

```bash
uv run python scripts/compare_benchmarks.py <baseline.json> <optimized.json>
```
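Under the hood, the comparison is straightforward arithmetic over the two result files. A rough sketch of the core calculation (the helper name is illustrative, not the script's actual code):

```python
import json

def load_mean(path: str, metric: str = "runpod_total") -> float:
    """Read one metric's mean from a result file (see JSON structure below)."""
    with open(path) as f:
        return json.load(f)["measurements"][metric]["mean"]

baseline = load_mean("benchmark_results/cold_start_baseline.json")
optimized = load_mean("benchmark_results/cold_start_latest.json")

delta_ms = baseline - optimized        # positive means the optimized run is faster
delta_pct = delta_ms / baseline * 100
print(f"Δ {delta_ms:.2f}ms ({delta_pct:.2f}%)")
```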

**Output example:**

```
======================================================================
COLD START BENCHMARK COMPARISON
======================================================================

IMPORT TIME COMPARISON
----------------------------------------------------------------------
Metric                 Baseline     Optimized         Δ ms       Δ %
----------------------------------------------------------------------
runpod_total           285.64ms     273.29ms     ↓ 12.35ms     4.32%
runpod_serverless      376.33ms     395.14ms    ↑ -18.81ms    -5.00%
runpod_endpoint        378.61ms     399.36ms    ↑ -20.75ms    -5.48%

MODULE LOAD COMPARISON
----------------------------------------------------------------------
Total modules loaded:
  Baseline:  698
  Optimized: 582
  Δ: 116

Runpod modules loaded:
  Baseline:  48
  Optimized: 46
  Δ: 2

LAZY LOADING STATUS
----------------------------------------------------------------------
Paramiko
  Baseline:  LOADED
  Optimized: NOT LOADED  ✓ NOW LAZY

SSH CLI
  Baseline:  LOADED
  Optimized: NOT LOADED  ✓ NOW LAZY

======================================================================
SUMMARY
======================================================================
✓ Cold start improved by 12.35ms
✓ That's a 4.3% improvement over baseline
✓ Baseline: 285.64ms → Optimized: 273.29ms
======================================================================
```

**Color coding:**

- **Green**: improvements (faster times, lazy loading achieved)
- **Red**: regressions (slower times, eager loading introduced)
- **Yellow**: no change
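The coloring itself is plain ANSI escape codes. One way it could be wired up (a sketch, not the script's exact implementation):

```python
GREEN, RED, YELLOW, RESET = "\033[92m", "\033[91m", "\033[93m", "\033[0m"

def format_delta(delta_ms: float) -> str:
    """Green for improvements, red for regressions, yellow for no change."""
    if delta_ms > 0:
        return f"{GREEN}↓ {delta_ms:.2f}ms{RESET}"
    if delta_ms < 0:
        return f"{RED}↑ {-delta_ms:.2f}ms{RESET}"
    return f"{YELLOW}no change{RESET}"

print(format_delta(12.35))   # green: improvement
print(format_delta(-18.81))  # red: regression
```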

## Result Files

All benchmark results are saved to `benchmark_results/` (gitignored).

**File naming:**

- `cold_start_<timestamp>.json` - timestamped result
- `cold_start_latest.json` - always contains the most recent result
- `cold_start_baseline.json` - manually saved baseline for comparison

**JSON structure:**

{ "timestamp": 1763179522.0437188, "python_version": "3.8.20 (default, Oct 2 2024, 16:12:59) [Clang 18.1.8 ]", "measurements": { "runpod_total": { "min": 375.97, "max": 527.9, "mean": 393.91, "median": 380.4, "iterations": 10 } }, "module_counts": { "total": 698, "filtered": 48 }, "paramiko_eagerly_loaded": true, "ssh_cli_loaded": true }

## Common Workflows

### Testing a Performance Optimization

```bash
# 1. Save a baseline on the main branch
git checkout main
./scripts/benchmark_cold_start.sh
cp benchmark_results/cold_start_latest.json benchmark_results/cold_start_baseline.json

# 2. Switch to the feature branch
git checkout feature/my-optimization

# 3. Run the benchmark and compare
./scripts/benchmark_cold_start.sh
uv run python scripts/compare_benchmarks.py \
  benchmark_results/cold_start_baseline.json \
  benchmark_results/cold_start_latest.json
```

### Comparing Multiple Approaches

```bash
# Benchmark three different optimization branches
./scripts/benchmark_cold_start.sh main > results_main.txt
./scripts/benchmark_cold_start.sh feature/approach-1 > results_1.txt
./scripts/benchmark_cold_start.sh feature/approach-2 > results_2.txt

# Then compare each against the baseline
uv run python scripts/compare_benchmarks.py \
  benchmark_results/cold_start_main_*.json \
  benchmark_results/cold_start_approach-1_*.json
```

## CI/CD Integration

Add to your GitHub Actions workflow:

```yaml
- name: Run cold start benchmark
  run: |
    uv run pytest tests/test_performance/test_cold_start.py --timeout=120

- name: Upload benchmark results
  uses: actions/upload-artifact@v3
  with:
    name: benchmark-results
    path: benchmark_results/cold_start_latest.json
```

## Performance Targets

Based on testing with Python 3.8:

- **Cold start** (`import runpod`): < 300ms (mean)
- **Serverless import**: < 400ms (mean)
- **Module count**: < 600 total modules
- **Test assertion**: fails if an import exceeds 1000ms
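The hard 1000ms ceiling is a guard against regressions rather than a performance goal. A self-contained sketch of how such an assertion could look (not the test's literal code):

```python
import statistics
import subprocess
import sys

def time_import_ms(module: str) -> float:
    """Time one import in a fresh interpreter, in milliseconds."""
    code = (
        "import time; t0 = time.perf_counter(); "
        f"import {module}; "
        "print((time.perf_counter() - t0) * 1000)"
    )
    out = subprocess.run([sys.executable, "-c", code],
                         capture_output=True, text=True, check=True)
    return float(out.stdout.strip())

def test_cold_start_under_threshold():
    # 10 fresh subprocesses; fail hard only past the 1000ms ceiling.
    samples = [time_import_ms("runpod") for _ in range(10)]
    mean_ms = statistics.mean(samples)
    assert mean_ms < 1000, f"import runpod mean was {mean_ms:.2f}ms"
```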

## Interpreting Results

### Import Time Variance

Subprocess-based measurements have inherent variance:

- **First run in a sequence**: often 20-50ms slower (Python startup overhead)
- **Subsequent runs**: more stable
- Use the median or mean for comparisons, never a single run
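To see why the median matters, consider a sequence where the first subprocess pays extra startup cost (the values below are illustrative):

```python
import statistics

# A typical sequence: the first run is inflated by startup overhead.
samples_ms = [325.0, 278.4, 276.9, 280.1, 277.5]

print(f"mean:   {statistics.mean(samples_ms):.2f}ms")    # pulled up by the outlier
print(f"median: {statistics.median(samples_ms):.2f}ms")  # closer to steady state
```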

### Module Count

- **Fewer modules = faster cold start**: each module carries import overhead
- **Runpod-specific modules**: should stay minimal (40-50)
- **Total modules**: includes stdlib and dependencies
- **Target reduction**: removing 100+ modules typically saves 10-30ms
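Module counts come from inspecting `sys.modules` after the import, again in a fresh subprocess. A minimal sketch of that counting (illustrative, not the test's actual code):

```python
import subprocess
import sys

# Import the SDK, then count everything in sys.modules plus the
# subset whose name starts with "runpod".
SNIPPET = (
    "import runpod, sys; "
    "mods = list(sys.modules); "
    "print(len(mods), sum(1 for m in mods if m.startswith('runpod')))"
)

out = subprocess.run([sys.executable, "-c", SNIPPET],
                     capture_output=True, text=True, check=True)
total, runpod_only = map(int, out.stdout.split())
print(f"total modules: {total}, runpod modules: {runpod_only}")
```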

### Lazy Loading Validation

- `"paramiko_eagerly_loaded": false` - good for serverless workers
- `"ssh_cli_loaded": false` - good for SDK users
- Both should only be `true` when CLI commands are invoked
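The check itself is just a `sys.modules` lookup after importing the SDK in a fresh interpreter; a sketch of the idea:

```python
import subprocess
import sys

# If importing runpod drags paramiko in as a side effect, it shows up
# in sys.modules even though we never imported it directly.
CHECK = "import runpod, sys; print('paramiko' in sys.modules)"

out = subprocess.run([sys.executable, "-c", CHECK],
                     capture_output=True, text=True, check=True)
print(f"paramiko eagerly loaded: {out.stdout.strip()}")  # want False
```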

## Troubleshooting

### High Variance in Results

If you see more than 100ms of variance between runs, likely causes include:

- The system is under load
- Disk I/O contention
- Python bytecode cache issues

**Solution**: run multiple times and use the median values.

### `benchmark_cold_start.sh` Fails

```bash
# Check git status
git status

# Manually restore if the script failed mid-execution
git checkout <original-branch>
git stash pop
```

### Import Errors During Benchmark

Ensure dependencies are installed:

```bash
uv sync --group test
```

## Benchmark Accuracy

- **Iterations**: 10 per measurement (configurable in the test)
- **Process isolation**: each measurement uses a fresh subprocess
- **Python cache**: cleared by subprocess creation
- **System state**: OS-level caching cannot be controlled

For production performance testing, consider:

- Running on CI with a consistent environment
- Multiple runs at different times
- Comparing trends across multiple commits