Name	Name	Last commit message	Last commit date
parent directory ..
framework	framework
.gitignore	.gitignore
README.md	README.md
requirements.txt	requirements.txt
run_benchmarks.py	run_benchmarks.py

Benchmarking v2

A comprehensive benchmarking framework for transformer models that supports multiple execution modes (eager, compiled, kernelized), detailed performance metrics collection, and structured output format.

Quick Start

Running All Benchmarks

# Run all benchmarks with default settings python run_benchmarks.py # Specify output directory python run_benchmarks.py --output-dir my_results # Run with custom parameters python run_benchmarks.py \ --warmup-iterations 5 \ --measurement-iterations 10 \ --num-tokens-to-generate 200

Uploading Results to HuggingFace Dataset

You can automatically upload benchmark results to a HuggingFace Dataset for tracking and analysis:

# Upload to a public dataset with auto-generated run ID python run_benchmarks.py --upload-to-hub username/benchmark-results # Upload with a custom run ID for easy identification python run_benchmarks.py --upload-to-hub username/benchmark-results --run-id experiment_v1 # Upload with custom HuggingFace token (if not set in environment) python run_benchmarks.py --upload-to-hub username/benchmark-results --token hf_your_token_here

Dataset Directory Structure:

dataset_name/ ├── 2025-01-15/ │ ├── runs/ # Non-scheduled runs (manual, PR, etc.) │ │ └── 123-1245151651/ # GitHub run number and ID │ │ └── benchmark_results/ │ │ ├── benchmark_summary_20250115_143022.json │ │ └── model-name/ │ │ └── model-name_benchmark_20250115_143022.json │ └── benchmark_results_abc123de/ # Scheduled runs (daily CI) │ ├── benchmark_summary_20250115_143022.json │ └── model-name/ │ └── model-name_benchmark_20250115_143022.json └── 2025-01-16/ └── ...

Authentication for Uploads:

For uploading results, you need a HuggingFace token with write permissions to the target dataset. You can provide the token in several ways (in order of precedence):

Command line: --token hf_your_token_here
Environment variable: HF_TOKEN

Running Specific Benchmarks

# Include only specific benchmarks python run_benchmarks.py --include llama # Exclude specific benchmarks python run_benchmarks.py --exclude old_benchmark ## Output Format Results are saved as JSON files with the following structure: ```json {  "model_name": "llama_2_7b",  "benchmark_scenarios": [  {  "scenario_name": "eager_variant",  "metadata": {  "timestamp": "2025-01-XX...",  "commit_id": "abc123...",  "hardware_info": {  "gpu_name": "NVIDIA A100",  "gpu_memory_total": 40960,  "cpu_count": 64  },  "config": {  "variant": "eager",  "warmup_iterations": 3,  "measurement_iterations": 5  }  },  "measurements": {  "latency": {  "mean": 2.45,  "median": 2.43,  "std": 0.12,  "min": 2.31,  "max": 2.67,  "p95": 2.61,  "p99": 2.65  },  "time_to_first_token": {  "mean": 0.15,  "std": 0.02  },  "tokens_per_second": {  "mean": 87.3,  "unit": "tokens/sec"  }  },  "gpu_metrics": {  "gpu_utilization_mean": 85.2,  "gpu_memory_used_mean": 12450  }  }  ] }

Debug Mode

python run_benchmarks.py --log-level DEBUG

Contributing

To add new benchmarks:

Create a new file in benches/
Implement the ModelBenchmark interface
Add a runner function (run_<benchmark_name> or run_benchmark)
run_benchmarks.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

Benchmarking v2

Quick Start

Running All Benchmarks

Uploading Results to HuggingFace Dataset

Running Specific Benchmarks

Debug Mode

Contributing

FilesExpand file tree

benchmark_v2

Directory actions

More options

Directory actions

More options

Latest commit

History

benchmark_v2

Folders and files

parent directory

README.md

Benchmarking v2

Quick Start

Running All Benchmarks

Uploading Results to HuggingFace Dataset

Running Specific Benchmarks

Debug Mode

Contributing