minpeter/openbench (Public, forked from groq/openbench)
Commit History (branch: main)
Commits on Nov 5, 2025
fix: rename gpt_oss_aime to gpt_oss_aime25
AarushSah committed (b378715)
feat(gpt_oss): add GPT-OSS AIME benchmark, make --epochs optional and stop default 1 from being forced down (#284)
AarushSah and github-actions[bot] authored (815f51b)
feat: add sealqa benchmark (#283)
bklieger-groq and github-actions[bot] authored (06b39e4)
chore: upgrade numpy version and update uv lock (#281)
bklieger-groq authored (15f2dbf)
feat: add --max-tasks option for concurrent task execution in eval command (#279)
AarushSah authored (241e653)
fix(docs): reasoning-effort docs clarity (#278)
AarushSah authored (2644619)
fix: add args to eval command (#276)
xeophon authored (0e06988)
feat(rocketscience): add rocketscience benchmark support (#277)
nilshoehing authored (73bcfc2)
feat(provider): add helicone support (#275)
juliettech13 authored (de6ab04)
Commits on Oct 30, 2025
fix: using huggingface instead of kagglehub for simpleqa_verified benchmark (#270)
xinlei55555 authored (8ee1efa)
feat(provider): add SiliconFlow provider support (#269)
qychen2001 authored (ce14070)
Commits on Oct 29, 2025
fix(factscore): fix module level import error for optional dep (#274)
lee-groq authored (99594ff)
Commits on Oct 27, 2025
chore: reduce math EvalGroup to most recent tasks only
nmayorga7 committed (420dcb9)
fix: factscore import issues, vLLM timeout bug (#273)
AarushSah authored (1674528)
feat(vllm): add openbench override for Inspect AI's built-in vllm provider that doesn't start a server (#272)
AarushSah authored (d0eff6f)
Commits on Oct 25, 2025
feat(groq): implement configurable timeout for GroqAPI client (#271)
AarushSah authored (be492b6)
Commits on Oct 24, 2025
fix(math): shorten math group (#268)
lvogel04 and github-actions[bot] authored (19cc66b)
Commits on Oct 23, 2025
chore: fix deprecated methods for dataset loading with scripts (#267)
nmayorga7 authored (4c503f6)
feat: add optional extras for simpleqa and toxicity (#266)
AarushSah authored (2450ddf)
Commits on Oct 22, 2025
feat: add math EvalGroup (#263)
nmayorga7 and github-actions[bot] authored (e0f4a9b)
feat(cli): added export command to export specific logs to hf (#265)
lvogel04 authored (62e8d8c)
feat(providers): add W&B Inference model provider (#264)
AarushSah and github-actions[bot] authored (a02c34f)
docs: add missing docstrings and type hints for code clarity (#221)
harjothkhara and AarushSah authored (38d34a0)
feat(agentdojo): port agentdojo benchmark (#223)
lee-groq and nmayorga7 authored (1cf174c)
feat(PolygloToxicityPrompts): add multilingual toxicity evaluation (#262)
3 people authored (46de7ee)
Commits on Oct 21, 2025
fix(livemcpbench): catch errors on call_tool and route (#260)
lvogel04 authored (0ab746d)
feat(m2s): added support for single turn conversion of 3 multi turn jailbreak datasets (mhj, safeMT, cosafe) (#222)
lvogel04 and github-actions[bot] authored (6b8f2b1)
feat: add configurable HuggingFace Hub config naming (#261)
nmayorga7 and claude authored (8abe2ae)
Commits on Oct 20, 2025
feat(cvebench): added auto prepare env set up for cvebench (#259)
lvogel04 authored (db238a3)
Commits on Oct 19, 2025
fix(simpleqa_verified): silence mypy for optional kagglehub import (#257)
xinlei55555 authored (32a1ff4)
Commits on Oct 18, 2025
feat(factscore): added support for factscore (#258)
lvogel04 and github-actions[bot] authored (13aafd7)
Commits on Oct 17, 2025
feat(simpleqa_verified): add SimpleQA Verified benchmark (#249)
xinlei55555 and AarushSah authored (8a512c4)
feat: add SMT 2024 benchmarks (#239)
4 people authored (5d9b475)
Commits on Oct 16, 2025
feat: add bbq benchmark (#255)
nmayorga7 and github-actions[bot] authored (46f4744)
chore: release 0.5.2 (#253)
github-actions[bot] authored (f34ba88)