xflykkk/OpenHands (public fork of OpenHands/OpenHands)
Commit history for OpenHands/evaluation on main
Commits on Oct 16, 2025
feat(evaluation): Add multi-swe-bench dependency and fix rollout script (#11326)
KevinMusgrave and neubig authored a237b57
fix(integration-tests): accept --eval-num-workers and --eval-note in integration test runner (#11387)
enyst authored 3e645f8
Mint security eval fix (#11273)
juanmichelini and xingyaoww authored 471d272
Commits on Oct 13, 2025
feat(evaluation): Add placeholders to `swe_gpt4.j2` (#11228)
KevinMusgrave and neubig authored 19bae5a
Commits on Sep 22, 2025
Add inference generation of SWE-Perf Benchmark (#10246)
4 people authored 7906eab
Multi swe gym (#10605)
juanmichelini and openhands-agent authored 547e104
Commits on Sep 8, 2025
Implement model routing support (#9738)
ryanhoangt and openhands-agent authored df9320f
Commits on Sep 4, 2025
Add a new benchmark: AlgoTune (#10724)
3 people authored bd8b1bf
Fix swe-bench `run_infer.py` config parsing from config.toml (#10792)
Zacharias030 authored 20e5c40
Commits on Aug 27, 2025
feat(llm): add support for deepseek and gpt-5-mini, util for token count (#10626)
xingyaoww and openhands-agent authored b082ccc
Commits on Aug 22, 2025
Evaluation: redirect sessions to repo-local .eval_sessions via helper; apply across entrypoints; add tests (#10540)
xingyaoww and openhands-agent authored 4507a25
Commits on Aug 21, 2025
Fix: expose aggregated LLM metrics in State for evaluation scripts (#10537)
enyst and openhands-agent authored 91d3d1d
Commits on Aug 18, 2025
feat(evaluation): Added INSTRUCTION_TEMPLATE_NAME to run_infer.py in swe_bench (#10270)
3 people authored 74ba21b
Commits on Aug 16, 2025
feat(evaluation): Add NoCode-bench evaluation script (#10229)
ZhonghaoJiang authored 7229a16
Commits on Aug 15, 2025
chore(eval): remove old, unused regression test framework under evaluation/regression (#10419)
enyst authored f7f4fcf
Commits on Aug 13, 2025
chore(lint): Apply comprehensive linting and formatting fixes (#10287)
xingyaoww and openhands-agent authored c2f4620
Commits on Aug 12, 2025
feat(eval): Support evaluation on SWE-rebench (#10251)
ibragim-bad authored 19a6b6b
Readability improvement & remove duplicated and unused prompts (#10241)
insop authored 1d0d88d
Commits on Aug 11, 2025
Remove SecretStr conversion in GAIA eval (#10204)
ryanhoangt authored 758e30c
Commits on Aug 8, 2025
feat(cli): Use CLI to launch OpenHands UI server via Docker (#9783)
xingyaoww and openhands-agent authored 04ff4a0
Commits on Aug 7, 2025
chore(eval): Remove eval_infer_remote.sh script and related references (#10157)
xingyaoww and openhands-agent authored c4f303a
Commits on Jul 22, 2025
Evaluation: disable browser when NOT run_with_browsing (#9837)
li-boxuan authored 7af35ab
Commits on Jul 16, 2025
Fix: Continue evaluation when an instance fails after max retries (#8868)
3 people authored ea50fe4
Fix integration tests (#9746)
enyst authored fba2218
Commits on Jul 15, 2025
Add README for terminal_bench evaluation harness (#9700)
li-boxuan authored 5c3619b
Commits on Jul 10, 2025
feat(eval): loc acc evaluation (#8515)
3 people authored 9388fef
Commits on Jul 8, 2025
eval: remove gemini-specific swebench template (#9623)
xingyaoww authored cff5697
Commits on Jun 25, 2025
[OH-Versa] Add remaining browsing & GAIA eval improvement (#9015)
3 people authored dfa5467
Commits on Jun 17, 2025
Refactor: Improve Consistency in Function Signatures and Regex Usage in compute_ism_pm_score.py (#9145)
maximevtush authored 653a8a7
[GAIA] Add prompt improvement to alleviate solution parsing issue & support Tavily search tools (#9057)
ryanhoangt authored ddaa186
Commits on Jun 16, 2025
disable mcp in run_localize and install oh-aci[llama] for issue 9150 (#9151)
better629 authored 432d882
Commits on Jun 15, 2025
Fix Typo: Change "accurancy" to "accuracy" in Evaluation Benchmark Comments (#9139)
zeevick10 authored e5bff91
feat(eval): Support evaluation on SWE-bench-Live (#9137)
yetlinghao authored a93b045
Commits on Jun 14, 2025
Minor Code Comment Corrections and Clarifications (#9129)
kilavvy authored 4e99aab
Lint all files in the repo (#9131)
3 people authored 0c307ea