Commits

Commits on Oct 16, 2025

feat(evaluation): Add multi-swe-bench dependency and fix rollout script (#11326)

KevinMusgrave and neubig authored

fix(integration-tests): accept --eval-num-workers and --eval-note in integration test runner (#11387)
enyst authored

Mint security eval fix (#11273)

juanmichelini and xingyaoww authored

Commits on Oct 13, 2025

feat(evaluation): Add placeholders to `swe_gpt4.j2` (#11228)

KevinMusgrave and neubig authored

Commits on Sep 22, 2025

Commits on Sep 8, 2025

Implement model routing support (#9738)

ryanhoangt and openhands-agent authored

Commits on Sep 4, 2025

Commits on Aug 27, 2025

feat(llm): add support for deepseek and gpt-5-mini, util for token count (#10626)

xingyaoww and openhands-agent authored

Commits on Aug 22, 2025

Evaluation: redirect sessions to repo-local .eval_sessions via helper; apply across entrypoints; add tests (#10540)

xingyaoww and openhands-agent authored

Commits on Aug 21, 2025

Fix: expose aggregated LLM metrics in State for evaluation scripts (#10537)

enyst and openhands-agent authored

Commits on Aug 18, 2025

feat(evaluation): Added INSTRUCTION_TEMPLATE_NAME to run_infer.py in swe_bench (#10270)

authored

Commits on Aug 16, 2025

feat(evaluation): Add NoCode-bench evaluation script (#10229)
ZhonghaoJiang authored

Commits on Aug 15, 2025

chore(eval): remove old, unused regression test framework under evaluation/regression (#10419)
enyst authored

Commits on Aug 13, 2025

chore(lint): Apply comprehensive linting and formatting fixes (#10287)

xingyaoww and openhands-agent authored

Commits on Aug 12, 2025

Commits on Aug 11, 2025

Remove SecretStr conversion in GAIA eval (#10204)
ryanhoangt authored

Commits on Aug 8, 2025

feat(cli): Use CLI to launch OpenHands UI server via Docker (#9783)

xingyaoww and openhands-agent authored

Commits on Aug 7, 2025

chore(eval): Remove eval_infer_remote.sh script and related references (#10157)

xingyaoww and openhands-agent authored

Commits on Jul 22, 2025

Evaluation: disable browser when NOT run_with_browsing (#9837)
li-boxuan authored

Commits on Jul 16, 2025

Commits on Jul 15, 2025

Add README for terminal_bench evaluation harness (#9700)
li-boxuan authored

Commits on Jul 10, 2025

feat(eval): loc acc evaluation (#8515)

authored

Commits on Jul 8, 2025

eval: remove gemini-specific swebench template (#9623)
xingyaoww authored

Commits on Jun 25, 2025

[OH-Versa] Add remaining browsing & GAIA eval improvement (#9015)

authored

Commits on Jun 17, 2025

Commits on Jun 16, 2025

disable mcp in run_localize and install oh-aci[llama] for issue 9150 (#9151)
better629 authored

Commits on Jun 15, 2025

Commits on Jun 14, 2025