GitHub - paradigmxyz/evmbench: A benchmark and harness for finding and exploiting smart contract bugs

evmbench is a benchmark and agent harness for finding and exploiting smart contract bugs.

How it works | Security | Key services | Repo layout | Quickstart (local dev)

This repository contains a companion interface to the evmbench detect evaluation. For reference, the evaluation code is included as a pinned submodule at frontier-evals/.

Upload contract source code, select an agent, and receive a structured vulnerability report rendered in the UI.

How it works

Architecture

Frontend (Next.js)
  │
  ├─ POST /v1/jobs/start ───► Backend API (FastAPI, port 1337)
  │                            ├─► PostgreSQL (job state)
  ├─ GET  /v1/jobs/{id}        ├─► Secrets Service (port 8081)
  │                            └─► RabbitMQ (job queue)
  └─ GET  /v1/jobs/history            │
                                      ▼
                               Instancer (consumer)
                                      │
                            ┌─────────┴──────────┐
                            ▼                    ▼
                     Docker backend        K8s backend (optional)
                            │                    │
                            └────────┬───────────┘
                                     ▼
                              Worker container
                               ├─► Secrets Service (fetch bundle)
                               ├─► (optional) OAI Proxy (port 8084) ──► OpenAI API
                               └─► Results Service (port 8083)
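The status endpoint in the diagram lends itself to a simple polling client. The following is a minimal sketch only: the response shape (a JSON object with a `status` field) and the terminal status names are assumptions made for illustration and are not taken from the backend, authentication is omitted, and `fetch_job` is a hypothetical injectable callable so the loop can be exercised without a running stack.

```python
import json
import time
import urllib.request
from typing import Callable

BASE_URL = "http://127.0.0.1:1337"  # default backend port from the diagram

# Assumption: illustrative status names; check the backend for the real ones.
TERMINAL_STATUSES = {"succeeded", "failed"}


def http_fetch_job(job_id: str) -> dict:
    """GET /v1/jobs/{id} and decode the JSON body (response shape is assumed)."""
    with urllib.request.urlopen(f"{BASE_URL}/v1/jobs/{job_id}") as resp:
        return json.load(resp)


def wait_for_job(
    job_id: str,
    fetch_job: Callable[[str], dict] = http_fetch_job,
    interval_seconds: float = 5.0,
    max_polls: int = 100,
) -> dict:
    """Poll until the job reports a terminal status or the poll budget runs out."""
    for _ in range(max_polls):
        job = fetch_job(job_id)
        if job.get("status") in TERMINAL_STATUSES:
            return job
        time.sleep(interval_seconds)
    raise TimeoutError(f"job {job_id} did not finish after {max_polls} polls")
```

Bounding the number of polls keeps a client from spinning forever if a job never reaches a terminal state.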

End-to-end flow

The user uploads a zip of contract files via the frontend. The UI sends the archive, the selected model key, and (optionally) an OpenAI API key to /v1/jobs/start.
The backend creates a job record in Postgres, stores a secret bundle in the Secrets Service, and publishes a message to RabbitMQ.
The Instancer consumes the job and starts a worker (Docker locally; Kubernetes backend is optional).
The worker fetches its bundle from the Secrets Service, unpacks the uploaded zip to audit/, then runs Codex in "detect-only" mode:
- prompt: backend/worker_runner/detect.md (copied to $HOME/AGENTS.md inside the container)
- model map: backend/worker_runner/model_map.json (maps UI model keys to Codex model IDs)
- command wrapper: backend/worker_runner/run_codex_detect.sh
The agent writes submission/audit.md. The worker validates that the output contains parseable JSON with {"vulnerabilities": [...]} and then uploads it to the Results Service.
The frontend polls job status and renders the report with file navigation and annotations.
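The validation in the flow above (the output must contain parseable JSON with a `vulnerabilities` list) can be sketched roughly as follows. This is not the worker's actual validator: how the JSON is embedded in submission/audit.md is not specified here, so the sketch simply scans the text for the first JSON object that decodes and carries the expected key. The function name `extract_vulnerabilities` is hypothetical.

```python
import json


def extract_vulnerabilities(report_text: str) -> list:
    """Return the "vulnerabilities" list from the first JSON object that has one.

    Raises ValueError if no such object can be parsed from the text.
    """
    decoder = json.JSONDecoder()
    start = report_text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start` and
            # ignores whatever text follows it.
            obj, _end = decoder.raw_decode(report_text, start)
        except json.JSONDecodeError:
            obj = None
        if isinstance(obj, dict) and isinstance(obj.get("vulnerabilities"), list):
            return obj["vulnerabilities"]
        start = report_text.find("{", start + 1)
    raise ValueError("no JSON object with a 'vulnerabilities' list found")
```

Scanning from each opening brace tolerates surrounding prose in the report, at the cost of accepting the first matching object rather than enforcing a single canonical location.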

Security

evmbench runs an LLM-driven agent against uploaded, untrusted code. Treat the worker runtime (filesystem, logs, outputs) as an untrusted environment.

See SECURITY.md for the full trust model and operational guidance.

OpenAI credential handling:

Direct BYOK (default): worker receives a plaintext OpenAI key (OPENAI_API_KEY / CODEX_API_KEY).
Proxy-token mode (optional): worker receives an opaque token and routes requests through oai_proxy (plaintext key stays outside the worker).

Enabling proxy-token mode:

cd backend
cp .env.example .env
# set BACKEND_OAI_KEY_MODE=proxy and OAI_PROXY_AES_KEY=...
docker compose --profile proxy up -d --build

Operational note: worker runtime is bounded by default. To change the maximum audit runtime, set EVM_BENCH_CODEX_TIMEOUT_SECONDS (default: 10800 seconds, i.e. 3 hours).
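As an illustration of how such a bound might be applied, the sketch below reads EVM_BENCH_CODEX_TIMEOUT_SECONDS with the documented default and runs a command under that limit. It is an assumption about the general approach, not the worker's actual implementation; the helper names and the exit code 124 (borrowed from GNU timeout) are choices made for this example.

```python
import os
import subprocess

DEFAULT_TIMEOUT_SECONDS = 10800  # documented default (3 hours)


def audit_timeout_seconds(env=None) -> int:
    """Read EVM_BENCH_CODEX_TIMEOUT_SECONDS, falling back to the documented default."""
    env = os.environ if env is None else env
    raw = env.get("EVM_BENCH_CODEX_TIMEOUT_SECONDS", "").strip()
    if not raw:
        return DEFAULT_TIMEOUT_SECONDS
    value = int(raw)  # raises ValueError on non-numeric input
    if value <= 0:
        raise ValueError("timeout must be a positive number of seconds")
    return value


def run_bounded(command: list, timeout_seconds: int) -> int:
    """Run a command; return its exit code, or 124 if it exceeded the timeout."""
    try:
        return subprocess.run(command, timeout=timeout_seconds).returncode
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising TimeoutExpired.
        return 124
```

Rejecting zero and negative values up front avoids silently disabling the bound through a misconfigured environment.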

Key services

| Service | Default port | Role |
| --- | --- | --- |
| `backend` | 1337 | Main API: job submission, status, history, auth |
| `secretsvc` | 8081 | Stores and serves per-job secret bundles (zip + key material) |
| `resultsvc` | 8083 | Receives worker results, validates/parses, persists to DB |
| `oai_proxy` | 8084 | Optional OpenAI proxy for proxy-token mode |
| `instancer` | (n/a) | RabbitMQ consumer that starts worker containers/pods |
| `worker` | (n/a) | Executes the detect-only agent and uploads results |
| Postgres | 5432 | Job state persistence |
| RabbitMQ | 5672 | Job queue |
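When debugging a local stack, a quick TCP probe of the default ports listed above can show which services are reachable. The sketch below is a generic check, not part of evmbench; it assumes the ports are published on the host, which depends on the compose configuration, and `oai_proxy` will only answer when the proxy profile is running.

```python
import socket

# Default ports from the table above.
DEFAULT_PORTS = {
    "backend": 1337,
    "secretsvc": 8081,
    "resultsvc": 8083,
    "oai_proxy": 8084,  # only running with the proxy profile
    "postgres": 5432,
    "rabbitmq": 5672,
}


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def probe_defaults(host: str = "127.0.0.1") -> dict:
    """Map each service name to whether its default port accepts connections."""
    return {name: port_is_open(host, port) for name, port in DEFAULT_PORTS.items()}
```

An open port only shows that something is listening; it does not confirm that the service behind it is healthy.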

Repo layout

.
├── README.md
├── SECURITY.md
├── LICENSE
├── frontend/              Next.js UI (upload zip, select model, view results)
├── frontier-evals/        Pinned upstream reference (git submodule)
├── backend/
│   ├── api/               Main FastAPI API (jobs, auth, integration)
│   ├── instancer/         RabbitMQ consumer; starts workers (Docker/K8s)
│   ├── secretsvc/         Bundle storage service
│   ├── resultsvc/         Results ingestion + persistence
│   ├── oai_proxy/         Optional OpenAI proxy (proxy-token mode)
│   ├── prunner/           Optional cleanup of stale workers
│   ├── worker_runner/     Detect prompt + model map + Codex runner script
│   ├── docker/
│   │   ├── base/          Base image: codex, foundry, slither, node, tools
│   │   ├── backend/       Backend services image
│   │   └── worker/        Worker image + entrypoint
│   └── compose.yml        Full stack (DB/MQ + services)
└── deploy/                Optional deployment scripts/examples

Quickstart (local dev)

Ensure Docker and Bun are available.

Build the base and worker images first (required before starting the stack):

cd backend
docker build -t evmbench/base:latest -f docker/base/Dockerfile .
docker build -t evmbench/worker:latest -f docker/worker/Dockerfile .

Start the backend stack (API + dependencies):

cp .env.example .env
# For local dev, the placeholder secrets in .env.example are sufficient.
# For internet-exposed deployments, replace them with strong values.
docker compose up -d --build

Start the frontend dev server:

# run from the repository root
cd frontend
bun install
bun dev

Open:

http://127.0.0.1:3000 (frontend)
http://127.0.0.1:1337/v1/integration/frontend (backend config endpoint)

Acknowledgments

Thank you to the many folks on the OtterSec team for their support, particularly with building the frontend: es3n1n, jktrn, TrixterTheTux, and sahuang.