evmbench is a benchmark and agent harness for finding and exploiting smart contract bugs.
How it works | Security | Key services | Repo layout | Quickstart (local dev)
This repository contains a companion interface to the evmbench detect evaluation. For reference, the evaluation code is included as a pinned git submodule at `frontier-evals/`.
Upload contract source code, select an agent, and receive a structured vulnerability report rendered in the UI.
## How it works

```
Frontend (Next.js)
  │
  ├─ POST /v1/jobs/start ───► Backend API (FastAPI, port 1337)
  │                              ├─► PostgreSQL (job state)
  ├─ GET /v1/jobs/{id}           ├─► Secrets Service (port 8081)
  │                              └─► RabbitMQ (job queue)
  └─ GET /v1/jobs/history               │
                                        ▼
                                Instancer (consumer)
                                        │
                             ┌──────────┴─────────┐
                             ▼                    ▼
                      Docker backend       K8s backend (optional)
                             │                    │
                             └─────────┬──────────┘
                                       ▼
                               Worker container
                                 ├─► Secrets Service (fetch bundle)
                                 ├─► (optional) OAI Proxy (port 8084) ──► OpenAI API
                                 └─► Results Service (port 8083)
```

- The user uploads a zip of contract files via the frontend. The UI sends the archive, the selected model key, and (optionally) an OpenAI API key to `/v1/jobs/start`.
- The backend creates a job record in Postgres, stores a secret bundle in the Secrets Service, and publishes a message to RabbitMQ.
- The Instancer consumes the job and starts a worker (Docker locally; a Kubernetes backend is optional).
- The worker fetches its bundle from the Secrets Service, unpacks the uploaded zip to `audit/`, then runs Codex in "detect-only" mode:
  - prompt: `backend/worker_runner/detect.md` (copied to `$HOME/AGENTS.md` inside the container)
  - model map: `backend/worker_runner/model_map.json` (maps UI model keys to Codex model IDs)
  - command wrapper: `backend/worker_runner/run_codex_detect.sh`
- The agent writes `submission/audit.md`. The worker validates that the output contains parseable JSON with `{"vulnerabilities": [...]}` and then uploads it to the Results Service.
- The frontend polls job status and renders the report with file navigation and annotations.
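The worker's validation step can be sketched roughly as follows. This is a minimal sketch, not the worker's actual code; in particular, the assumption that the JSON may appear either bare or inside a fenced code block in `audit.md` is ours.

```python
import json
import re


def extract_report(audit_md: str):
    """Return the vulnerabilities list if audit_md contains a valid
    {"vulnerabilities": [...]} JSON object, else None.

    Assumption: the JSON may appear bare or inside a code fence.
    """
    candidates = [audit_md]
    # Also try the contents of any fenced code blocks in the markdown.
    candidates += re.findall(r"```(?:json)?\n(.*?)```", audit_md, re.DOTALL)
    for text in candidates:
        try:
            obj = json.loads(text.strip())
        except json.JSONDecodeError:
            continue
        if isinstance(obj, dict) and isinstance(obj.get("vulnerabilities"), list):
            return obj["vulnerabilities"]
    return None
```

Reports that fail this shape check would be rejected before upload to the Results Service.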
## Security

evmbench runs an LLM-driven agent against uploaded, untrusted code. Treat the worker runtime (filesystem, logs, outputs) as an untrusted environment.
See SECURITY.md for the full trust model and operational guidance.
OpenAI credential handling:
- Direct BYOK (default): the worker receives a plaintext OpenAI key (`OPENAI_API_KEY`/`CODEX_API_KEY`).
- Proxy-token mode (optional): the worker receives an opaque token and routes requests through `oai_proxy`; the plaintext key stays outside the worker.
Enabling proxy-token mode:

```sh
cd backend
cp .env.example .env
# set BACKEND_OAI_KEY_MODE=proxy and OAI_PROXY_AES_KEY=...
docker compose --profile proxy up -d --build
```

Operational note: the worker runtime is bounded by default; override the maximum audit runtime with `EVM_BENCH_CODEX_TIMEOUT_SECONDS` (default: 10800 seconds, i.e. 3 hours).
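The two credential modes can be illustrated with a small worker-side sketch. This is purely hypothetical: everything except `OPENAI_API_KEY`/`CODEX_API_KEY` and the `oai_proxy` port (8084) is an invented name for illustration, not the project's actual environment contract.

```python
def resolve_openai_config(env: dict) -> tuple:
    """Return (base_url, credential) for an OpenAI client.

    Hypothetical sketch of mode selection; EVMBENCH_PROXY_TOKEN and the
    proxy URL are assumed names, not taken from the project.
    """
    proxy_token = env.get("EVMBENCH_PROXY_TOKEN")  # assumed variable name
    if proxy_token:
        # Proxy-token mode: opaque token, requests routed through oai_proxy.
        return ("http://oai_proxy:8084/v1", proxy_token)
    key = env.get("OPENAI_API_KEY") or env.get("CODEX_API_KEY")
    if not key:
        raise RuntimeError("no OpenAI credential configured")
    # Direct BYOK: plaintext key sent straight to the OpenAI API.
    return ("https://api.openai.com/v1", key)
```

The design point is simply that in proxy-token mode the worker never holds the plaintext key; only `oai_proxy` can decrypt and attach it.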
## Key services

| Service | Default port | Role |
|---|---|---|
| backend | 1337 | Main API: job submission, status, history, auth |
| secretsvc | 8081 | Stores and serves per-job secret bundles (zip + key material) |
| resultsvc | 8083 | Receives worker results, validates/parses, persists to DB |
| oai_proxy | 8084 | Optional OpenAI proxy for proxy-token mode |
| instancer | (n/a) | RabbitMQ consumer that starts worker containers/pods |
| worker | (n/a) | Executes the detect-only agent and uploads results |
| Postgres | 5432 | Job state persistence |
| RabbitMQ | 5672 | Job queue |
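The frontend polls the backend's job endpoint (port 1337 above) until a job finishes. A minimal client-side polling loop can be sketched with an injectable fetch function; the terminal status names here are assumptions, not the API's documented values.

```python
import time
from typing import Callable


def poll_job(fetch_status: Callable[[], str],
             interval_s: float = 2.0,
             timeout_s: float = 600.0) -> str:
    """Poll fetch_status() until it reports a terminal state.

    Terminal state names ("completed"/"failed") are assumed for this sketch.
    """
    terminal = {"completed", "failed"}
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        status = fetch_status()
        if status in terminal:
            return status
        time.sleep(interval_s)
    raise TimeoutError("job did not reach a terminal state in time")
```

In practice `fetch_status` would issue `GET /v1/jobs/{id}` against the backend and read the status field from the response.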
## Repo layout

```
.
├── README.md
├── SECURITY.md
├── LICENSE
├── frontend/           Next.js UI (upload zip, select model, view results)
├── frontier-evals/     Pinned upstream reference (git submodule)
├── backend/
│   ├── api/            Main FastAPI API (jobs, auth, integration)
│   ├── instancer/      RabbitMQ consumer; starts workers (Docker/K8s)
│   ├── secretsvc/      Bundle storage service
│   ├── resultsvc/      Results ingestion + persistence
│   ├── oai_proxy/      Optional OpenAI proxy (proxy-token mode)
│   ├── prunner/        Optional cleanup of stale workers
│   ├── worker_runner/  Detect prompt + model map + Codex runner script
│   ├── docker/
│   │   ├── base/       Base image: codex, foundry, slither, node, tools
│   │   ├── backend/    Backend services image
│   │   └── worker/     Worker image + entrypoint
│   └── compose.yml     Full stack (DB/MQ + services)
└── deploy/             Optional deployment scripts/examples
```

## Quickstart (local dev)

Ensure Docker and Bun are available.
Build the base and worker images first (required before starting the stack):

```sh
cd backend
docker build -t evmbench/base:latest -f docker/base/Dockerfile .
docker build -t evmbench/worker:latest -f docker/worker/Dockerfile .
```

Start the backend stack (API + dependencies):

```sh
cp .env.example .env
# For local dev, the placeholder secrets in .env.example are sufficient.
# For internet-exposed deployments, replace them with strong values.
docker compose up -d --build
```

Start the frontend dev server:

```sh
cd frontend
bun install
bun dev
```

Open:
- http://127.0.0.1:3000 (frontend)
- http://127.0.0.1:1337/v1/integration/frontend (backend config endpoint)
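A job submission needs a zip of contract sources. For scripted use, building such an archive in memory can be sketched as below; the `.sol`-only filter is an assumption for illustration, and the upload itself (endpoint field names, auth) is left out.

```python
import io
import zipfile
from pathlib import Path


def pack_contracts(src_dir: str) -> bytes:
    """Zip every .sol file under src_dir, preserving relative paths.

    The .sol filter is an illustrative assumption, not a documented rule.
    """
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(Path(src_dir).rglob("*.sol")):
            zf.write(path, path.relative_to(src_dir))
    return buf.getvalue()
```

The resulting bytes would be sent as the archive part of the `POST /v1/jobs/start` request.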
Thank you to many folks on the OtterSec team for support, particularly with building the frontend: es3n1n, jktrn, TrixterTheTux, sahuang