How It Works · See It Work · Skill Router · Install · Deep Dive · Skills · CLI · FAQ
From goal to shipped code: agents research, plan, and implement in parallel. Councils validate before and after. Every learning feeds the next session.
Coding agents get a blank context window every session. AgentOps is a toolbox of primitives: pick the ones you need, skip the ones you don't. Every skill works standalone. Swarm any of them for parallelism. Chain them into a pipeline when you want structure. Knowledge compounds between sessions automatically.
DevOps' Three Ways, applied to the agent loop as composable primitives:

- Flow (`/research`, `/plan`, `/crank`, `/swarm`, `/rpi`): orchestration skills that move work through the system. Single-piece flow, minimizing context switches. Swarm parallelizes any skill; crank runs dependency-ordered waves; rpi chains the full pipeline.
- Feedback (`/council`, `/vibe`, `/pre-mortem`, hooks): shorten the feedback loop until defects can't survive it. Independent judges catch issues before code ships. Hooks make the rules unavoidable: validation gates, push blocking, standards injection. Problems found Friday don't wait until Monday.
- Learning (`.agents/`, `ao` CLI, `/retro`, `/knowledge`): stop rediscovering what you already know. Every session extracts learnings into an append-only ledger, scores them by freshness, and re-injects the best ones at next session start. Session 50 knows what session 1 learned the hard way.
```
/quickstart · Day 1: guided tour on your codebase (~10 min)

Not sure what to do?            ─────────▶ /brainstorm
Have an idea of what you want?  ─────────▶ /research
Ready to scope it cleanly?      ─────────▶ /plan

/implement (small) · /crank (epic)         Build and ship
/vibe → /post-mortem                       Validate and learn
/rpi "goal"                                One command for the full flow
```

Use one skill to validate a PR:
```
> /council validate this PR
[council] 3 judges spawned (independent, no anchoring)
[judge-1] PASS · token bucket implementation correct
[judge-2] WARN · rate limiting missing on /login endpoint
[judge-3] PASS · Redis integration follows middleware pattern
Consensus: WARN · add rate limiting to /login before shipping
```

The council verdict, your decisions, and the patterns used are automatically written to `.agents/`, an append-only ledger. Nothing gets overwritten. When the session ends, hooks extract learnings.
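One plausible way to aggregate independent verdicts is "most severe wins": any WARN or FAIL among the judges escalates the consensus. This is a sketch of that rule, not necessarily the aggregation `/council` actually implements:

```python
from enum import IntEnum

class Verdict(IntEnum):
    """Ordered by severity so max() picks the worst verdict."""
    PASS = 0
    WARN = 1
    FAIL = 2

def consensus(judgments: list[Verdict]) -> Verdict:
    """Most severe verdict wins: one WARN among PASSes yields WARN."""
    return max(judgments)

print(consensus([Verdict.PASS, Verdict.WARN, Verdict.PASS]).name)  # WARN
```

A severity-ordered consensus is conservative by design: a single dissenting judge is enough to block a clean PASS, which matches the anti-anchoring intent of spawning judges independently.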
Knowledge compounds: three weeks later, on a different task, your agent already knows:
```
> /research "retry backoff strategies"
[inject] 3 prior learnings loaded (freshness-weighted):
  - Token bucket with Redis (established, high confidence)
  - Rate limit at middleware layer, not per-handler (pattern)
  - /login endpoint was missing rate limiting (decision)
[research] Found prior art in your codebase + injected context
Recommends: exponential backoff with jitter, reuse existing Redis client
```

Session 5 didn't start from scratch; it started with what session 1 learned. Stale insights decay automatically.
Parallelize anything with `/swarm`:
```
> /swarm "research auth patterns, brainstorm rate limiting improvements"
[swarm] 3 agents spawned · each gets fresh context
[agent-1] /research auth · found JWT + session patterns, 2 prior learnings
[agent-2] /research rate-limiting · found token bucket, middleware pattern
[agent-3] /brainstorm improvements · 4 approaches ranked
[swarm] Complete · artifacts in .agents/
```

Full pipeline, one command, walk away:
```
> /rpi "add retry backoff to rate limiter"
[research]    Found 3 prior learnings on rate limiting (injected)
[plan]        2 issues, 1 wave · epic ag-0058
[pre-mortem]  Council validates plan · PASS (knew about Redis choice)
[crank]       Parallel agents: Wave 1 ██ 2/2
[vibe]        Council validates code · PASS
[post-mortem] 2 new learnings → .agents/
[flywheel]    Next: /rpi "add circuit breaker to external API calls"
```
AgentOps building AgentOps: completed `/crank` across 3 parallel epics (15 issues, 5 waves, 0 regressions).
More examples: /evolve, session continuity
Session continuity across compaction or restart:
```
> /handoff
[handoff] Saved: 3 open issues, current branch, next action
Continuation prompt written to .agents/handoffs/

--- next session ---

> /recover
[recover] Found in-progress epic ag-0058 (2/5 issues closed)
Branch: feature/rate-limiter
Next: /implement ag-0058.3
```

Goal-driven improvement loop:
```
> /evolve --max-cycles=5
[evolve] GOALS.yaml: 4 goals loaded
[cycle-1] Measuring fitness... 2/4 passing
          Worst gap: test-pass-rate (weight: 10)
          /rpi "Improve test-pass-rate" → 3 issues, 2 waves
          Re-measure: 3/4 passing ✓
[cycle-2] Worst gap: doc-coverage (weight: 7)
          /rpi "Improve doc-coverage" → 2 issues, 1 wave
          Re-measure: 4/4 passing ✓
[cycle-3] All goals met. Checking harvested work...
          Picked: "add smoke test for /evolve" (from post-mortem)
[teardown] /post-mortem → 5 learnings extracted
```

Different developers, different setups: use what fits your workflow.
The PR reviewer uses one skill, nothing else:
```
> /council validate this PR
Consensus: WARN · missing error handling in 2 locations
```

That's it. No pipeline, no setup, no commitment. One command, actionable feedback.
The team lead composes skills manually:
```
> /research "performance bottlenecks in the API layer"
> /plan "optimize database queries identified in research"
> /council validate the plan
```

Picks skills as needed, stays in control of sequencing.
The solo dev runs the full pipeline and walks away:
```
> /rpi "add user authentication"
[3 phases run autonomously, learnings extracted]
```

One command runs research through post-mortem. Come back to committed code.
The platform team runs parallel agents and hands-free improvement:
```
> /swarm "run /rpi on each of these 3 epics"
> /evolve --max-cycles=5
```

Swarms full pipelines in parallel. `/evolve` measures goals and fixes gaps in a loop.
Use this when you're not sure which skill to run.
```
What are you trying to do?
│
├─ "Not sure what to do yet"
│   └─ Generate options first ──────▶ /brainstorm
│
├─ "I have an idea"
│   └─ Understand code + context ───▶ /research
│
├─ "I know what I want to build"
│   └─ Break it into issues ────────▶ /plan
│
├─ "Now build it"
│   ├─ Small/single issue ──────────▶ /implement
│   ├─ Multi-issue epic ────────────▶ /crank <epic-id>
│   └─ Full flow in one command ────▶ /rpi "goal"
│
├─ "Fix a bug"
│   ├─ Know which file? ────────────▶ /implement <issue-id>
│   └─ Need to investigate? ────────▶ /bug-hunt
│
├─ "Build a feature"
│   ├─ Small (1-2 files) ───────────▶ /implement
│   ├─ Medium (3-6 issues) ─────────▶ /plan → /crank
│   └─ Large (7+ issues) ───────────▶ /rpi (full pipeline)
│
├─ "Validate something"
│   ├─ Code ready to ship? ─────────▶ /vibe
│   ├─ Plan ready to build? ────────▶ /pre-mortem
│   ├─ Work ready to close? ────────▶ /post-mortem
│   └─ Quick sanity check? ─────────▶ /council --quick validate
│
├─ "Explore or research"
│   ├─ Understand this codebase ────▶ /research
│   ├─ Compare approaches ──────────▶ /council research <topic>
│   └─ Generate ideas ──────────────▶ /brainstorm
│
├─ "Learn from past work"
│   ├─ What do we know about X? ────▶ /knowledge <query>
│   ├─ Save this insight ───────────▶ /learn "insight"
│   └─ Run a retrospective ─────────▶ /retro
│
├─ "Parallelize work"
│   ├─ Multiple independent tasks ──▶ /swarm
│   └─ Full epic with waves ────────▶ /crank <epic-id>
│
├─ "Ship a release"
│   └─ Changelog + tag ─────────────▶ /release <version>
│
├─ "Session management"
│   ├─ Where was I? ────────────────▶ /status
│   ├─ Save for next session ───────▶ /handoff
│   └─ Recover after compaction ────▶ /recover
│
└─ "First time here" ───────────────▶ /quickstart
```

Requirements
- Node 18+ (for `npx skills`) and `git`
- One supported runtime: Claude Code, Codex CLI, Cursor, or OpenCode
- Optional, for the `ao` CLI install path shown below: Homebrew (`brew`)
```shell
# Claude Code, Codex CLI, Cursor (most users)
npx skills@latest add boshu2/agentops --all -g

# OpenCode
curl -fsSL https://raw.githubusercontent.com/boshu2/agentops/main/scripts/install-opencode.sh | bash
```

Works with: Claude Code · Codex CLI · Cursor · OpenCode. Skills are portable across runtimes (`/converter` exports to native formats).
Then type `/quickstart` in your agent chat.
```shell
# Claude Code plugin (alternative)
claude plugin add boshu2/agentops
```

`npx skills` installs skills into your agent's global skills directory. The plugin path registers AgentOps as a Claude Code plugin instead: same skills, different integration point. Most users should start with `npx skills`.
The `ao` CLI powers the knowledge flywheel
Skills work standalone. The `ao` CLI powers the automated learning loop: knowledge extraction, injection with freshness decay, maturity lifecycle, and progress gates. Install it when you want knowledge to compound between sessions.
```shell
brew tap boshu2/agentops https://github.com/boshu2/homebrew-agentops && brew install agentops
cd /path/to/your/repo
ao init --hooks --full
```

This installs 25+ hooks across core lifecycle events:
| Event | What happens |
|---|---|
| SessionStart | Extract from prior session, inject top learnings (freshness-weighted), check progress gates |
| SessionEnd | Mine transcript for knowledge, record session outcome, expire stale artifacts, evict dead knowledge |
| PreToolUse | Inject coding standards before edits, gate dangerous git ops, validate before push |
| PostToolUse | Advance progress ratchets, track citations |
| TaskCompleted | Validate task output against acceptance criteria |
| Stop/PreCompact | Close feedback loops, snapshot before compaction |
OpenCode: plugin + skills
Installs 7 hooks (tool enrichment, audit logging, compaction resilience) and symlinks all skills. Restart OpenCode after install. Details: .opencode/INSTALL.md
Local-only. No telemetry. No cloud. No accounts.
| What | Where | Reversible? |
|---|---|---|
| Skills | Global skills dir (outside your repo; for Claude Code: `~/.claude/skills/`) | `npx skills@latest remove boshu2/agentops -g` |
| Knowledge artifacts | `.agents/` in your repo (git-ignored by default) | `rm -rf .agents/` |
| Hook registration | `.claude/settings.json` | `ao hooks uninstall` or delete entries |
| Git push gate | Pre-push hook (optional, only with CLI) | `AGENTOPS_HOOKS_DISABLED=1` |
Nothing modifies your source code. Nothing phones home. Everything is open source: audit it yourself.
Configuration: environment variables
All optional. AgentOps works out of the box with no configuration.
Council / validation:
| Variable | Default | What it does |
|---|---|---|
| `COUNCIL_TIMEOUT` | 120 | Judge timeout in seconds |
| `COUNCIL_CLAUDE_MODEL` | sonnet | Claude model for judges (opus for high-stakes) |
| `COUNCIL_CODEX_MODEL` | (user's Codex default) | Override Codex model for `--mixed` |
| `COUNCIL_EXPLORER_MODEL` | sonnet | Model for explorer sub-agents |
| `COUNCIL_EXPLORER_TIMEOUT` | 60 | Explorer timeout in seconds |
| `COUNCIL_R2_TIMEOUT` | 90 | Debate round 2 timeout in seconds |
Hooks:
| Variable | Default | What it does |
|---|---|---|
| `AGENTOPS_HOOKS_DISABLED` | 0 | 1 to disable all hooks (kill switch) |
| `AGENTOPS_PRECOMPACT_DISABLED` | 0 | 1 to disable pre-compaction snapshot |
| `AGENTOPS_TASK_VALIDATION_DISABLED` | 0 | 1 to disable task validation gate |
| `AGENTOPS_SESSION_START_DISABLED` | 0 | 1 to disable session-start hook |
| `AGENTOPS_EVICTION_DISABLED` | 0 | 1 to disable knowledge eviction |
| `AGENTOPS_GITIGNORE_AUTO` | 1 | 0 to skip auto-adding `.agents/` to `.gitignore` |
| `AGENTOPS_WORKER` | 0 | 1 to skip push gate (for worker agents) |
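A hook script honoring these flags might check them like this. This is a sketch of the convention in the table (set to 1 means disabled, with the kill switch taking precedence), not the hooks' actual implementation:

```python
import os

def flag(name: str, default: str = "0") -> bool:
    """True when the variable is set to 1, per the convention above."""
    return os.environ.get(name, default) == "1"

def hooks_enabled() -> bool:
    # The kill switch overrides every finer-grained flag.
    return not flag("AGENTOPS_HOOKS_DISABLED")

def session_start_enabled() -> bool:
    return hooks_enabled() and not flag("AGENTOPS_SESSION_START_DISABLED")
```

Checking the global kill switch first means a single variable can silence everything, which is the behavior you want when a hook misfires mid-session.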
Full reference with examples and precedence rules: docs/ENV-VARS.md
Troubleshooting: docs/troubleshooting.md
Standard iterative development (research, plan, validate, build, review, learn), automated for agents that can't carry context between sessions.
This is DevOps thinking applied to agent work: the Three Ways as composable primitives.
- Flow: wave-based execution (`/crank`) + workflow orchestration (`/rpi`) to keep work moving.
- Feedback: shift-left validation (`/pre-mortem`, `/vibe`, `/council`) plus optional gates/hooks to make feedback unavoidable.
- Continual learning: post-mortems turn outcomes into reusable knowledge in `.agents/`, so the next session starts smarter. `/flywheel` monitors health.
`.agents/` is an append-only ledger with cache-like semantics. Nothing gets overwritten: every learning, council verdict, pattern, and decision is a new dated file. Freshness decay prunes what's stale. The cycle:
```
Session N ends
  → ao forge:             mine transcript for learnings, decisions, patterns
  → ao maturity --expire: mark stale artifacts (freshness decay)
  → ao maturity --evict:  archive what's decayed past threshold

Session N+1 starts
  → ao inject --apply-decay: score all artifacts by recency,
                             inject top-N within token budget
  → Agent starts with institutional knowledge, not a blank slate
```

Write once, score by freshness, inject the best, prune the rest. If retrieval_rate × usage_rate stays above decay and scale friction, knowledge compounds; if not, growth stalls unless fresh input or stronger controls are added. The formal model is cache eviction with a decay function and limits-to-growth controls.
```
/rpi "goal"
 │
 └─▶ /research → /plan → /pre-mortem → /crank → /vibe
                                                  │
                                                  ▼
     /post-mortem ─── validates what shipped
                  ─── extracts learnings → .agents/
                  ─── suggests next /rpi command
                            │
     /rpi "next goal" ◀─────┘
```

The post-mortem analyzes each learning, asks "what process would this improve?", and writes improvement proposals. It hands you a ready-to-copy `/rpi` command. Paste it, walk away.
Learnings pass quality gates (specificity, actionability, novelty) and land in tiered pools. Freshness decay ensures recent insights outweigh stale patterns.
Phase details: what each step does
- `/research`: Explores your codebase. Produces a research artifact with findings and recommendations.
- `/plan`: Decomposes the goal into issues with dependency waves. Derives scope boundaries and conformance checks. Creates a beads epic (git-native issue tracking).
- `/pre-mortem`: Judges simulate failures before you write code, including a spec-completeness judge. FAIL? Re-plan with feedback (max 3 retries).
- `/crank`: Spawns parallel agents in dependency-ordered waves. Each worker gets fresh context. Lead validates and commits. Runs until every issue is closed. `--test-first` for spec-first TDD.
- `/vibe`: Judges validate the code. FAIL? Re-crank with failure context and re-vibe (max 3).
- `/post-mortem`: Council validates the implementation. Retro extracts learnings. Suggests the next `/rpi` command.
`/rpi "goal"` runs all six end to end. Use `--interactive` for human gates at research and plan.
Phased RPI: fresh context per phase for larger goals
`ao rpi phased "goal"` runs each phase in its own session, with no context bleed between phases.
```shell
ao rpi phased "add rate limiting"               # Hands-free, fresh context per phase
ao rpi phased "add auth" &                      # Run multiple in parallel (auto-worktrees)
ao rpi phased --from=implementation "fix perf"  # Resume at execution phase
ao rpi status --watch                           # Monitor active phased runs
```

Use `/rpi` when context fits in one session. Use `ao rpi phased` when it doesn't.
Goal-driven mode: `/evolve` with `GOALS.yaml`
Bootstrap with `/goals generate`: it scans your repo (PRODUCT.md, README, skills, tests) and proposes mechanically verifiable goals. Or write them by hand:
```yaml
# GOALS.yaml
version: 1
goals:
  - id: test-pass-rate
    description: "All tests pass"
    check: "make test"
    weight: 10
```

Then `/evolve` measures them, picks the worst gap, runs `/rpi` to fix it, re-measures ALL goals (regressed commits auto-revert), and loops. It commits locally; you control when to push. Kill switch: `echo "stop" > ~/.config/evolve/KILL`
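The measure-fix-re-measure loop can be sketched as follows. This mirrors the shape of `/evolve` (a goal passes when its `check` command exits 0, and the worst-weighted failure gets fixed first), but the function names and loop details are illustrative, not the CLI's internals:

```python
import subprocess

def measure(goals: list[dict]) -> dict[str, bool]:
    """A goal passes when its check command exits 0 (e.g. check: "make test")."""
    return {g["id"]: subprocess.run(g["check"], shell=True).returncode == 0
            for g in goals}

def evolve(goals: list[dict], fix, max_cycles: int = 5) -> int:
    """Measure all goals, fix the worst-weighted gap, loop until all pass."""
    for cycle in range(max_cycles):
        results = measure(goals)
        failing = [g for g in goals if not results[g["id"]]]
        if not failing:
            return cycle                      # all goals met
        worst = max(failing, key=lambda g: g["weight"])
        fix(worst)                            # stand-in for running /rpi on the gap
    return max_cycles
```

Re-measuring every goal each cycle (not just the one being fixed) is what catches regressions: a fix that breaks a previously passing goal shows up in the next measurement.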
Maintain over time: `/goals` shows pass/fail status, `/goals prune` finds stale or broken checks.
References: science, systems theory, prior art
Built on Darr 1995 (decay rates), Sweller 1988 (cognitive load), Liu et al. 2023 (lost-in-the-middle), MemRL 2025 (RL for memory).
AgentOps concentrates on the high-leverage end of Meadows' hierarchy: information flows (#6), rules (#5), self-organization (#4), goals (#3). The bet: changing the loop beats tuning the output.
Deep dive: docs/how-it-works.md covers the Brownian Ratchet, the Ralph Wiggum Pattern, agent backends, hooks, and context windowing.
Five pillars, one recursive shape. The same pattern (lead decomposes work, workers execute atomically, validation gates lock progress, next wave begins) repeats at every scale:
```
/implement ── one worker, one issue, one verify cycle
  └── /crank ── waves of /implement (FIRE loop)
        └── /rpi ── research → plan → crank → validate → learn
              └── /evolve ── fitness-gated /rpi cycles
```

Each level treats the one below as a black box: spec in, validated result out. Workers get fresh context per wave (Ralph Wiggum Pattern), never commit (lead-only), and communicate through the filesystem, not accumulated chat context. Parallel execution works because each unit of work is atomic: no shared mutable state between concurrent workers.
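Dependency-ordered waves boil down to layered topological sorting: every issue whose dependencies are already closed runs in the current wave, in parallel. A minimal sketch (issue ids are made up for illustration; this is the scheduling idea, not `/crank`'s code):

```python
def waves(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group issues into waves: an issue runs once all its deps are done."""
    done: set[str] = set()
    order: list[list[str]] = []
    while len(done) < len(deps):
        wave = sorted(i for i, d in deps.items() if i not in done and d <= done)
        if not wave:
            raise ValueError("dependency cycle")  # nothing runnable but work remains
        order.append(wave)
        done |= set(wave)
    return order

epic = {
    "ag-1": set(), "ag-2": set(),   # wave 1: independent
    "ag-3": {"ag-1", "ag-2"},       # wave 2: needs both
    "ag-4": {"ag-3"},               # wave 3
}
print(waves(epic))  # [['ag-1', 'ag-2'], ['ag-3'], ['ag-4']]
```

Everything inside a wave is safe to parallelize precisely because the wave boundary guarantees all upstream work is already validated and committed.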
Validation is mechanical, not advisory. Multi-model councils judge before and after implementation. Hooks enforce gates: push blocked until `/vibe` passes, `/crank` blocked until `/pre-mortem` passes. The knowledge flywheel extracts learnings, scores them, and re-injects them at session start so each cycle compounds.
Full treatment: docs/ARCHITECTURE.md covers all five pillars, operational invariants, and a component overview.
Every skill works alone. Compose them however you want.
Judgment (the foundation everything validates against):
| Skill | What it does |
|---|---|
| `/council` | Independent judges (Claude + Codex) debate, surface disagreement, converge. `--preset=security-audit`, `--perspectives`, `--debate` for adversarial review |
| `/vibe` | Code quality review: complexity analysis + council |
| `/pre-mortem` | Validate plans before implementation: council simulates failures |
| `/post-mortem` | Wrap up completed work: council validates + retro extracts learnings |
Execution (research, plan, build, ship):
| Skill | What it does |
|---|---|
| `/research` | Deep codebase exploration that produces structured findings |
| `/plan` | Decompose a goal into trackable issues with dependency waves |
| `/implement` | Full lifecycle for one task: research, plan, build, validate, learn |
| `/crank` | Parallel agents in dependency-ordered waves, fresh context per worker |
| `/swarm` | Parallelize any skill: run research, brainstorms, implementations in parallel |
| `/rpi` | Full pipeline: discovery (research + plan + pre-mortem) → implementation (crank) → validation (vibe + post-mortem) |
| `/evolve` | Measure fitness goals, fix the worst gap, roll back regressions, loop |
Knowledge (the flywheel that makes sessions compound):
| Skill | What it does |
|---|---|
| `/knowledge` | Query learnings, patterns, and decisions across `.agents/` |
| `/learn` | Manually capture a decision, pattern, or lesson |
| `/retro` | Extract learnings from completed work |
| `/flywheel` | Monitor knowledge health: velocity, staleness, pool depths |
Supporting skills:

| Category | Skills |
|---|---|
| Onboarding | `/quickstart`, `/using-agentops` |
| Session | `/handoff`, `/recover`, `/status` |
| Traceability | `/trace`, `/provenance` |
| Product | `/product`, `/goals`, `/release`, `/readme`, `/doc` |
| Utility | `/quickstart`, `/brainstorm`, `/bug-hunt`, `/complexity` |
Full reference: docs/SKILLS.md
Cross-runtime orchestration: mix Claude, Codex, and OpenCode
AgentOps orchestrates across runtimes. Claude can lead a team of Codex workers. Codex judges can review Claude's output.
| Spawning Backend | How it works | Best for |
|---|---|---|
| Native teams | `TeamCreate` + `SendMessage`, built into Claude Code | Tight coordination, debate |
| Background tasks | `Task(run_in_background=true)`, a last-resort fallback | When no team APIs are available |
| Codex sub-agents | `/codex-team`: Claude orchestrates Codex workers | Cross-vendor validation |
| tmux + Agent Mail | `/swarm --mode=distributed` for full process isolation | Long-running work, crash recovery |
Distributed-mode workers survive disconnects: each runs in its own tmux session with crash recovery. `tmux attach` to debug live.
Skills work standalone; no CLI required. The `ao` CLI adds two things: (1) the knowledge flywheel that makes sessions compound (extract, inject, decay, maturity), and (2) terminal-based RPI that runs without an active chat session. Each phase gets its own fresh context window, so large goals don't hit context limits.
```shell
ao rpi phased "add rate limiting"            # 3 sessions: discover → build → validate
ao rpi phased "fix auth bug" &               # Run multiple in parallel (auto-worktrees)
ao rpi phased --from=implementation "ag-058" # Resume at build phase
ao rpi status --watch                        # Monitor active runs
```

Walk away, come back to committed code + extracted learnings.
```shell
ao search "query"   # Search knowledge across files and chat history
ao demo             # Interactive demo
```

Full reference: CLI Commands
These are fellow experiments in making coding agents work. Use pieces from any of them.
| Alternative | What it does well | Where AgentOps focuses differently |
|---|---|---|
| GSD | Clean subagent spawning, fights context rot | Cross-session memory (GSD keeps context fresh within a session; AgentOps carries knowledge between sessions) |
| Compound Engineer | Knowledge compounding, structured loop | Multi-model councils and validation gates: independent judges debating before and after code ships |
docs/FAQ.md covers comparisons, limitations, subagent nesting, PRODUCT.md, and uninstall.
Built on: Ralph Wiggum, Multiclaude, beads, CASS, MemRL
Ralph Wiggum (fresh context per agent) · Multiclaude (validation gates) · beads (git-native issues) · CASS (session search) · MemRL (cross-session memory)
Issue tracking: Beads / `bd`
Git-native issues in `.beads/`. `bd onboard` (setup) · `bd ready` (find work) · `bd show <id>` · `bd close <id>` · `bd sync`. More: AGENTS.md
See CONTRIBUTING.md. If AgentOps helped you ship something, post in Discussions.
Apache-2.0 · Docs · How It Works · FAQ · Glossary · Architecture · Configuration · CLI Reference · Changelog