Your model is not the bottleneck.
Your harness is.
If your agent only works when AGENTS.md keeps growing, you do not have alignment; you have prompt debt. The fix: a smaller root router, harder verification gates, and repo-specific evals.

Links:
- ClawPond landing page: https://clawpond.com/harness-engineering
- Skill detail page: https://clawpond.com/skills/harness-engineering
- GitHub repository: https://github.com/LeoFanKm/harness-engineering
- Install command: claw install harness-engineering
What ships in the repo:
- Skill entry: SKILL.md
- Repo audit script: scripts/harness_audit.py
- Eval scaffolding: scripts/scaffold_repo_eval.py
- Eval run scaffold: scripts/scaffold_eval_run.py
- Eval aggregation: scripts/aggregate_repo_eval.py
- Prompt kit: references/prompt-kit.md
- Eval playbook: references/eval-playbook.md
- Scoring rubric: references/scoring-rubric.md
- Research map: references/research-map.md
- Landing copy: docs/landing-copy.md
- Release notes: docs/release-notes.md
Most agent failure still comes from the same four boring causes:
- the repo is illegible
- verification is weak
- old paths still grow
- autonomy exceeds the fences
This skill packages the counter-pattern into something a repo can actually enforce:
- shrink the root instruction surface into a router
- pin durable truth into named docs and commands
- fence bad paths with scripts, CI, and compatibility boundaries (see the sketch after this list)
- prove the change with meaningful verification and repo-specific evals
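
As one concrete shape a fence can take, here is a minimal sketch of a CI gate that fails the build when the root instruction file grows past a line budget. The file name AGENTS.md and the 80-line budget are assumptions, and this is not the shipped scripts/harness_audit.py, whose actual checks may differ.

```python
#!/usr/bin/env python3
"""CI gate: fail the build when the root instruction file outgrows its budget.

Hypothetical sketch; the real checks live in scripts/harness_audit.py.
"""
import sys
from pathlib import Path

ROOT_FILE = Path("AGENTS.md")  # assumed root router file
LINE_BUDGET = 80               # assumed budget; tune per repo


def main() -> int:
    if not ROOT_FILE.exists():
        print(f"missing {ROOT_FILE}; nothing to gate")
        return 0
    lines = ROOT_FILE.read_text(encoding="utf-8").splitlines()
    if len(lines) > LINE_BUDGET:
        print(f"FAIL: {ROOT_FILE} has {len(lines)} lines (budget {LINE_BUDGET}).")
        print("Move durable truth into named docs and link to them from the router.")
        return 1
    print(f"OK: {ROOT_FILE} is {len(lines)} lines (budget {LINE_BUDGET}).")
    return 0


if __name__ == "__main__":
    sys.exit(main())
```

Wired into CI, a gate like this turns "keep the root small" from a convention into a hard failure the repo enforces on every change.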
Results so far:
- Root context shrank from 229 lines to 66
- Historical benchmark: 92.0 with the review loop vs 47.3 single-pass (the sketch below shows the control-flow difference)
- The release chain now guards skill data, landing pages, schema, sitemap, and smoke coverage together
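
For readers unfamiliar with the comparison: a review loop wraps generation in verification feedback, while single-pass stops at the first candidate. A minimal sketch of that control flow follows; the generate/verify/revise callables are placeholders, not this repo's API, and the benchmark's actual setup is not reproduced here.

```python
from typing import Callable


def review_loop(
    generate: Callable[[str], str],            # propose a candidate for the task
    verify: Callable[[str], tuple[bool, str]],  # run checks, return (passed, feedback)
    revise: Callable[[str, str], str],          # fold feedback into a new candidate
    task: str,
    max_rounds: int = 3,
) -> str:
    """Single-pass is this function with max_rounds=0: no verification feedback."""
    candidate = generate(task)
    for _ in range(max_rounds):
        passed, feedback = verify(candidate)
        if passed:
            break
        candidate = revise(candidate, feedback)
    return candidate
```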
Getting started:
- Read SKILL.md.
- Run python scripts/harness_audit.py <repo-root>.
- Use references/prompt-kit.md to choose the smallest intervention.
- Use references/eval-playbook.md when you need repo-specific evals (a scoring sketch follows this list).
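
If you do build repo-specific evals, aggregation mostly reduces to counting verified passes per task. A hypothetical sketch under an assumed results layout; the real formats are defined by scripts/scaffold_eval_run.py and scripts/aggregate_repo_eval.py.

```python
import json
from pathlib import Path

# Assumed layout: each eval run writes results/<task>/<run>.json with {"passed": bool}.


def aggregate(results_dir: str = "results") -> dict[str, float]:
    """Return the pass rate per task across all recorded runs."""
    rates: dict[str, float] = {}
    root = Path(results_dir)
    if not root.exists():
        return rates
    for task_dir in sorted(root.iterdir()):
        if not task_dir.is_dir():
            continue
        runs = [json.loads(p.read_text()) for p in task_dir.glob("*.json")]
        if runs:
            rates[task_dir.name] = sum(r.get("passed", False) for r in runs) / len(runs)
    return rates


if __name__ == "__main__":
    for task, rate in aggregate().items():
        print(f"{task}: {rate:.0%} pass rate")
```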
- WeChat group: scan the QR code below to join the OpenClaw / Harness Engineering community.
