A vendor-agnostic planning methodology for AI agents that think before they code.
SPECTRA's evaluation framework is designed with peer-reviewable rigor: a 3-layer evaluation architecture, pre-registered hypotheses, triple assessment (human expert + LLM-as-Judge + automated structural checks), and statistical methodology (N ≥ 30, Cohen's d effect sizes, Bonferroni-corrected significance).
Instrument development is complete. Data collection is underway. Results will be published in docs/benchmarks/ as they become available. 🤝 Want to accelerate this? Every real-world case study submission is a benchmark data point.
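As a rough illustration of the statistical methodology above, the two core computations (effect size and multiple-comparison correction) can be sketched in a few lines. The scores below are hypothetical placeholders; the real study uses N ≥ 30 per condition and the pre-registered hypothesis count from the protocol:

```python
import statistics

def cohens_d(group_a, group_b):
    """Effect size: difference of means over the pooled standard deviation."""
    na, nb = len(group_a), len(group_b)
    var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)
    pooled_sd = (((na - 1) * var_a + (nb - 1) * var_b) / (na + nb - 2)) ** 0.5
    return (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_sd

def bonferroni_alpha(alpha, num_comparisons):
    """Bonferroni correction: per-test significance threshold."""
    return alpha / num_comparisons

# Hypothetical Pass@1 scores for two conditions (illustrative only)
baseline = [0.58, 0.61, 0.55, 0.63, 0.59, 0.60]
spectra  = [0.70, 0.74, 0.69, 0.72, 0.68, 0.73]

effect_size = cohens_d(spectra, baseline)      # positive favors SPECTRA
threshold = bonferroni_alpha(0.05, 10)         # 10 hypotheses -> 0.005 per test
```

A result only counts as significant under the corrected threshold, which keeps the family-wise error rate at the nominal 0.05 across all pre-registered hypotheses.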
Every major AI coding tool (Cursor, Claude Code, GitHub Copilot) has independently converged on the same architecture: separate planning from execution. The evidence is strong:
- Self-Planning Code Generation: 25.4% improvement in Pass@1 (Jiang et al., ASE 2024)
- PlanSearch: diversity at the plan level nearly doubles performance (Wang et al., 2024)
- Reflexion: structured self-critique achieves 91% Pass@1 on HumanEval (Shinn et al., NeurIPS 2023)
Yet every implementation is locked inside a proprietary tool. Switch editors, switch LLMs, or build your own agents, and you start from zero.
SPECTRA is the methodology extracted from the pattern. It's the playbook, not the player.
Scope → Pattern → Explore → Construct → Test → Refine → Assemble
A cognitive architecture that codifies how the best commercial tools think before they act, distilled into a portable methodology that works with any LLM, any IDE, any stack.
```
┌── CLARIFY (disambiguate + gather context) ──┐
│                     ▼                       │
│   S → P → E → C → T → R ─┬→ A (confidence ≥85%)
│                          └→ R (refine, max 3)
│                     ▼                       │
└── PERSIST (artifact storage) + ADAPT ───────┘
```

| Capability | Raw ReAct | Cursor Plan | Claude Code Plan | SPECTRA |
|---|---|---|---|---|
| Plan/execute separation | ✗ | ✓ | ✓ | ✓ |
| Structured clarification | ✗ | Partial | Partial | ✓ (protocol) |
| Plan diversity (3–5 hypotheses) | ✗ | ✗ | ✗ | ✓ |
| Weighted scoring rubric | ✗ | ✗ | ✗ | ✓ (7-dim) |
| Multi-layer verification | ✗ | ✗ | Reasoning only | ✓ (6 layers) |
| Reflexion-style refinement | ✗ | ✗ | ✗ | ✓ |
| Adaptive replanning | ✗ | ✗ | ✗ | ✓ |
| Failure taxonomy | ✗ | ✗ | ✗ | ✓ (8 modes) |
| Theoretical foundations | ✗ | ✗ | ✗ | ✓ (formal) |
| Vendor-agnostic | ✓ | ✗ (Cursor) | ✗ (Anthropic) | ✓ |
| Open methodology | ✓ | ✗ | ✗ | ✓ |
| Stack adaptation tooling | ✗ | ✗ | ✗ | ✓ |
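To make the "weighted scoring rubric" row concrete, here is a minimal sketch of scoring 3–5 candidate plans against a 7-dimension rubric and gating on the 85% confidence threshold. The dimension names, weights, and ratings below are illustrative assumptions; the actual rubric is defined in scoring.md:

```python
# Hypothetical dimensions and weights -- the real rubric lives in scoring.md.
WEIGHTS = {
    "feasibility": 0.20, "completeness": 0.20, "risk": 0.15,
    "simplicity": 0.15, "testability": 0.10, "reversibility": 0.10,
    "alignment": 0.10,
}

def score_plan(ratings):
    """Weighted sum of 0-100 dimension ratings -> overall confidence score."""
    return sum(WEIGHTS[dim] * ratings[dim] for dim in WEIGHTS)

def select_plan(candidates, threshold=85):
    """Pick the best-scoring candidate plan and gate on the threshold."""
    best_name, best_score = max(
        ((name, score_plan(ratings)) for name, ratings in candidates.items()),
        key=lambda pair: pair[1],
    )
    return best_name, best_score, best_score >= threshold

candidates = {
    "plan_a": {"feasibility": 90, "completeness": 85, "risk": 80,
               "simplicity": 88, "testability": 92, "reversibility": 84,
               "alignment": 90},
    "plan_b": {"feasibility": 70, "completeness": 75, "risk": 65,
               "simplicity": 80, "testability": 70, "reversibility": 72,
               "alignment": 68},
}
name, score, accepted = select_plan(candidates)
```

If no candidate clears the gate, the methodology routes back through Refine (at most 3 times) rather than accepting a weak plan.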
- Not a framework or library. No `pip install`. It's a methodology, like Agile or TOGAF, but for AI planning agents.
- Not vendor-locked. Works with Claude, GPT, Gemini, Llama, Mistral, or whatever comes next.
- Not an agent. SPECTRA describes how an agent should think about planning. Your agent, your implementation.
Run the analyzer at your project root. It detects your stack and generates an LLM prompt to create project-specific conventions:
```sh
# Direct run
curl -sL https://raw.githubusercontent.com/Rynaro/SPECTRA/main/tools/spectra-init.sh | bash

# Or clone first
git clone https://github.com/Rynaro/SPECTRA.git
cd your-project/
bash ../SPECTRA/tools/spectra-init.sh
```

This produces two files:
- `spectra-project-profile.md`: detected languages, frameworks, patterns, directory structure
- `spectra-adaptation-prompt.md`: paste into any LLM to generate `spectra-conventions.md` for your stack
| Start Here | Then | Deep Dives |
|---|---|---|
| SPECTRA.md | scoring.md | THEORY.md |
| Full cognitive architecture | Rubrics, matrices, validation | Decision theory, information theory, cognitive science |
Organized by what you need to do:
```
SPECTRA/
├── 📚 docs/methodology/          USE: Learn and apply the methodology
│   ├── SPECTRA.md                Core cognitive architecture (start here)
│   ├── SKILL.md                  Quick-reference routing card
│   ├── scoring.md                All rubrics, matrices, validation criteria
│   └── templates.md              Copy-paste output formats per phase
├── 🔬 docs/research/             USE: Understand the evidence base
│   ├── REFERENCES.md             15+ papers + commercial tool analysis
│   ├── SYNTHESIS.md              Evidence → design decision mapping
│   └── THEORY.md                 Formal theoretical foundations (PhD-level)
├── 📊 docs/benchmarks/           USE: Evaluate methodology effectiveness
│   └── README.md                 Evaluation framework + status
├── 💡 examples/                  USE: See SPECTRA in action
│   ├── rails-player-import.md    Full Rails example (origin stack)
│   ├── generic-api-feature.md    Node.js/TypeScript example
│   └── anti-patterns.md          What NOT to do (with corrections)
├── 🔧 tools/                     USE: Adapt SPECTRA to your project
│   └── spectra-init.sh           Project analyzer + LLM prompt generator
├── .github/
│   ├── CONTRIBUTING.md           How to contribute
│   └── ISSUE_TEMPLATE/
│       ├── case_study_submission.md
│       └── methodology_feedback.md
├── README.md                     ← You are here
├── CHANGELOG.md                  Version history
└── LICENSE                       CC BY-SA 4.0
```

SPECTRA was developed and battle-tested on Ruby on Rails applications at a production SaaS company. The methodology originally used Rails conventions (FlowObjects, Repositories, ViewComponents) and company-specific agent names.
For this open-source release:
- All proprietary references replaced with generic capability classes
- Domain vocabulary made stack-agnostic
- `spectra-init.sh` created to auto-adapt to any project
- Examples provided for both Rails and Node.js stacks
- Theoretical foundations formalized with decision theory, information theory, and cognitive science
The cognitive architecture is stack-independent. Only the vocabulary in your stories changes. The CLARIFY → SPECTRA → PERSIST cycle works identically whether you're building Rails APIs, React frontends, Go microservices, or Rust systems.
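As a minimal, stack-agnostic illustration of that cycle, the control flow can be sketched as follows. SPECTRA prescribes no implementation; the `llm` and `confidence_of` callables here are injected stand-ins, and the prompts are placeholders, not the templates from templates.md:

```python
import json

PHASES = ["Scope", "Pattern", "Explore", "Construct", "Test", "Refine", "Assemble"]

def run_cycle(task, llm, confidence_of, max_refinements=3):
    """One CLARIFY -> SPECTRA -> PERSIST pass with the confidence gate."""
    # CLARIFY: disambiguate and gather context before any planning
    context = llm(f"CLARIFY: list ambiguities and missing context for: {task}")
    plan = {"task": task, "context": context, "phases": {}}
    for phase in PHASES[:-1]:  # Assemble waits until the gate passes
        plan["phases"][phase] = llm(f"{phase}: {task}\nContext: {context}")
    # Refine (max 3 rounds) until confidence reaches the 85% acceptance gate
    refinements = 0
    while confidence_of(plan) < 85 and refinements < max_refinements:
        plan["phases"]["Refine"] = llm("Refine: " + json.dumps(plan["phases"]))
        refinements += 1
    plan["phases"]["Assemble"] = llm(f"Assemble final plan for: {task}")
    return plan  # PERSIST: caller writes MD/YAML/JSON artifacts from this

# Usage with stand-in callables:
plan = run_cycle("add player import", llm=lambda p: "stub",
                 confidence_of=lambda p: 90)
```

Swapping the two callables is all it takes to run the same cycle against Claude, GPT, Gemini, or a local model, which is the vendor-agnostic point the section makes.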
Every design choice traces to evidence. Full mapping in SYNTHESIS.md.
| Decision | Why | Evidence |
|---|---|---|
| Read-only during planning | Forces problem-space thinking | Universal in Cursor, Claude Code, Copilot |
| Clarify before decompose | Prevents 40%+ wasted effort | Commercial tool analysis |
| 3–5 hypotheses | Plan diversity >> code diversity | PlanSearch (~2x); Miller's Law bounds |
| 7-dimension rubric | Structured evaluation beats intuition | Extends ToT value assessment |
| 6-layer verification | Catch flaws before execution | 10x cheaper than executing bad plans |
| Reflexion refinement | Diagnose → explain → prescribe | 91% Pass@1 (Shinn et al.) |
| Persistent artifacts | Plans survive context windows | MD + YAML + JSON triple format |
| Adaptive replanning | 3-step lookahead, not full restart | ADaPT-style efficiency |
| Failure taxonomy | Targeted remediation, not generic "refine" | 8 modes with diagnostic signals |
| Diminishing returns detection | Stop refining when marginal gain < 0.3 | Prevents token waste in cycles |
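The diminishing-returns rule from the last row reduces to a simple stopping check. This is a sketch under the thresholds stated above (marginal gain < 0.3, max 3 refinement rounds), not prescribed code:

```python
def should_stop_refining(score_history, min_gain=0.3, max_rounds=3):
    """Stop when marginal gain falls below min_gain or the round cap is hit.

    score_history holds the plan's confidence score after each refinement,
    starting with the initial score; transitions between entries are rounds.
    """
    if len(score_history) - 1 >= max_rounds:
        return True  # refinement budget exhausted
    if len(score_history) >= 2:
        return score_history[-1] - score_history[-2] < min_gain
    return False  # no refinement attempted yet

assert should_stop_refining([78.0, 81.5]) is False  # +3.5: keep refining
assert should_stop_refining([78.0, 81.5, 81.6])     # +0.1 < 0.3: stop
assert should_stop_refining([70, 71, 72, 73])       # 3 rounds used: stop
```

The point is economic: once a refinement round buys less than 0.3 points of confidence, further cycles mostly burn tokens without improving the plan.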
- AI agent builders: Implementing planning in your agent? SPECTRA is the architecture.
- Engineering leads: Evaluating AI planning tools? SPECTRA is the benchmark.
- Prompt engineers: Designing planning prompts? SPECTRA's templates are battle-tested.
- Open-source maintainers: Building the next Aider, Cline, or Continue? SPECTRA is the methodology layer.
- Researchers: Studying AI planning? THEORY.md provides formal foundations.
Contributions should strengthen the methodology. See CONTRIBUTING.md.
Most valuable: Case studies with real-world results. Every case study becomes a benchmark data point.
CC BY-SA 4.0. Use SPECTRA in your products, adapt it for your team, teach it in workshops. Credit the source and share improvements under the same terms.
SPECTRA v4 synthesizes insights from Plan-and-Solve Prompting (Wang et al.), PlanSearch (Wang et al.), Tree of Thoughts (Yao et al.), Reflexion (Shinn et al.), ADaPT (Prasad et al.), and patterns from Cursor, Claude Code, GitHub Copilot, Windsurf, Aider, Cline, Roo Code, and LangGraph. Theoretical foundations draw from Kahneman, Miller, Shannon, Sweller, and Elster. Full references in REFERENCES.md.
SPECTRA v4.2.0: Strategic Specification through Deliberate Reasoning