This repository contains an early prototype exploration of AgentBound concepts. The theory, metrics, and implementation have evolved significantly since this initial work. For current research status, please contact the author.
AgentBound is a design-time static analysis tool for AI agent systems. Think of it like a linter for agent graphs: it computes an agentic entropy score from an agent architecture and flags risky patterns such as long generative-to-generative chains, loops, and missing validators.
By running AgentBound on your system design, you can:
- Identify structural risks before finalizing code.
- Compare design variants to see how changes affect stability.
- Export diagrams and metrics for design reviews.
Skeptical? Need empirical validation?
The
validation/directory contains experiments showing that AgentBound’s entropy scores track real brittleness in lightweight simulation tests.
SeeVALIDATION.mdfor full details.
Run one of the included demos to see AgentBound in action:
- Customer Support Workflow demo - shows how small design changes (adding a Doc Filter or Validator) shift entropy.
- Dummy LangGraph Supervisor demo - shows how the LangGraph Supervisor pattern changes entropy under different designs.
Alternatively, view and interpret example output.
Learn more about agentic entropy, how AgentBound scores a graph, and what AgentBound does and doesn't claim.
- Complete the general setup.
- Use AgentBound to analyze a single graph or compare two graphs.
- Interpret the output.
Agentic entropy is a structural signal that reflects the potential unpredictability of an agentic system before it is deployed. High entropy often indicates higher design risk, such as:
- Unpredictable hand-offs between generative components
- Missing validators or control points
- Loops or long generative chains
Agentic entropy is an area of active research. The scoring methodology will continue to be refined.
- Design-time signal only. Based on graph structure and heuristics. Helps you catch risky shapes before final implementation.
- Useful for architecture reviews. Justify adding validators, reducing LLM-to-LLM chains, or splitting flows.
- NOT a runtime guarantee. Doesn’t predict accuracy or KPIs. Complements—rather than replaces—post-deploy evals and monitoring.
AgentBound extracts the following from the graph (excluding aux nodes):
- G = count of generative nodes
- D = count of deterministic nodes
- gg = count of edges from generative → generative
Node kind inference (can be overridden with a kind map):
- Generative if id/label matches
llm|gpt|model|generate|writer|assistant|agent|supervisor(case-insensitive) - Deterministic otherwise
- Aux if id starts with
__(excluded from scoring)
Coupling factor:
coupling = 1.0 + (sqrt(gg) / max(1, G)) if G > 0 coupling = 1.0 if G == 0Entropy score (V0):
entropy = (G / max(1, G + D)) * coupling + 0.1 * ggRisk levels:
< 0.30→ Low< 0.60→ Moderate< 0.90→ High≥ 0.90→ Very High
Optional resilience (if results JSON is provided):
-
Requires
baseline.fail_rateandperturbed.fail_rate -
Resilience index:
if both fail rates ~0 → 1.0 else → 1 - (perturbed / max(eps, baseline)), bounded [0,1]
-
Quadrant classification:
- Low entropy + Low resilience → Fragile
- High entropy + Low resilience → Chaotic Fragility
- Low entropy + High resilience → Robust
- High entropy + High resilience → Antifragile
Outputs:
- PNG diagram with generative→generative edges highlighted in red
- JSON report with entropy metrics, and optional resilience/quadrant
Two demos are included:
-
Customer Support Workflow
Three variants of a customer support pipeline. See how risky LLM-to-LLM edges raise entropy and how Doc Filters/Validators change outcomes. -
Dummy LangGraph Supervisor
Two variants of LangGraph’s Supervisor pattern. Compare how additional deterministic anchors shift entropy from Very High → Moderate.
-
Clone the repository:
git clone https://github.com/ElPaisano/agent-bound.git
-
Navigate to the repository:
cd agent-bound -
(Optional) Create and activate a virtual environment:
python -m venv .venv && source .venv/bin/activate
-
Install dependencies:
pip install -r requirements.txt
-
The basic usage pattern is:
python agentbound.py <graph_json> [results_json] [kind_map_json] [output_path]
Next, learn how to analyze a single graph and/or compare two graphs.
python agentbound.py path/to/graph.jsonOutputs: PNG diagram + JSON metrics.
(Optional) Specify an output location
By default, output is written to
agent-bound\out. To write the output to a custom location, specify the optional[output_path].For example,
python agentbound.py path/to/graph.json "" path/to/kind_map.json /examples/customer_support_agent/outwill create all outputs in/examples/customer_support_agent/out.
Next, learn how to interpret single graph analysis.
python agentbound_compare.py graph_A.json graph_B.jsonOutputs: side-by-side PNG comparison + JSON diff.
Next, learn how to interpret comparison output.
Before interpreting results, make sure you understand the scoring methodology.
AgentBound produces a JSON summary for each graph. For example:
{ "graph_json": "out/langgraph_supervisor.json", "entropy_score": 2.067, "entropy_level": "Very High", "generative_nodes": 3, "deterministic_nodes": 0, "gen_to_gen_edges": 4, "coupling_factor": 1.667 }Key fields:
entropy_score→ overall risk metric (higher = more drift / brittleness).entropy_level→ plain-English bucket (Low → Very High).generative_nodes (G)= stochastic components (LLMs, generative tools).deterministic_nodes (D)= anchors (validators, retrieval, DB calls).gen_to_gen_edges= direct LLM→LLM handoffs (amplify error).coupling_factor= increases as LLMs depend on each other.
Example: 3 G, 0 D, 4 LLM→LLM edges → Very High entropy. Design tip: add deterministic anchors or reduce LLM handoffs.
AgentBound can also diff two graphs to show the effect of design changes:
"A": { ... "entropy_level": "Very High" }, "B": { ... "entropy_level": "Moderate" }, "delta": { "entropy": -1.67, "D": +3 }Interpretation:
- B adds deterministic anchors and removes LLM→LLM edges.
- Entropy drops from Very High → Moderate.
- If entropy is high: add validators, retrieval, schema checks; shorten chains.
- If comparing designs: prefer variants that reduce entropy without blocking functionality.
The validation/ directory contains experiments that test AgentBound’s entropy scores against brittleness metrics from a lightweight simulation harness.
- Harness: runs toy graph families (chains, forks, loops, etc.) and records failure/retry statistics.
- Integration:
compute_entropy.pymerges these results with AgentBound’s entropy estimates. - Analysis: plots (with optional confidence intervals) visualize the relationship between entropy and brittleness.
For full details and usage, see validation/VALIDATION.md.
This repository contains an early prototype exploration of AgentBound concepts. The theory, metrics, and implementation have evolved significantly since this initial work. For current research status, please contact the author.
Open an issue with your graph JSON and expected vs. actual results.
This project is licensed under the PolyForm Noncommercial License 1.0.0. You may use, copy, and modify the software for noncommercial purposes only. For commercial use, please contact the author.

