
fix(cache): prevent Connection closed errors during SWR background revalidation #2854

Merged
tlgimenes merged 6 commits into main from tlgimenes/fix-cache-revalidation on Mar 25, 2026


@tlgimenes commented Mar 24, 2026

What is this contribution about?

Fixes `McpError: MCP error -32000: Connection closed` errors that occur during background cache revalidation for HTTP/SSE MCP connections.

Root cause: fetchWithCache implements stale-while-revalidate (SWR). On a cache hit, it returns the cached data immediately and fires a background fetchLive() to refresh the cache. For HTTP/SSE connections, fetchLive() creates clients via the per-request pool (ctx.getOrCreateClient). The pool's connections can be closed before the background revalidation completes, so the in-flight revalidation fails with this error.
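The SWR flow described above can be sketched roughly as follows. This is a hypothetical, simplified TypeScript illustration, not the real `fetchWithCache` in apps/mesh: the options object, the `Map`-based `Cache` type, and the log format are assumptions; only the idea of an `onRevalidation` hook comes from this PR. It shows why the refresh is detached from the request: on a hit, the stale value is returned before `fetchLive()` finishes.

```typescript
// Hypothetical, simplified stand-in for fetchWithCache. Option names and the
// Map-based cache are assumptions; only the onRevalidation idea is from the PR.
type Cache<T> = Map<string, T>;

interface FetchWithCacheOptions<T> {
  key: string;
  cache: Cache<T>;
  fetchLive: () => Promise<T>;
  // Optional hook so the caller can track the background refresh.
  onRevalidation?: (revalidation: Promise<void>) => void;
}

async function fetchWithCache<T>(opts: FetchWithCacheOptions<T>): Promise<T> {
  const { key, cache, fetchLive, onRevalidation } = opts;
  const cached = cache.get(key);

  if (cached !== undefined) {
    // Cache hit: start a background refresh but return the stale value now.
    // Unless the caller tracks this promise via onRevalidation, nothing awaits
    // it, so if fetchLive() depends on a connection that is closed before it
    // finishes, it rejects after the response has already been sent.
    const revalidation = fetchLive()
      .then((fresh) => {
        cache.set(key, fresh);
      })
      .catch((err: unknown) => {
        // Best-effort: keep serving the stale entry and just log.
        console.warn(`[fetchWithCache] ${key} background revalidation FAILED:`, err);
      });
    onRevalidation?.(revalidation);
    return cached;
  }

  // Cache miss: fetch live data, populate the cache, and return it.
  const fresh = await fetchLive();
  cache.set(key, fresh);
  return fresh;
}
```

In terms of this sketch, a call site that wants the request context to track the refresh would pass something like `onRevalidation: (p) => { ctx.pendingRevalidations.push(p); }`, where `ctx` is a hypothetical context object.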

Fix (two layers):

  1. Keep pool alive: Add a pendingRevalidations array to MeshContext, and register background revalidation promises on it via an onRevalidation callback. After the handler returns, a Hono middleware awaits these promises in a fire-and-forget task, so the response is not delayed, while its closure keeps ctx (and its client pool) alive until the revalidations complete. A 30s timeout prevents a hung revalidation from keeping them alive indefinitely.
  2. Defense-in-depth: Suppress "Connection closed" errors in the revalidation catch handler (consistent with existing isMethodNotFound handling), since SWR revalidation is best-effort.

No pool disposal is added — SSE/streaming proxy connections depend on pool clients remaining open after the handler returns.
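A minimal sketch of the first layer, under stated assumptions: `MeshContext` is reduced to the `pendingRevalidations` field, the middleware only mimics Hono's `(c, next)` shape (Hono itself is not imported), and `RequestContext`, `settleWithTimeout`, and `keepAliveForRevalidations` are hypothetical names. The 30s timeout is implemented with `Promise.race`, so it stops waiting but does not cancel the underlying fetch. As in the PR, nothing is disposed; the sketch only delays when `ctx` becomes unreachable.

```typescript
// Hypothetical sketch of the "keep pool alive" middleware. MeshContext is reduced
// to the one field this PR adds; RequestContext and the (c, next) signature only
// mimic Hono's middleware shape.
interface MeshContext {
  pendingRevalidations: Promise<void>[];
}

interface RequestContext {
  meshContext: MeshContext;
}

type Next = () => Promise<void>;

const REVALIDATION_TIMEOUT_MS = 30_000;

// Resolves once every promise has settled or the timeout fires, whichever comes
// first. On timeout it stops waiting; it does not cancel the underlying work.
function settleWithTimeout(promises: Promise<void>[], ms: number): Promise<void> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<void>((resolve) => {
    timer = setTimeout(resolve, ms);
  });
  const all = Promise.allSettled(promises).then(() => undefined);
  return Promise.race([all, timeout]).finally(() => clearTimeout(timer));
}

function keepAliveForRevalidations(timeoutMs: number = REVALIDATION_TIMEOUT_MS) {
  return async (c: RequestContext, next: Next): Promise<void> => {
    const ctx = c.meshContext;
    try {
      await next();
    } finally {
      if (ctx.pendingRevalidations.length > 0) {
        // Fire-and-forget: not awaited, so the response is not delayed. The
        // callback below references ctx, which keeps it (and anything it owns,
        // such as a client pool) reachable until the revalidations settle.
        void settleWithTimeout(ctx.pendingRevalidations, timeoutMs).then(() => {
          ctx.pendingRevalidations.length = 0;
        });
      }
    }
  };
}
```

The `try/finally` around `await next()` means the wait is also scheduled when the handler throws, and the handler's error still propagates to the caller.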

Screenshots/Demonstration

N/A — backend-only change.

How to Test

  1. Deploy to a staging environment with HTTP MCP connections configured
  2. Make requests to a Virtual MCP endpoint that triggers SWR cache hits (connections with cached tool/resource/prompt lists)
  3. Verify that the logs no longer contain `[fetchWithCache] ... background revalidation FAILED: McpError: MCP error -32000: Connection closed` errors
  4. Run `bun test apps/mesh/src/mcp-clients/mcp-list-cache.test.ts`; all 27 tests should pass
  5. Verify SSE/streaming proxy connections still work correctly

Migration Notes

N/A — no database or configuration changes.

Review Checklist

  • PR title is clear and descriptive
  • Changes are tested and working
  • Documentation is updated (if needed)
  • No breaking changes

Summary by cubic

Fixes MCP error -32000 "Connection closed" during SWR background revalidation for HTTP/SSE MCP connections by keeping the per-request client pool alive and treating revalidation as best-effort. Also skips redundant revalidations in the connections list to reduce noise and background work.

  • Bug Fixes
    • Added pendingRevalidations to MeshContext and an onRevalidation callback to fetchWithCache; all call sites register their revalidation promises. After the handler returns, Hono middleware awaits them fire-and-forget with a 30s timeout, keeping the client pool alive; `await next()` is now wrapped in try/finally so this also happens when the handler throws.
    • fetchWithCache now suppresses "Connection closed" errors during revalidation and continues handling MethodNotFound by caching empty lists.
    • Skipped SWR revalidation in COLLECTION_CONNECTIONS_LIST; the GET tool already revalidates per-connection, preventing unnecessary fan-out and potential connection closures.
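The error-handling rules in these bullets amount to a small classifier for revalidation failures. The sketch below is hypothetical: `McpLikeError` stands in for the MCP SDK's `McpError`, `handleRevalidationError` is an illustrative name, and matching on code -32000 plus the "Connection closed" text is an assumption based on the logged error (the real `isConnectionClosed` predicate may match differently). -32601 is the standard JSON-RPC "Method not found" code.

```typescript
// Hypothetical stand-in for the MCP SDK's McpError: an Error with a numeric code.
class McpLikeError extends Error {
  readonly code: number;
  constructor(code: number, message: string) {
    super(`MCP error ${code}: ${message}`);
    this.name = "McpLikeError";
    this.code = code;
  }
}

const CONNECTION_CLOSED = -32000; // code in the logged "Connection closed" error
const METHOD_NOT_FOUND = -32601; // JSON-RPC "Method not found"

function isConnectionClosed(err: unknown): boolean {
  return (
    err instanceof McpLikeError &&
    err.code === CONNECTION_CLOSED &&
    err.message.includes("Connection closed")
  );
}

function isMethodNotFound(err: unknown): boolean {
  return err instanceof McpLikeError && err.code === METHOD_NOT_FOUND;
}

type RevalidationOutcome = "cached-empty" | "suppressed" | "logged";

// Decides what a best-effort revalidation does with a failure.
function handleRevalidationError<T>(
  err: unknown,
  key: string,
  cache: Map<string, T[]>,
): RevalidationOutcome {
  if (isMethodNotFound(err)) {
    // The server does not implement this list method: cache an empty list.
    cache.set(key, []);
    return "cached-empty";
  }
  if (isConnectionClosed(err)) {
    // The connection went away mid-refresh: keep the stale entry, stay quiet.
    return "suppressed";
  }
  console.warn(`[fetchWithCache] ${key} background revalidation FAILED:`, err);
  return "logged";
}
```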

Written for commit be97752.

@github-actions

🧪 Benchmark

Should we run the Virtual MCP strategy benchmark for this PR?

React with 👍 to run the benchmark.

Reaction | Action
👍 | Run quick benchmark (10 & 128 tools)

Benchmark will run on the next push after you react.

github-actions bot commented Mar 24, 2026

Release Options

Suggested: Patch (2.204.4) — based on fix: prefix

React with an emoji to override the release type:

Reaction | Type | Next Version
👍 | Prerelease | 2.204.4-alpha.1
🎉 | Patch | 2.204.4
❤️ | Minor | 2.205.0
🚀 | Major | 3.0.0

Current version: 2.204.3

Note: If multiple reactions exist, the smallest bump wins. If there are no reactions, the suggested bump is used (default: minor).


@cubic-dev-ai bot left a comment


1 issue found across 13 files

Prompt for AI agents (unresolved issues)
 Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately. <file name="apps/mesh/src/api/app.ts"> <violation number="1" location="apps/mesh/src/api/app.ts:989"> P1: Wrap `await next()` in `try/finally` so pending revalidation handling still runs when downstream handlers throw.</violation> </file> 
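The P1 issue above is about what happens to code placed after `await next()` when a downstream handler throws. A minimal TypeScript sketch, with hypothetical function names and no Hono import, contrasts the two shapes: the post-handler step is skipped in the first and always runs in the second, while the handler's error still reaches the caller.

```typescript
type Next = () => Promise<void>;

// Without try/finally: if next() rejects, `after` is never reached.
async function withoutFinally(next: Next, after: () => void): Promise<void> {
  await next();
  after();
}

// With try/finally: `after` runs whether next() resolves or rejects, and the
// original error still propagates to the caller afterwards.
async function withFinally(next: Next, after: () => void): Promise<void> {
  try {
    await next();
  } finally {
    after();
  }
}
```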


tlgimenes and others added 4 commits March 25, 2026 14:37
Add a `pendingRevalidations: Promise<void>[]` field to MeshContext to track SWR background revalidation promises. This allows middleware to keep the request context alive while revalidations complete. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ed errors. Add optional `onRevalidation` callback to `fetchWithCache` so callers can track background revalidation promises. Add `isConnectionClosed` predicate to silently handle MCP connection closed errors during background revalidation (defense-in-depth for SWR best-effort semantics). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Pass onRevalidation callback at all 4 production call sites to register background revalidation promises on ctx.pendingRevalidations. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…losed errors. After the route handler returns, fire-and-forget await pending SWR revalidations (with 30s timeout) to keep ctx and its client pool alive via closure. No pool disposal — SSE/streaming connections depend on pool clients remaining open. Fixes background revalidation "Connection closed" errors for HTTP/SSE MCP connections by ensuring the per-request client pool stays alive while revalidations complete. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@tlgimenes force-pushed the tlgimenes/fix-cache-revalidation branch from 2c79fd2 to a7f088d on March 25, 2026 17:39
tlgimenes and others added 2 commits March 25, 2026 14:43
Ensures pending SWR revalidations run even when downstream handlers throw. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The LIST tool was triggering background SWR revalidations for every connection's tools via fetchWithCache. This is unnecessary since the GET tool already handles per-connection revalidation. Removing it reduces unnecessary background work and potential connection closed errors.
@tlgimenes merged commit 2e3e52b into main on Mar 25, 2026
22 of 23 checks passed
@tlgimenes deleted the tlgimenes/fix-cache-revalidation branch on March 25, 2026 18:23
