fix: recover crashed agents by recreating missing tmux windows #362
Open
whitmo wants to merge 1 commit into dlorenc:main from
Previously, when an agent's tmux window disappeared (crash, manual kill, etc.), both the restart command and the health check loop failed to recover it: the restart command returned an error saying the window needed to be recreated, and the health check immediately marked the agent for cleanup.

Now:
- `handleRestartAgent` recreates the tmux window when it is missing, then restarts Claude in it, preserving session context via `--resume`
- The health check loop attempts window recreation + restart for persistent agents before falling back to cleanup
- Both paths validate that the agent's worktree still exists before attempting recovery
- `restartAgent` validates worktree existence early to fail fast with a clear error message

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Triage Review (comment by the PR author)

Priority: P1 (agent restart, a roadmap item)

Changes:

Recommendation: Merge before #364. Good defensive improvements.
This was referenced Mar 7, 2026
Local CI Verification (2026-03-12, comment by the PR author)

CI Status: No GitHub Actions checks are running. This is expected for first-time fork PRs: GitHub requires a maintainer to approve workflow runs for PRs from forks. Branch is rebased on
Summary
- Previously, `handleRestartAgent` refused to proceed when the tmux window was missing, returning an unhelpful error instead of recreating it
- Worktree validation added to `handleRestartAgent` and `restartAgent` so they fail fast with clear errors when the agent's working directory no longer exists

What was broken
When an agent crashed hard enough to lose its tmux window (e.g., OOM kill, manual `tmux kill-window`, session corruption):
- `multiclaude agent restart <name>` would fail with "tmux window does not exist - the agent may need to be recreated", but there was no way to "recreate" it

What this fixes
- `handleRestartAgent` now recreates the missing tmux window (pointed at the agent's worktree) before restarting Claude with `--resume`

Test plan
- `TestHandleRestartAgentTableDriven` passes (all 8 cases)
- `go test ./...`
- `multiclaude agent restart` recovers it

🤖 Generated with Claude Code
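The restart path described under "What this fixes" (recreate the window in the agent's worktree, then relaunch Claude with `--resume`) can be sketched as the tmux invocations a restart might issue. This is a minimal sketch assuming tmux is driven through its CLI: `hasWindow`, `planRestart`, the session/window naming, and the `sessionID` argument are hypothetical, since the PR does not show multiclaude's tmux wrapper.

```go
package main

import "strings"

// hasWindow reports whether name appears in the output of
// `tmux list-windows -t <session> -F '#{window_name}'` (one name per line).
func hasWindow(listOutput, name string) bool {
	for _, line := range strings.Split(listOutput, "\n") {
		if strings.TrimSpace(line) == name {
			return true
		}
	}
	return false
}

// planRestart returns tmux argument lists to run in order (e.g. via
// exec.Command("tmux", args...)). Hypothetical helper.
func planRestart(listOutput, session, window, worktree, sessionID string) [][]string {
	target := session + ":" + window
	// Type the relaunch command into the window, then press Enter.
	// sessionID is assumed to contain no shell-special characters.
	resume := []string{"send-keys", "-t", target, "claude --resume " + sessionID, "Enter"}
	if hasWindow(listOutput, window) {
		// Window still exists: this sketch only re-sends the launch command;
		// stopping a still-running process first is out of scope here.
		return [][]string{resume}
	}
	// Window is gone: recreate it first. -d keeps it detached, -n names it,
	// -c starts it in the agent's worktree; "session:" lets tmux pick the index.
	create := []string{"new-window", "-d", "-t", session + ":", "-n", window, "-c", worktree}
	return [][]string{create, resume}
}
```

Keeping the command construction separate from execution makes the missing-window branch testable without a live tmux server.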