fix(ice): preserve nomination on redundant pair replacement#908
fix(ice): preserve nomination on redundant pair replacement#908zRedShift wants to merge 3 commits intoalgesten:mainfrom
Conversation
`copy_nominated_and_success_state` dropped `Attempt` nomination state when replacing a pair, causing permanent nomination loss and lingering in Checking state until STUN retransmits exhaust. Reset `Attempt` to `Nominated` so the next binding cycle can recover.
src/ice/agent.rs Outdated
| /// binding cycle can re-establish it. Before this fix, `Attempt` was dropped, causing | ||
| /// nomination loss → Connected→Checking linger → connection slot exhaustion. | ||
| #[test] | ||
| fn pair_replacement_preserves_nomination_in_attempt_state() { |
There was a problem hiding this comment.
Can we write this test such that only two agents interact with each other? I'd like to see in what circumstances this problem can get triggered just be sending back and forth STUN bindings and the timings of adding new candidates.
There was a problem hiding this comment.
Added a two-agent regression that drives the real STUN exchange and injects the signalled candidate while the controlled side is in Attempt; it asserts we stay connected and don’t fall back to Checking. Kept the lower-level pair-state test alongside it.
There was a problem hiding this comment.
Thank you! It is verbose but I guess that is expected when we need to create such a specific state. Thank you for adding that.
From my perspective, we don't need the low-level test then but I'll let @algesten be the judge of that.
There was a problem hiding this comment.
Just confirming: The high-level test fails without your patch?
There was a problem hiding this comment.
@thomaseizinger I defer to you. You're much more in-tune with this code than me at this point :)
There was a problem hiding this comment.
Just confirming: The high-level test fails without your patch?
Yes, both tests fail without the patch.
I don't mind removing the low-level test and the cfg(test) function it requires - just LMK.
There was a problem hiding this comment.
Yeah cool. Lets remove the low-level test then please!
Summary
When a redundant pair replacement happens while nomination is in
Attempt, the replacement pair loses nomination becausecopy_nominated_and_success_stateskipsAttempt.Attemptcannot be copied directly because the in-flight transaction belongs to the old pair. Falling back toNoneloses the nomination state entirely. Resetting toNominatedis the smallest safe fallback: the next binding cycle creates a fresh nominated binding attempt, which can then promote the replacement pair back toSuccess.Impact
For a controlled agent, losing nomination during pair replacement can move an otherwise connected session back to
Checkingwhile candidate pairs are still considered possible. Since the checks themselves may continue succeeding, the agent can remain inCheckinguntil retransmits expire instead of recovering nomination promptly.Fix
When replacing a redundant pair whose nomination state is
Attempt, reset the replacement pair toNominated.This preserves the intent to nominate without trying to reuse the old pair’s transaction state. The next binding cycle then issues a fresh nominated attempt with a valid transaction, which can promote to
Success.Test
Regression test
pair_replacement_preserves_nomination_in_attempt_stateverifies:Nominatednew_attempt()transitions it toAttemptrecord_binding_response()recovers it toSuccess