Skip to content

fix(ice): preserve nomination on redundant pair replacement#908

Open
zRedShift wants to merge 3 commits intoalgesten:mainfrom
zRedShift:fix/preserve-nomination-on-pair-replacement
Open

fix(ice): preserve nomination on redundant pair replacement#908
zRedShift wants to merge 3 commits intoalgesten:mainfrom
zRedShift:fix/preserve-nomination-on-pair-replacement

Conversation

@zRedShift
Copy link

Summary

When a redundant pair replacement happens while nomination is in Attempt, the replacement pair loses nomination because copy_nominated_and_success_state skips Attempt.

Attempt cannot be copied directly because the in-flight transaction belongs to the old pair. Falling back to None loses the nomination state entirely. Resetting to Nominated is the smallest safe fallback: the next binding cycle creates a fresh nominated binding attempt, which can then promote the replacement pair back to Success.

Impact

For a controlled agent, losing nomination during pair replacement can move an otherwise connected session back to Checking while candidate pairs are still considered possible. Since the checks themselves may continue succeeding, the agent can remain in Checking until retransmits expire instead of recovering nomination promptly.

Fix

When replacing a redundant pair whose nomination state is Attempt, reset the replacement pair to Nominated.

This preserves the intent to nominate without trying to reuse the old pair’s transaction state. The next binding cycle then issues a fresh nominated attempt with a valid transaction, which can promote to Success.

Test

Regression test pair_replacement_preserves_nomination_in_attempt_state verifies:

  • replacement resets nomination to Nominated
  • new_attempt() transitions it to Attempt
  • record_binding_response() recovers it to Success
`copy_nominated_and_success_state` dropped `Attempt` nomination state when replacing a pair, causing permanent nomination loss and lingering in Checking state until STUN retransmits exhaust. Reset `Attempt` to `Nominated` so the next binding cycle can recover.
src/ice/agent.rs Outdated
/// binding cycle can re-establish it. Before this fix, `Attempt` was dropped, causing
/// nomination loss → Connected→Checking linger → connection slot exhaustion.
#[test]
fn pair_replacement_preserves_nomination_in_attempt_state() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we write this test such that only two agents interact with each other? I'd like to see in what circumstances this problem can get triggered just be sending back and forth STUN bindings and the timings of adding new candidates.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a two-agent regression that drives the real STUN exchange and injects the signalled candidate while the controlled side is in Attempt; it asserts we stay connected and don’t fall back to Checking. Kept the lower-level pair-state test alongside it.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! It is verbose but I guess that is expected when we need to create such a specific state. Thank you for adding that.

From my perspective, we don't need the low-level test then but I'll let @algesten be the judge of that.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just confirming: The high-level test fails without your patch?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@thomaseizinger I defer to you. You're much more in-tune with this code than me at this point :)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just confirming: The high-level test fails without your patch?

Yes, both tests fail without the patch.
I don't mind removing the low-level test and the cfg(test) function it requires - just LMK.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah cool. Lets remove the low-level test then please!

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Owner

@algesten algesten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

3 participants