
Reduce RM initialization timeout from 2.0s to 1.5s #831

Open
Vanjoseluis wants to merge 1 commit into ros-controls:rolling from Vanjoseluis:rm-init-timeout-study

Conversation

@Vanjoseluis
Contributor

This PR replaces the previous conservative 2.0 s Resource Manager initialization timeout with a measured and reproducible value of 1.5 seconds.
The new value is based on an extensive experimental study designed to identify the minimum stable timeout under Gazebo.

This change reduces the initialization timeout by 25% while maintaining full stability.


Motivation
Issue #801 showed that an overly small timeout (0.2 s) leads to incorrect controller behavior and large deviations from the expected joint positions. The goals of this study were to:

  • Determine the minimum stable timeout
  • Reduce test execution time
  • Eliminate flaky behavior caused by Gazebo initialization jitter
  • Replace a conservative value with an empirically justified one

Methodology
The pendulum_effort_test was used as the primary benchmark because it is the most timing‑sensitive test in the suite.
All other tests (pendulum_position_test, gripper_mimic_joint_position_test, gripper_mimic_joint_effort_test, etc.) were also validated with the final value (1.5 s).
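The repeated-run protocol behind these pass counts can be sketched as a small harness. This is an illustrative sketch only: the actual test command used in the study is not shown in the PR, so `cmd` here is a placeholder.

```python
import subprocess

def pass_rate(cmd, runs):
    """Run `cmd` repeatedly; return (passes, total) based on exit codes."""
    passes = 0
    for _ in range(runs):
        result = subprocess.run(cmd, capture_output=True)
        if result.returncode == 0:
            passes += 1
    return passes, runs

# In the study this would wrap the pendulum_effort_test invocation
# (e.g. a colcon/launch_test command); the exact command is not given here.
```

Counting exit codes over N runs is what produces figures like "11/11" or "10/11" in the table below.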


Initial Bisection Results

| Timeout (s) | Result |
|-------------|--------|
| 2.0         | PASS (baseline) |
| 1.1         | PASS (11/11) |
| 0.65        | PASS (10/11) |
| 0.425       | FAIL (3/4, 0/1, 0/1) |
| 0.2         | FAIL (issue #801) |
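The bisection itself amounts to a binary search over candidate timeouts, assuming stability is monotone in the timeout. A minimal sketch (the `is_stable` predicate, which would run the test suite N times per candidate, and the `resolution` parameter are my assumptions, not the PR's actual tooling):

```python
def min_stable_timeout(is_stable, lo, hi, resolution=0.05):
    """Binary-search the smallest timeout (to within `resolution`) for
    which `is_stable(timeout)` holds, assuming monotone stability."""
    if not is_stable(hi):
        raise ValueError("upper bound must itself be stable")
    while hi - lo > resolution:
        mid = (lo + hi) / 2.0
        if is_stable(mid):
            hi = mid  # mid passed: the threshold is at or below mid
        else:
            lo = mid  # mid failed: the threshold is above mid
    return hi
```

As the jitter analysis below shows, the monotonicity assumption is only approximate in practice, which is why the raw bisection result had to be validated with many repeated runs.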

Extended jitter analysis
Further testing revealed that Gazebo introduces significant initialization jitter.
Values that initially appeared stable (e.g., 1.1 s) were not consistently reproducible.

Additional runs:

  • 1.0 s → 8/9 (not stable)
  • 1.2 s → unstable
  • 1.5 s → 30/30 PASS, fully stable across all tests
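A 30/30 result can also be put in rough statistical terms. This is my own back-of-envelope framing, not an analysis from the PR:

```python
# If the true per-run failure probability were p, the chance of 30 clean
# runs is (1 - p)**30.  Solving (1 - p)**n = alpha gives the largest
# failure rate still consistent with the observation at level alpha.
def max_failure_rate(clean_runs, alpha=0.05):
    return 1 - alpha ** (1.0 / clean_runs)

# max_failure_rate(30) is roughly 0.095: 30/30 passes rules out flake
# rates much above ~10% at the 5% level, but cannot exclude rarer failures.
```

So 30 clean runs is strong evidence against the frequent flakiness seen at 1.0–1.2 s, while very rare failures would need more runs to detect.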


Conclusion
The RM initialization timeout is updated to 1.5 seconds, which:

  • Maintains full stability under Gazebo jitter
  • Reduces the previous timeout by 25%
  • Removes the arbitrary nature of the previous conservative 2.0 s value

This value is supported by reproducible experimental evidence (30/30 runs).

@Vanjoseluis Vanjoseluis requested a review from ahcorde as a code owner April 4, 2026 23:15
@Vanjoseluis
Contributor Author

Vanjoseluis commented Apr 5, 2026

If needed, we could explore using different RM initialization timeouts depending on the test or hardware.
During the experiments, all tests except the pendulum ones were stable with significantly lower timeout values (e.g., 0.2 s).
This PR keeps a unified timeout for simplicity, but the distinction may be useful in future work.

Long‑term, it might be interesting to explore whether Gazebo could be kept alive across tests with a proper reset mechanism.
This could avoid repeated RM initialization and significantly reduce CI time.
I’m not sure whether the current ros2_control and Gazebo plugin architecture supports this, but it might be worth discussing.

@Vanjoseluis
Contributor Author

It may also be that physics starts advancing before the RM is fully initialized, so a larger timeout is needed to let the system settle.

Member

@christophfroehlich christophfroehlich left a comment


I suppose this is dependent on the system load (CPU) and can get flaky on the CI runners. Have you tested the same with higher CPU load? I use this to max out 15 of my 16 cores, for example: `stress-ng --cpu 15 --vm 1 --vm-bytes 3G --vm-keep`

@Vanjoseluis
Contributor Author

Vanjoseluis commented Apr 5, 2026

> I suppose this is dependent on the system load (CPU) and can get flaky on the CI runners. Have you tested the same with higher CPU load? I use this to max out 15 of my 16 cores, for example: `stress-ng --cpu 15 --vm 1 --vm-bytes 3G --vm-keep`

I tried running the pendulum test under extreme load using stress-ng (15 CPU hogs + 3 GB VM pressure). Under these conditions Gazebo becomes systematically unstable: most runs fail on the joint-position assertion, and a couple fail due to missing joint_state messages.


Update:
I also tested the previous 2.0 s timeout under the same extreme stress‑ng conditions, and it fails consistently as well (mostly due to missing joint_state messages). I even tried 5.0 s with the same result.

This shows that stress‑ng overload breaks Gazebo initialization regardless of the timeout value, so it’s not a meaningful criterion for choosing the timeout.

Under normal and CI‑like load, the 1.5 s timeout behaves reliably.

