- Notifications
You must be signed in to change notification settings - Fork 130
Description
I've noticed a possible bug where having an ObservableResource that periodically calls self.updated_state with a new message, i.e.
self.updated_state(aiocoap.Message(code=Code.VALID))... can cause problems if there are multiple observers for the resource.
(Note: in my case the two observers are the same client, which is a bit odd and I plan to fix that on the client side - but I think a version of the bug exists even when the observers are different clients, and it is also possible for a lossy network to result in a redundant observer living longer even if the client tries to cancel it.)
The trivial effect is seeing the "Message ID set on to-be-sent message" in the log each time updated_state is called:
Mon Dec 8 08:46:45 2025 DEBUG Sending message <aiocoap.Message at 0x43aca890: CON 2.03 Valid (MID 38964, token 500c) remote <UDP6EndpointAddress [fdb8:60ca:a3f7:1:a6d4:41b8:946e:7cb8]:8163 (locally fdb8:60ca:a3f7:1:d85b:80bf:b1d9: 1083%wpan0)>, 1 option(s)> Mon Dec 8 08:46:45 2025 DEBUG Exchange added, message ID: 38964. Mon Dec 8 08:46:45 2025 WARNING Message ID set on to-be-sent message, this is probably unintended; clearing it. However the bigger problem happens later if one of the clients goes away and an outgoing observation message times out:
Tue Dec 9 17:48:50 2025 DEBUG Sending message <aiocoap.Message at 0x37c93610: CON 2.03 Valid (MID 23763, token 82f4) remote <UDP6EndpointAddress [fd80:9d9c:eb7c:1:fd6d:5321:80b7:b618]:28164 (locally fd80:9d9c:eb7c:1:b1aa:ef14:6599:8971%wpan0)>, 1 option(s)> Tue Dec 9 17:48:50 2025 DEBUG Exchange added, message ID: 23763. Tue Dec 9 17:48:50 2025 WARNING Message ID set on to-be-sent message, this is probably unintended; clearing it. Tue Dec 9 17:48:50 2025 DEBUG Message to <UDP6EndpointAddress [fd80:9d9c:eb7c:1:fd6d:5321:80b7:b618]:28164 (locally fd80:9d9c:eb7c:1:b1aa:ef14:6599:8971%wpan0)> put into backlog Tue Dec 9 17:48:52 2025 ERROR Exception in callback MessageManager._schedule_retransmit.<locals>.retr() at /usr/lib/python3.9/site-packages/aiocoap/messagemanager.py:260 Tue Dec 9 17:48:52 2025 handle: <TimerHandle when=13051.910706122799 MessageManager._schedule_retransmit.<locals>.retr() at /usr/lib/python3.9/site-packages/aiocoap/messagemanager.py:260 created at /usr/lib/python3.9/site-packages/aiocoap/messagemanager.py:267> Tue Dec 9 17:48:52 2025 source_traceback: Object created at (most recent call last): Tue Dec 9 17:48:52 2025 File "/usr/lib/python3.9/site-packages/aiocoap/pipe.py", line 172, in add_response Tue Dec 9 17:48:52 2025 self._add_event(self.Event(response, None, is_last)) Tue Dec 9 17:48:52 2025 File "/usr/lib/python3.9/site-packages/aiocoap/pipe.py", line 160, in _add_event Tue Dec 9 17:48:52 2025 keep_calling = cb(event) Tue Dec 9 17:48:52 2025 File "/usr/lib/python3.9/site-packages/aiocoap/tokenmanager.py", line 132, in on_event Tue Dec 9 17:48:52 2025 self.token_interface.send_message( Tue Dec 9 17:48:52 2025 File "/usr/lib/python3.9/site-packages/aiocoap/messagemanager.py", line 419, in send_message Tue Dec 9 17:48:52 2025 self._send_initially(message, messageerror_monitor) Tue Dec 9 17:48:52 2025 File "/usr/lib/python3.9/site-packages/aiocoap/messagemanager.py", line 428, in _send_initially Tue Dec 9 17:48:52 2025 self._add_exchange(message, messageerror_monitor) Tue Dec 9 17:48:52 2025 File "/usr/lib/python3.9/site-packages/aiocoap/messagemanager.py", line 206, in _add_exchange Tue Dec 9 17:48:52 2025 next_retransmission = self._schedule_retransmit(message, timeout, 0) Tue Dec 9 17:48:52 2025 File "/usr/lib/python3.9/site-packages/aiocoap/messagemanager.py", line 267, in _schedule_retransmit Tue Dec 9 17:48:52 2025 return self.loop.call_later(timeout, retr) Tue Dec 9 17:48:52 2025 Traceback (most recent call last): Tue Dec 9 17:48:52 2025 File "/usr/lib/python3.9/asyncio/events.py", line 80, in _run Tue Dec 9 17:48:52 2025 File "/usr/lib/python3.9/site-packages/aiocoap/messagemanager.py", line 266, in retr Tue Dec 9 17:48:52 2025 self._retransmit(message, timeout, retransmission_counter) Tue Dec 9 17:48:52 2025 File "/usr/lib/python3.9/site-packages/aiocoap/messagemanager.py", line 273, in _retransmit Tue Dec 9 17:48:52 2025 messageerror_monitor, next_retransmission = self._active_exchanges.pop(key) Tue Dec 9 17:48:52 2025 KeyError: (<UDP6EndpointAddress [fd80:9d9c:eb7c:1:fd6d:5321:80b7:b618]:28164 (locally fd80:9d9c:eb7c:1:b1aa:ef14:6599:8971%wpan0)>, 23764) ... I think the root cause here is that the same message object is being sent multiple times with different IDs - each time the MessageManager.send_message() clears the message ID (resulting in the log line) and then sets a new message id on the same object.
Then if a retransmit happens for the first recipient, the message id has been rewritten for a later recipient so the key for self._active_exchanges is wrong and the KeyError shown above triggers. (Possibly with different clients it's not a KeyError but still the retransmit will be incorrect as the remote will have also changed on the message.)
I've observed the cascade impact of the KeyError is that entire remote is stalled and the Observer keeps adding messages to the backlog which are never sent, so it becomes a zombie Observe client...
I think the fix may be simple, will submit a PR shortly.
Install info
# python3 -m aiocoap.cli.defaults Python version: 3.9.15 (main, Oct 23 2022, 20:48:04) [GCC 8.4.0] aiocoap version: 0.4.7 Modules missing for subsystems: dtls: missing DTLSSocket oscore: missing cryptography, ge25519 linkheader: everything there prettyprint: missing termcolor, pygments Python platform: linux Default server transports: tcpserver:tcpclient:tlsserver:tlsclient:udp6 Selected server transports: tcpserver:tcpclient:tlsserver:tlsclient:udp6 Default client transports: tcpclient:tlsclient:udp6 Selected client transports: tcpclient:tlsclient:udp6 SO_REUSEPORT available (default, selected): True, True (OpenWRT router, running old Python unfortunately!)