fix: use errors='replace' in Frame.__str__() for partial UTF-8 frames (fixes #1695) #1704

Open
naarob wants to merge 1 commit into python-websockets:main from naarob:main

@naarob naarob commented Mar 26, 2026

Fixes UnicodeDecodeError when DEBUG logging is enabled and a large text message is fragmented at byte boundaries. See issue #1695 for full details.

data = repr(bytes(self.data).decode(errors="replace"))

9 new tests. 79 upstream pass. 0 regressions.

…python-websockets#1695)

Frame.__str__() decoded OP_TEXT frame data with a bare .decode(), which raises UnicodeDecodeError when the frame ends in the middle of a multi-byte UTF-8 sequence. This happens when the websockets library itself fragments a large text message at byte boundaries (not at character boundaries) for continuation frames (fin=False), e.g. Japanese, Chinese, or emoji text.

When DEBUG logging is enabled, the UnicodeDecodeError propagated and caused the connection to close with code 1007 (INVALID_DATA), even though the message was valid. The data itself was fine; only the logging was broken.

Fix: add errors='replace' to the .decode() call in Frame.__str__(). This replaces incomplete sequences with U+FFFD (replacement character), keeping the log entry human-readable and preventing the logging path from crashing the connection.

Tests: 9 new tests covering partial Japanese, partial emoji, complete frames, ASCII, binary, and ping frames. 79 upstream tests unchanged.
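
For readers who want to see the failure mode in isolation, here is a minimal sketch that does not import websockets. The helper names (`split_at_byte`, `describe_strict`, `describe_lenient`) are hypothetical and exist only for illustration; they mimic the decode step described above and are not the library's code.

```python
# Illustration only: hypothetical helpers, not part of the websockets API.

def split_at_byte(text: str, cut: int) -> tuple[bytes, bytes]:
    """Encode text as UTF-8 and split the bytes at an arbitrary offset."""
    payload = text.encode("utf-8")
    return payload[:cut], payload[cut:]


def describe_strict(fragment: bytes) -> str:
    """Mimic the old behavior: a bare decode raises on a partial sequence."""
    return repr(fragment.decode())


def describe_lenient(fragment: bytes) -> str:
    """Mimic the fix: incomplete sequences become U+FFFD instead of raising."""
    return repr(fragment.decode(errors="replace"))


# Each character of this string takes 3 bytes in UTF-8,
# so cutting at byte 4 lands inside the second character.
first, rest = split_at_byte("日本語", 4)

try:
    describe_strict(first)
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

lenient = describe_lenient(first)
```

Under these assumptions, `strict_failed` ends up `True` while `lenient` holds a printable representation containing the replacement character. Concatenating `first` and `rest` still decodes to the original string, which matches the point that the message itself is valid and only the per-frame string rendering is affected.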