  • When I posted this, I did not realize there was a duplicate question for which this exact answer had already been given: stackoverflow.com/a/57192592/5583443 Commented Feb 12, 2021 at 4:00
  • The important thing here is not just that Latin-1 is used, but that non-Latin-1 characters are turned into escape sequences by the 'backslashreplace' error handler. That happens to produce exactly the format the .decode('unicode_escape') step is meant to replace. So this works with, for example, myString='日本\\u8a9e' (real characters mixed with a literal backslash escape), correctly giving 日本語. However, it doesn't handle the truly nasty cases described in my answer. Commented Aug 6, 2022 at 1:07
  • (On the other hand, it certainly can be argued that input with a single trailing backslash should fail...) Commented Aug 6, 2022 at 1:10
  • Is it really always latin-1, or does it depend on the default encoding for your particular version of Python? Is this true even on Linux for example? Commented Nov 30, 2023 at 22:32
  • Well, the table in the Python docs that I link to above says, in the 'unicode_escape' entry, "Decode from Latin-1 source code." So that seems pretty clear/definitive to me... Commented Dec 1, 2023 at 5:50
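A minimal sketch of the round trip these comments discuss: encode with Latin-1 plus 'backslashreplace', then decode with 'unicode_escape'. The helper name `unescape` is hypothetical and only used for illustration. It also shows the two side points raised above: the decode step reads raw bytes as Latin-1 (independent of the platform's default encoding), and a lone trailing backslash makes the decode fail.

```python
def unescape(s: str) -> str:
    """Hypothetical helper: interpret backslash escapes such as a literal '\\u8a9e'."""
    # Characters in Latin-1 are encoded as single bytes; anything else becomes
    # a \uXXXX or \UXXXXXXXX escape, the same syntax 'unicode_escape' decodes.
    return s.encode('latin-1', 'backslashreplace').decode('unicode_escape')


# Real non-Latin-1 characters mixed with a literal escape sequence.
my_string = '日本\\u8a9e'
print(unescape(my_string))  # 日本語

# 'unicode_escape' maps plain bytes through Latin-1, not the locale encoding,
# so byte 0xE9 always decodes to 'é'.
print(b'caf\xe9'.decode('unicode_escape'))  # café

# A single trailing backslash is an incomplete escape and is rejected.
try:
    unescape('abc\\')
except UnicodeDecodeError as exc:
    print('failed:', exc.reason)
```

Because the encode step names 'latin-1' explicitly, neither half of the round trip depends on sys.getdefaultencoding() or the OS locale.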