20 events
when what by license comment
Aug 5, 2022 at 0:19 comment added user2357112 Tried using codecs.decode(myString, 'unicode-escape'), since codecs.decode accepts Unicode input directly. Turns out that still fails on input outside the ASCII range, in the exact same way Apalala pointed out the current version of the answer already fails.
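A minimal sketch of the behavior described in this comment, assuming CPython 3. The sample strings are illustrative and not taken from the thread. It passes a `str` straight to `codecs.decode` and shows that ASCII input works while a non-ASCII character comes back as two Latin-1 characters.

```python
import codecs

# Pure-ASCII input: the backslash escape is decoded as expected.
ascii_result = codecs.decode("spam\\teggs", "unicode_escape")

# Non-ASCII input: the str is turned into UTF-8 bytes internally and each
# byte is then read as one Latin-1 character, so "ñ" becomes two characters.
non_ascii_result = codecs.decode("a\\tñ", "unicode_escape")

print(repr(ascii_result))
print(repr(non_ascii_result))
```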
Mar 22, 2022 at 3:24 comment added metatoaster @DonovanBaarda no, there is no multi-byte utf-8 representation of any unicode codepoint > 127 that produces bytes within the ascii range (0-127), as every byte of a multi-byte sequence is in the range 128-255 (i.e. 0x80 - 0xff), because the designers of unicode and utf-8 understood this exact issue. In other words, no, it is impossible for str.encode('utf-8') to produce the byte b'\x5c' (0x5c) from anything other than the unicode codepoint U+005C.
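The claim above can be spot-checked with a short sketch. The helper name `utf8_bytes_are_non_ascii` and the sampling step are hypothetical choices made for illustration. The check samples code points above 127, skips surrogates (which cannot be encoded), and confirms every UTF-8 byte is 0x80 or higher.

```python
def utf8_bytes_are_non_ascii(code_point: int) -> bool:
    """Return True if every UTF-8 byte of the code point is >= 0x80."""
    return all(byte >= 0x80 for byte in chr(code_point).encode("utf-8"))

# Sample code points above the ASCII range, skipping the surrogate block,
# which str.encode('utf-8') refuses to encode.
sample = [cp for cp in range(0x80, 0x110000, 97) if not 0xD800 <= cp <= 0xDFFF]

all_clear = all(utf8_bytes_are_non_ascii(cp) for cp in sample)
print(all_clear)

# The backslash itself is the single byte 0x5c.
print("\\".encode("utf-8"))
```

A sampled check is only evidence, not a proof. The guarantee itself comes from the UTF-8 encoding rules, where lead and continuation bytes always have the high bit set.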
Mar 21, 2022 at 11:30 comment added Donovan Baarda @metatoaster But isn't your solution still a bit fragile, since s.encode('utf-8') encodes the output in utf-8 and decode('unicode_escape') assumes the input is latin-1? Is it possible that the utf-8 encoding introduces some backslash bytes? It would probably work fine most of the time, but if the input string included a unicode character that when utf-8 encoded included a 0x5c latin-1 backslash character, that backslash would get escaped, which would then probably break the final decode('utf-8').
Dec 17, 2020 at 1:38 comment added Glen Whitney Just wanted to note that metatoaster is correct, unicode_escape does need a latin-1 coded byte sequence, but it's not necessary to make two roundtrips between strings and byte sequences (see alternate answer for python3).
Dec 16, 2020 at 22:26 review Suggested edits
Dec 17, 2020 at 0:13
Jul 10, 2018 at 2:41 comment added OpenAI stole this from rspeer @metatoaster Oh, I see! Yes, that actually does work. Nice.
Jul 6, 2018 at 5:19 comment added metatoaster @rspeer the whole string when being decoded as unicode_escape is bytes, which means it doesn't have any encoding, but unicode_escape is a valid codec which would produce the same bytes as unicode encoded in latin1 from the input string. For ease of illustration please look at this example and see how that actually works through every single step (to ease the effort from having to manually try it on your end). Hence I said "redo the encode/decode bit".
Jul 6, 2018 at 3:39 comment added OpenAI stole this from rspeer @metatoaster As stated in my answer, that doesn't work if your string contains any characters that aren't in latin-1.
May 25, 2018 at 9:01 comment added metatoaster Since latin1 is assumed by unicode_escape, redo the encode/decode bit, e.g. s.encode('utf-8').decode('unicode_escape').encode('latin1').decode('utf8')
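A sketch of the round trip suggested in this comment, wrapped in a hypothetical helper named `unescape` for readability. The input string is the one Apalala used elsewhere in this thread.

```python
def unescape(s: str) -> str:
    """Decode backslash escapes while keeping literal non-ASCII characters."""
    return (
        s.encode("utf-8")            # str -> UTF-8 bytes
        .decode("unicode_escape")    # handle escapes; bytes read as Latin-1
        .encode("latin1")            # undo the Latin-1 reading
        .decode("utf-8")             # restore the original characters
    )

result = unescape("juancarlo\\tañez")
print(repr(result))
```

One limitation worth checking before relying on this: an escape such as `\u2014` decodes to a character outside Latin-1, so the `encode('latin1')` step raises `UnicodeEncodeError` for that input.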
Mar 28, 2016 at 3:26 comment added Christian Aichinger Agreed with @Apalala: this is not good enough. Check out rspeer's answer below for a complete solution that works in Python2 and 3!
Jul 1, 2014 at 19:04 comment added Apalala This solution is not good enough because it doesn't handle the case in which there are legit unicode characters in the original string. If you try: >>> print("juancarlo\\tañez".encode('utf-8').decode('unicode_escape')) You get: juancarlo añez
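For reference, a sketch that reproduces the call from this comment under Python 3 and inspects the result with `repr`, so the effect on the non-ASCII character is visible. The variable names are illustrative.

```python
# A literal backslash followed by "t", plus a genuine non-ASCII character.
original = "juancarlo\\tañez"

# unicode_escape reads the two UTF-8 bytes of "ñ" as two Latin-1 characters.
mangled = original.encode("utf-8").decode("unicode_escape")

print(repr(mangled))
```

The tab escape is decoded as intended, but "ñ" is replaced by the two characters U+00C3 and U+00B1.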
May 14, 2013 at 8:44 comment added Chris In Python 2.7, myStr.decode('unicode_escape') seems better than myStr.decode('string_escape'), because it will also unescape unicode \udddd escape sequences into actual unicode characters. For example, r"\u2014".decode('unicode_escape') yields u"\u2014". string_escape, in contrast, leaves unicode escapes untouched. Though note that (at least in my locale) while I can put non-ASCII unicode escapes in myStr, I can't put actual non-ASCII characters in myStr, or decode will give me "UnicodeEncodeError: 'ascii' codec can't encode character" problems.
Feb 13, 2013 at 16:22 review Suggested edits
Feb 13, 2013 at 16:27
Feb 17, 2012 at 9:59 comment added Ning Sun @dln385 Does it work with non-ascii characters? I have some non-ascii chars with \\t. In python2, string-escape just works for that. But in python3, the codec is removed. And the unicode-escape just escapes all non-ascii bytes and breaks my encoding.
Oct 26, 2010 at 6:29 history edited Jerub CC BY-SA 2.5
added 97 characters in body
Oct 26, 2010 at 6:06 vote accept dln385
Oct 26, 2010 at 6:06 comment added dln385 In Python 3, the command needs to be print(bytes(myString, "utf-8").decode("unicode_escape"))
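A small sketch of the Python 3 form given in this comment, using an ASCII-only sample string (an assumption chosen so the result is unambiguous; other comments in this thread discuss non-ASCII input).

```python
# Raw string: contains a literal backslash followed by "n", not a newline.
my_string = r"spam\neggs"

# Convert to bytes first, because str has no decode() method in Python 3.
decoded = bytes(my_string, "utf-8").decode("unicode_escape")

print(decoded)
```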
Oct 26, 2010 at 5:44 comment added dln385 @Nas Banov The documentation does make a small mention about that: Notice that spelling alternatives that only differ in case or use a hyphen instead of an underscore are also valid aliases; therefore, e.g. 'utf-8' is a valid alias for the 'utf_8' codec.
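The aliasing rule quoted from the documentation can be observed with `codecs.lookup`. This is a sketch for Python 3, where `string_escape` no longer exists, so it uses `utf-8` and `unicode_escape` instead. The spellings listed are illustrative.

```python
import codecs

# Spellings that differ only in case or hyphen vs. underscore.
spellings = ["utf-8", "utf_8", "UTF-8", "UTF_8"]
resolved = {codecs.lookup(name).name for name in spellings}

# The same holds for the escape codec discussed in this thread.
escape_names = {
    codecs.lookup(name).name for name in ("unicode_escape", "unicode-escape")
}

# Each set holds a single canonical name if all spellings are aliases.
print(resolved, escape_names)
```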
Oct 26, 2010 at 5:18 comment added Nas Banov hands down, the best solution! btw, by docs it should be "string_escape" (with underscore) but for some reason accepts anything in the pattern 'string escape', 'string@escape' and whatnot... basically 'string\W+escape'
Oct 26, 2010 at 5:01 history answered Jerub CC BY-SA 2.5