
I'm using Python 3, and I have a string that is displayed as bytes:

strategyName=\xe7\x99\xbe\xe5\xba\xa6

I need to turn it into readable Chinese characters by decoding it:

    orig = b'strategyName=\xe7\x99\xbe\xe5\xba\xa6'
    result = orig.decode('UTF-8')
    print(result)

This prints the following, which is what I want:

strategyName=百度

But if I store the same escapes in a str instead, it behaves differently:

    str0 = 'strategyName=\xe7\x99\xbe\xe5\xba\xa6'
    result_byte = str0.encode('UTF-8')
    result_str = result_byte.decode('UTF-8')
    print(result_str)

strategyName=ç¾åº¦

Please help me understand why this happens and how I can fix it.
Thanks a lot

  • You have a typo: orig is a bytes object, while str0 is a str. Add a b prefix to the literal for str0 and decode it. Commented Jan 11, 2019 at 3:26
  • To put it another way, result_byte != orig: the individual bytes in orig are combined (decoded as UTF-8) to produce the Unicode characters, but each escape sequence in a str literal is already a separate character. Commented Jan 11, 2019 at 3:28
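A small sketch of the comment above, using only the literals from the question: it compares orig with str0.encode('UTF-8') and shows that decoding merely round-trips back to str0, which is why the garbled text is printed.

```python
# The bytes literal from the question: 13 ASCII bytes plus 6 UTF-8 bytes.
orig = b'strategyName=\xe7\x99\xbe\xe5\xba\xa6'

# The str literal uses the same escapes, but each \xNN is one *character*
# (code points U+00E7, U+0099, U+00BE, ...), not one byte.
str0 = 'strategyName=\xe7\x99\xbe\xe5\xba\xa6'

# UTF-8 encodes every code point from U+0080 to U+07FF as TWO bytes,
# so encoding str0 does not give back the original byte sequence.
result_byte = str0.encode('UTF-8')

print(len(orig), len(str0), len(result_byte))  # 19 19 25
print(result_byte == orig)                      # False
print(result_byte)  # b'strategyName=\xc3\xa7\xc2\x99\xc2\xbe\xc3\xa5\xc2\xba\xc2\xa6'

# Decoding as UTF-8 just undoes that encode, so the result is str0 itself.
print(result_byte.decode('UTF-8') == str0)      # True
```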

1 Answer


Your problem is that you're using a str literal when you're trying to store the UTF-8 encoded bytes of your string. You should just use the bytes literal, but if the str form is necessary, the correct approach is to encode as latin-1 (which maps every ordinal below 256 one-to-one to the byte with the same value) to recover the bytes holding the UTF-8 encoded data, then decode those as UTF-8:

    str0 = 'strategyName=\xe7\x99\xbe\xe5\xba\xa6'
    result_byte = str0.encode('latin-1')  # Only changed line
    result_str = result_byte.decode('UTF-8')
    print(result_str)
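To check the claim about latin-1, the sketch below verifies that it maps each ordinal below 256 to the identical byte, and that str0.encode('latin-1') reproduces orig exactly. The helper reinterpret_as_utf8 is a hypothetical name introduced here for illustration; it is not part of the answer or the standard library.

```python
# Latin-1 (ISO-8859-1) maps code points U+0000..U+00FF to bytes 0x00..0xFF
# one-to-one, so encoding with it turns each character back into its byte.
one_to_one = all(chr(i).encode('latin-1') == bytes([i]) for i in range(256))
print(one_to_one)  # True

str0 = 'strategyName=\xe7\x99\xbe\xe5\xba\xa6'
orig = b'strategyName=\xe7\x99\xbe\xe5\xba\xa6'
print(str0.encode('latin-1') == orig)  # True


def reinterpret_as_utf8(s: str) -> str:
    """Hypothetical helper: treat each code point of s as a byte, then decode as UTF-8.

    Raises UnicodeEncodeError if s contains a character above U+00FF,
    and UnicodeDecodeError if the resulting bytes are not valid UTF-8.
    """
    return s.encode('latin-1').decode('utf-8')


print(reinterpret_as_utf8(str0))  # strategyName=百度

# Characters above U+00FF have no latin-1 byte, so the trick only applies
# to strings whose characters are all below 256.
try:
    '百度'.encode('latin-1')
except UnicodeEncodeError as exc:
    print('cannot encode:', exc.reason)
```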

Of course, the other approach is to type the Unicode escapes you wanted in the first place, instead of byte-level escapes that correspond to a UTF-8 encoding:

    result_str = 'strategyName=\u767e\u5ea6'

No rigmarole needed.
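As a quick check (assuming a Python 3 source file saved as UTF-8, the default), the escaped literal and the decoded bytes are the same string. The built-in ascii() is one way to look up the \u escapes for text you already have.

```python
# The Unicode-escape literal and the decoded bytes literal are equal.
from_escapes = 'strategyName=\u767e\u5ea6'
from_bytes = b'strategyName=\xe7\x99\xbe\xe5\xba\xa6'.decode('UTF-8')
print(from_escapes == from_bytes)  # True
print(from_escapes)                # strategyName=百度

# ascii() escapes every non-ASCII character, revealing the \u codes.
print(ascii('百度'))               # '\u767e\u5ea6'

# Encoding the characters as UTF-8 gives back the byte escapes from the question.
print('百度'.encode('UTF-8'))      # b'\xe7\x99\xbe\xe5\xba\xa6'
```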


1 Comment

TIL about the deal with Latin-1. +1
