0

I have bytes data like this:

b"6D4B8BD5" 

The data comes from Chinese characters encoded with unicode-escape. It can be generated like this:

'测试'.encode('unicode-escape') 

result:

b'\\u6d4b\\u8bd5' 

How can I convert b"6D4B8BD5" to b'\u6d4b\u8bd5', or to '测试'?

3 Answers

1

unhexlify converts the hex digits to raw bytes, which you can then decode with the right encoding (here UTF-16, big-endian):

>>> from binascii import unhexlify
>>> s = b'6D4B8BD5'
>>> unhexlify(s).decode('utf-16be')
'测试'
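
The question also asks for the escaped bytes form. A minimal sketch of that extra step, assuming the input really is big-endian UTF-16 written as hex (the variable names `text` and `escaped` are only illustrative):

```python
from binascii import unhexlify

s = b"6D4B8BD5"

# Turn the hex digits into raw bytes, then decode them as big-endian UTF-16.
text = unhexlify(s).decode("utf-16-be")

# Re-encode with unicode-escape to get the escaped bytes form from the question.
escaped = text.encode("unicode-escape")

print(text)     # 测试
print(escaped)  # b'\\u6d4b\\u8bd5'
```

If the input is a str rather than bytes, `bytes.fromhex("6D4B8BD5")` should give the same raw bytes as `unhexlify`.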

0
>>> s = b"6D4B8BD5"
>>> chr(int(s[0:4], 16))
'测'
>>> chr(int(s[4:8], 16))
'试'

0

A working solution that returns the correct result for input of any length, as long as every character is written as four hex digits :)

Python 3.x

def convert(chars):
    if isinstance(chars, bytes):
        chars = chars.decode('ascii')
    # Group the hex digits four at a time: '6D4B', '8BD5', ...
    chars = [''.join(c) for c in zip(chars[::4], chars[1::4], chars[2::4], chars[3::4])]
    # Parse each group as a code point and build the string.
    return "".join([chr(int(c, 16)) for c in chars])

print(convert(b"6D4B8BD5"))

#> python test123.py
测试

A second solution that slices the input directly instead of building intermediate lists. It is simpler to read.

def convert(chars):
    if isinstance(chars, bytes):
        chars = chars.decode('ascii')
    result = ''
    # Walk the input four hex digits at a time.
    for i in range(len(chars) // 4):
        result += chr(int(chars[4 * i:4 * (i + 1)], 16))
    return result

print(convert(b"6D4B8BD5"))

#> python test123.py
测试

2 Comments

Could you explain a little bit the reasoning behind this? Thanks!
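Here we split the input into groups of four hex characters, parse each group as a 16-bit integer (0-65535), and turn that integer into a character with chr (docs.python.org/3/library/functions.html#chr). This works because these Chinese characters lie in the Basic Multilingual Plane, where a character's code point is the same number as its single UTF-16 code unit (en.wikipedia.org/wiki/UTF-16).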
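
To illustrate where the two approaches agree and where they part ways, here is a small sketch. It assumes the hex input holds big-endian UTF-16 code units; `per_unit` and `as_utf16` are hypothetical helper names, not taken from the answers.

```python
def per_unit(hex_text):
    # One chr() call per group of four hex digits, as in the convert() answers.
    return "".join(
        chr(int(hex_text[i:i + 4], 16)) for i in range(0, len(hex_text), 4)
    )


def as_utf16(hex_text):
    # Decode the raw bytes as big-endian UTF-16, as in the unhexlify answer.
    return bytes.fromhex(hex_text).decode("utf-16-be")


bmp = "6D4B8BD5"
print(per_unit(bmp) == as_utf16(bmp))  # True: both give '测试'

# U+1F600 lies outside the Basic Multilingual Plane, so UTF-16 stores it
# as the surrogate pair D83D DE00.
astral = "D83DDE00"
print(len(per_unit(astral)))  # 2: two separate surrogate code points
print(len(as_utf16(astral)))  # 1: the pair is combined into one character
```

For input like the question's, which only contains Basic Multilingual Plane characters, either approach gives the same string.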
