Given random bytes (i.e. not only digits/characters!), I need to convert them to a string and then back to the initial bytes without losing information. This seems like a basic task, but I ran into the following problems:
Assuming:
    >>> rnd_bytes = b'w\x12\x96\xb8'
    >>> len(rnd_bytes)
    4

Now I convert it to a string. Note: I need to set the error handler to backslashreplace, because otherwise decoding raises a UnicodeDecodeError or, with handlers such as ignore or replace, loses information.

    my_str = rnd_bytes.decode('utf-8', 'backslashreplace')

Now I have the string, and I want to convert it back to exactly the original bytes (length 4!):
According to Python resources and this answer, there are different possibilities:

    conv_bytes = bytes(my_str, 'utf-8')
    conv_bytes = my_str.encode('utf-8')

But len(conv_bytes) returns 10.
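
To make the length of 10 concrete, here is a minimal sketch that lists the individual characters of the decoded string. It assumes the same `rnd_bytes` as above; the name `chars` is only illustrative. It suggests that `backslashreplace` turns each undecodable byte into four literal characters (a backslash, `x`, and two hex digits), which are then encoded as four separate bytes.

```python
rnd_bytes = b'w\x12\x96\xb8'
my_str = rnd_bytes.decode('utf-8', 'backslashreplace')

# Each invalid byte (0x96, 0xb8) became four literal characters,
# so the string holds 1 + 1 + 4 + 4 = 10 characters.
chars = list(my_str)
print(len(my_str))
print(chars)

# Encoding these ten ASCII characters yields ten bytes, not the original four.
conv_bytes = my_str.encode('utf-8')
print(len(conv_bytes))
```

Seen this way, the string contains only single backslashes; the doubled and quadrupled backslashes appear only in the `repr` output.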
I tried to analyse the outcome:

    >>> repr(rnd_bytes)
    "b'w\\x12\\x96\\xb8'"
    >>> repr(my_str)
    "'w\\x12\\\\x96\\\\xb8'"
    >>> repr(conv_bytes)
    "b'w\\x12\\\\x96\\\\xb8'"

It would make sense to replace the '\\\\'. However, my_str.replace('\\\\','\\') doesn't change anything, probably because four backslashes in the literal represent only two. So my_str.replace('\\','\') would find the '\\\\', but it leads to
SyntaxError: EOL while scanning string literal
due to the last argument '\'. This was discussed here, where the following suggestion came up:

    >>> my_str2 = my_str.encode('utf_8').decode('unicode_escape')
    >>> repr(my_str2)
    "'w\\x12\\x96¸'"

This replaces the '\\\\' but seems to add or change some other characters:

    >>> conv_bytes2 = my_str2.encode('utf8')
    >>> len(conv_bytes2)
    6
    >>> repr(conv_bytes2)
    "b'w\\x12\\xc2\\x96\\xc2\\xb8'"

There must be a proper way to convert arbitrary (non-text) bytes to a string and back. How can I achieve that?
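
For reference, a minimal sketch of round trips that should be lossless for the example above. It assumes the string only has to carry the bytes and does not need to be readable text. The names `s1`, `s2`, `back1`, `back2` and `back3` are illustrative and not part of the original attempt.

```python
rnd_bytes = b'w\x12\x96\xb8'

# Option 1: latin-1 maps every byte 0..255 to the code point with the same value,
# so decoding never fails and encoding restores the exact bytes.
s1 = rnd_bytes.decode('latin-1')
back1 = s1.encode('latin-1')

# Option 2: keep UTF-8, but smuggle undecodable bytes through as lone surrogates.
# The same error handler must be used in both directions.
s2 = rnd_bytes.decode('utf-8', 'surrogateescape')
back2 = s2.encode('utf-8', 'surrogateescape')

# Option 3: undo backslashreplace. The final step must be latin-1, not UTF-8,
# otherwise code points above 0x7f are expanded to two bytes each.
# Caveat: this is ambiguous if the original bytes already contain a literal
# backslash escape such as b'\\x96', so it is not lossless in general.
my_str = rnd_bytes.decode('utf-8', 'backslashreplace')
back3 = my_str.encode('utf-8').decode('unicode_escape').encode('latin-1')

print(back1 == rnd_bytes, back2 == rnd_bytes, back3 == rnd_bytes)
```

The six-byte result in the attempt above most likely comes from the last step: `my_str2` already holds the four intended code points, and encoding it as UTF-8 rather than latin-1 is what adds the `\xc2` bytes.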