Why does bytearray function in Python turn one byte into two bytes?

Question

I'm attempting to create a data encoder in Python. I'm using my own unique underlying symmetric algorithm to encode a single 8-bit byte to another 8-bit byte and then decode it using the same algorithm.

I'm using Python's bytearray function to turn strings into bytes. However, I'm running into this issue: the hexadecimal value 0xAB can be represented in binary as 1010 1011. Yet when I use bytearray on the string representation ("\xAB"), I get:

>>> byte = bytearray("\xAB", "utf-8")
>>> print(byte)
bytearray(b'\xc2\xab')

Clearly the string is represented by the single byte \xAB, so why is the extra byte \xC2 being prepended to the byte array? I'm using UTF-8 to encode the data since that is Python's default, but should I be using a different encoding? How can I get the bytearray to contain only the single 8-bit byte 0xAB?
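A minimal reproduction of the behaviour described above, assuming Python 3 (where string literals are Unicode text): the string holds one character, but its UTF-8 encoding holds two bytes.

```python
# Reproduce the question's observation (Python 3).
s = "\xAB"                       # one character: U+00AB
encoded = bytearray(s, "utf-8")  # encode that character as UTF-8

print(len(s))                         # → 1 (characters in the string)
print(len(encoded))                   # → 2 (bytes after UTF-8 encoding)
print([f"{b:02X}" for b in encoded])  # → ['C2', 'AB']
print(format(0xAB, "08b"))            # → 10101011 (binary form of 0xAB)
```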

Ture Pålsson · Accepted Answer · 2021-07-12 18:22:18Z

"\xAB" is a string consisting of the single Unicode character U+00AB. You then convert it to a byte array using the UTF-8 encoding. But in UTF-8, the character U+00AB is encoded as two bytes: C2 AB. That the second byte happens to be the same as the input byte in this case is a coincidence; it will not always be the case.
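To see where the C2 comes from, here is a hand-rolled sketch of UTF-8's two-byte rule, which covers code points U+0080 through U+07FF (only that range is handled; `utf8_two_byte` is an illustrative name, not a library function). The lead byte is `110` followed by the top 5 bits of the code point, and the continuation byte is `10` followed by the low 6 bits. For U+00AB the continuation byte works out to 0x80 | 0x2B = 0xAB again; that holds for every character from U+0080 to U+00BF, but not beyond, e.g. U+00E9 encodes as C3 A9.

```python
def utf8_two_byte(code_point: int) -> bytes:
    """Encode a code point in U+0080..U+07FF using UTF-8's two-byte form."""
    if not 0x80 <= code_point <= 0x7FF:
        raise ValueError("only the two-byte range U+0080..U+07FF is handled")
    lead = 0xC0 | (code_point >> 6)            # 110xxxxx: top 5 bits
    continuation = 0x80 | (code_point & 0x3F)  # 10xxxxxx: low 6 bits
    return bytes([lead, continuation])


for ch in ("\xAB", "\xE9"):
    manual = utf8_two_byte(ord(ch))
    builtin = ch.encode("utf-8")
    # Prints the code point, the hand-computed bytes, and whether they match
    # Python's own UTF-8 encoder.
    print(f"U+{ord(ch):04X}", manual.hex(" "), manual == builtin)
```

The comparison against `str.encode("utf-8")` is there so the sketch checks itself rather than asking you to trust the bit arithmetic.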

If you want to deal with byte arrays, you are probably better off leaving strings out of it, as strings always bring encoding headaches with them.
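A sketch of what leaving strings out of it can look like: build the data directly as bytes, so each value from 0 to 255 occupies exactly one byte. The per-byte transform is a placeholder (XOR with an arbitrary key `0x5A`) standing in for the asker's own symmetric algorithm, which the question does not show; `KEY`, `encode_byte` and `encode_data` are hypothetical names.

```python
KEY = 0x5A  # hypothetical key; XOR is only a stand-in for the real algorithm


def encode_byte(b: int) -> int:
    """Placeholder symmetric transform: applying it twice restores the input."""
    return b ^ KEY


def encode_data(data: bytes) -> bytearray:
    """Apply the per-byte transform to every byte; it also serves as the decoder."""
    return bytearray(encode_byte(b) for b in data)


# Three equivalent ways to get the single byte 0xAB without going through UTF-8.
raw = bytearray([0xAB])
assert raw == bytearray(b"\xAB") == bytearray.fromhex("AB")

# If the input really is a str whose code points are all below 256, latin-1
# maps each character to exactly one byte with the same value.
assert "\xAB".encode("latin-1") == b"\xab"

encoded = encode_data(raw)
decoded = encode_data(encoded)
print(raw, encoded, decoded)
```

Because the transform works on integers 0-255 and the container is a `bytearray`, no text encoding is involved at any step, so the length of the data never changes.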

Yaakov Bressler · Accepted Answer · 2021-07-12 18:00:37Z

The extra byte seems to be coming from the escape character in your string. When you want bytes rather than a string, I recommend using Python's bytes literal notation:

>>> byte = bytearray(b"xAB")
>>> print(byte)
bytearray(b'xAB')

Also, to debug your code, consider reversing the encoding to see what Python is seeing (clearly not the value you were after):

>>> byte = bytearray("\xAB", "utf-8")
>>> byte.decode()
'«'
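When reading that output, note that '«' is exactly the character U+00AB, i.e. the same one-character string as "\xAB". The UTF-8 round trip is therefore lossless; the two bytes are simply how UTF-8 represents that character. A quick check:

```python
original = "\xAB"
encoded = bytearray(original, "utf-8")
decoded = encoded.decode("utf-8")

print(repr(decoded))        # the character U+00AB, displayed as '«'
print(decoded == original)  # the round trip returns the original string
print(hex(ord(decoded)))    # its code point, 0xab
```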

Using the changes provided above, the correct value is returned:

>>> byte = bytearray(b'xAB')
>>> byte.decode()
'xAB'
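One detail worth checking in the example above: without a backslash, b"xAB" is three ASCII bytes ('x', 'A', 'B'), not the single byte 0xAB. The single byte is written b"\xAB", and on its own it is not valid UTF-8, so decoding it with the default codec fails:

```python
three_bytes = bytearray(b"xAB")  # the characters 'x', 'A', 'B'
one_byte = bytearray(b"\xAB")    # the single value 0xAB

print(list(three_bytes))  # → [120, 65, 66]
print(list(one_byte))     # → [171]

try:
    one_byte.decode("utf-8")
except UnicodeDecodeError as exc:
    # 0xAB is a UTF-8 continuation byte, so it cannot start a character.
    print("not valid UTF-8:", exc.reason)
```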
