Revisions to What is the difference between a string and a byte string?

Fixed the weird syntax highlighting (as a result, the diff looks more extensive than it really is - use view "Side-by-side Markdown" to compare).

edited Apr 27, 2022 at 23:36

Peter Mortensen

Let's take a simple one-character string, 'š', and encode it into a sequence of bytes:

>>> 'š'.encode('utf-8')
b'\xc5\xa1'
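To make the string vs. byte string distinction concrete, here is a small sketch (not part of the original answer) showing that the encoded value is a different type from the string, and that the single character became two bytes:

```python
text = 'š'                   # a str: a sequence of Unicode code points
data = text.encode('utf-8')  # a bytes object: a sequence of integers 0..255

print(type(text).__name__, len(text))  # str 1
print(type(data).__name__, len(data))  # bytes 2
print(list(data))                      # [197, 161]
```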

For the purpose of this example, let's display the sequence of bytes in its binary form:

>>> bin(int(b'\xc5\xa1'.hex(), 16))
'0b1100010110100001'
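The bin() call above joins both bytes into one number. If you prefer to see each byte as its own 8-bit group, matching the groups used in the UTF-8 decoding step, a per-byte format string is one alternative (a sketch, not the answer's own method):

```python
data = b'\xc5\xa1'

# Format every byte as exactly 8 binary digits, keeping leading zeros.
bits = ' '.join(f'{byte:08b}' for byte in data)
print(bits)  # 11000101 10100001
```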

In general, it is not possible to decode the bytes back into text without knowing how they were encoded. Only if you know that the UTF-8 text encoding was used can you follow the UTF-8 decoding algorithm and recover the original string:

11000101 10100001
   ^^^^^   ^^^^^^
   00101   100001
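The bit selection in the diagram can be written out in code. The sketch below handles only the two-byte UTF-8 form (lead byte 110xxxxx, continuation byte 10xxxxxx); the function name decode_two_byte_utf8 is hypothetical, and in real code you would simply call bytes.decode('utf-8'):

```python
def decode_two_byte_utf8(data: bytes) -> str:
    """Decode one two-byte UTF-8 sequence by hand (illustration only)."""
    if len(data) != 2:
        raise ValueError('expected exactly two bytes')
    lead, cont = data  # iterating over bytes yields ints 0..255
    # Lead byte must match 110xxxxx, continuation byte must match 10xxxxxx.
    if (lead & 0b11100000) != 0b11000000 or (cont & 0b11000000) != 0b10000000:
        raise ValueError('not a two-byte UTF-8 sequence')
    # Keep the 5 payload bits of the lead byte and the 6 of the continuation byte.
    code_point = ((lead & 0b00011111) << 6) | (cont & 0b00111111)
    # Code points below 0x80 must use the one-byte form; reject overlong encodings.
    if code_point < 0x80:
        raise ValueError('overlong encoding')
    return chr(code_point)


print(decode_two_byte_utf8(b'\xc5\xa1'))  # š
```

The masks 0b00011111 and 0b00111111 strip the marker bits (110 and 10), which are exactly the bits the carets in the diagram leave out.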

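Joining the payload bits gives 00101100001, which is the binary number 101100001 once the leading zeros are dropped (decimal 353, code point U+0161). You can convert it back to a character: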

>>> chr(int('101100001', 2))
'š'
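The answer's point that the bytes alone do not say how to decode them can be checked directly. This sketch decodes the same two bytes with different codecs; UTF-8, Latin-1 and ASCII are just example choices. Latin-1 maps every byte to some character, so it "succeeds" with the wrong text, while ASCII rejects the bytes outright:

```python
data = b'\xc5\xa1'

# Decoded with the encoding that produced it: the original text.
print(data.decode('utf-8'))    # š
# Latin-1 maps every byte to some character, so this "works" but is wrong.
print(data.decode('latin-1'))  # Å¡
# ASCII only covers bytes 0..127, so decoding fails outright.
try:
    data.decode('ascii')
except UnicodeDecodeError as exc:
    print('ascii failed:', exc.reason)
```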

Active reading [<https://en.wikipedia.org/wiki/UTF-8>].

edited Apr 27, 2022 at 23:29

Peter Mortensen

answered Apr 28, 2020 at 13:06

Jeyekomon
