Fixed the weird syntax highlighting (as a result, the diff looks more extensive than it really is - use view "Side-by-side Markdown" to compare).
Source Link
Peter Mortensen
  • 31.4k
  • 22
  • 110
  • 134

Let's take a simple one-character string 'š' and encode it into a sequence of bytes:

    >>> 'š'.encode('utf-8')
    b'\xc5\xa1'

For the purpose of this example, let's display the sequence of bytes in its binary form:

    >>> bin(int(b'\xc5\xa1'.hex(), 16))
    '0b1100010110100001'
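As a variation on the line above, here is a minimal sketch that prints each byte as its own group of eight bits, which keeps the byte boundaries visible. The variable names `data` and `bits` are only illustrative.

```python
data = 'š'.encode('utf-8')

# Iterating over a bytes object yields integers (0-255);
# format each one as eight binary digits, zero-padded.
bits = ' '.join(format(byte, '08b') for byte in data)

print(bits)  # 11000101 10100001
```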

Without knowing how the bytes were encoded, it is generally not possible to decode them back into text. Only if you know that the UTF-8 text encoding was used can you follow the algorithm for decoding UTF-8 and recover the original string:

    11000101 10100001
       ^^^^^   ^^^^^^
       00101   100001
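The bit extraction marked in the diagram can be sketched with bit masks. This is a hand-rolled illustration for a single two-byte sequence only, not a full UTF-8 decoder, and the function name `decode_two_byte_utf8` is hypothetical.

```python
def decode_two_byte_utf8(data: bytes) -> str:
    """Decode exactly one two-byte UTF-8 sequence by hand (illustrative helper)."""
    if len(data) != 2:
        raise ValueError("expected exactly two bytes")
    lead, cont = data
    # A two-byte sequence has the bit pattern 110xxxxx 10xxxxxx.
    if lead & 0b11100000 != 0b11000000:
        raise ValueError("first byte is not a two-byte lead byte")
    if cont & 0b11000000 != 0b10000000:
        raise ValueError("second byte is not a continuation byte")
    # Keep the 5 payload bits of the lead byte and the 6 payload bits
    # of the continuation byte, then join them into one number.
    code_point = ((lead & 0b00011111) << 6) | (cont & 0b00111111)
    # Values below 0x80 must be encoded as a single byte in UTF-8.
    if code_point < 0x80:
        raise ValueError("overlong encoding")
    return chr(code_point)


print(decode_two_byte_utf8(b'\xc5\xa1'))  # š
```

In practice, `b'\xc5\xa1'.decode('utf-8')` does all of this, and handles sequences of every valid length.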

Concatenating the marked payload bits gives 00101100001. You can convert this binary number (101100001 without the leading zeros) back into a string:

    >>> chr(int('101100001', 2))
    'š'
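To illustrate why the encoding has to be known in advance, here is a small sketch that decodes the same two bytes under three different encodings. The choice of `latin-1` and `utf-16-be` as alternatives is arbitrary; any other codec would make the same point.

```python
data = b'\xc5\xa1'

as_utf8 = data.decode('utf-8')       # one character: 'š' (U+0161)
as_latin1 = data.decode('latin-1')   # two characters: 'Å¡' (U+00C5, U+00A1)
as_utf16 = data.decode('utf-16-be')  # one character: U+C5A1, a Hangul syllable

for name, text in [('utf-8', as_utf8), ('latin-1', as_latin1), ('utf-16-be', as_utf16)]:
    # Show the code points so the differences are unambiguous.
    print(name, [hex(ord(ch)) for ch in text])
```

All three calls succeed without an error, so the bytes alone do not reveal which interpretation was intended.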

Active reading [<https://en.wikipedia.org/wiki/UTF-8>].
Source Link
Peter Mortensen

Source Link
Jeyekomon
  • 3.6k
  • 3
  • 33
  • 43

Let's have a simple one-character string 'š' and encode it into a sequence of bytes:

    >>> 'š'.encode('utf-8')
    b'\xc5\xa1'

For the purpose of this example let's display the sequence of bytes in its binary form:

    >>> bin(int(b'\xc5\xa1'.hex(), 16))
    '0b1100010110100001'

Now it is generally not possible to decode the information back without knowing how it was encoded. Only if you know that the utf-8 text encoding was used, you can follow the algorithm for decoding utf-8 and acquire the original string:

    11000101 10100001
       ^^^^^   ^^^^^^
       00101   100001

You can display the binary number 101100001 back as a string:

    >>> chr(int('101100001', 2))
    'š'