11

For example:

t = str.encode(msg)
print(t)

I am getting double slashes, like this:

b'\\xda\\xad\\x94\\xb4\\x0bg\\x92]R\\x9a1y\\x9d\\xed\\x04\\xd5\\x8e+\\x07\\xf8\\x03\\x1bm\\xd6\\x96\\x10\\xca80\\xe26\\x8a'

But, I would like to get the result as:

b'\xda\xad\x94\xb4\x0bg\x92]R\x9a1y\x9d\xed\x04\xd5\x8e+\x07\xf8\x03\x1bm\xd6\x96\x10\xca80\xe26\x8a' 

Any help would be appreciated.

0

4 Answers

21

Utilizing Python Text Encodings

Python's text encodings allow you to get your desired result by simply encoding and decoding.

# I have the string shortened for presentation
your_string = "\\xda\\xad\\x94"
your_string.encode().decode('unicode_escape').encode("raw_unicode_escape")

What is done above can be explained in three simple steps:

  1. Encode the string in order to turn it into a bytes object (this allows you to begin the process of removing those pesky backslash escape sequences)
  2. Decode the bytes object into a string with the unicode_escape codec (this unescapes the backslashes)
  3. Encode the object with raw_unicode_escape (this will make the string into a bytes object but, as designed, will not escape the backslashes)
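
The three steps above can be traced one at a time. This is a minimal sketch with a hypothetical sample string; the names step1, step2 and step3 are introduced only for illustration.

```python
# Hypothetical sample input: the text contains literal backslashes,
# so it is 12 characters long, not 3.
your_string = "\\xda\\xad\\x94"

# Step 1: str -> bytes. The backslashes are still literal characters.
step1 = your_string.encode()

# Step 2: bytes -> str. The \xNN sequences are interpreted as escapes,
# leaving a 3-character string.
step2 = step1.decode("unicode_escape")

# Step 3: str -> bytes. Each code point below 256 becomes a single byte.
step3 = step2.encode("raw_unicode_escape")

print(len(your_string), step1)
print(len(step2))
print(step3)
```

Printing the intermediate values makes it easier to see which step actually removes the doubled backslashes (step 2).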

Multiple Backslash Escape Sequences

Perhaps you have a string with multiple backslash escape sequences (or double backslashes). If so, you can just repeat steps 2 and 3 as they are listed above as many times as necessary.

your_string = "\\\\xda\\\\xad\\\\x94"
your_string.encode().decode('unicode_escape').encode('raw_unicode_escape').decode('unicode_escape').encode('raw_unicode_escape')

As you can see this is getting quite tedious and messy, but you can always create a function to counter that.
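
One possible shape for such a function is sketched below. The name unescape and its rounds parameter are hypothetical, and the sketch assumes the input contains only ASCII characters, as in the examples above.

```python
def unescape(text, rounds=1):
    """Repeat the decode/encode round trip `rounds` times (hypothetical helper)."""
    data = text.encode()  # step 1: str -> bytes
    for _ in range(rounds):
        # steps 2 and 3: interpret one level of escapes, then go back to bytes
        data = data.decode("unicode_escape").encode("raw_unicode_escape")
    return data


single = unescape("\\xda\\xad\\x94")
double = unescape("\\\\xda\\\\xad\\\\x94", rounds=2)
print(single)
print(double)
```

Each round strips exactly one level of escaping, so the caller has to know (or test) how many levels the input carries.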

Without Backslash Escape Sequences

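Now if you have a string without any backslash escape sequences that you want to turn into a bytes object, a single encoding step like step 1 is all that is needed. Use the latin-1 codec here so that each character maps to one byte; the default UTF-8 would turn every character above \x7f into two bytes:

your_string = "\xda\xad\x94"
your_string.encode('latin-1')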
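
When the string already holds the characters themselves, the choice of codec decides which bytes come out. The sketch below compares the default UTF-8 with Latin-1 on the same sample string; the variable names are only illustrative.

```python
your_string = "\xda\xad\x94"  # three characters, no literal backslashes

# Default codec (UTF-8): each of these characters becomes two bytes.
utf8_bytes = your_string.encode()

# Latin-1: each character below 256 becomes exactly one byte.
latin1_bytes = your_string.encode("latin-1")

print(utf8_bytes)
print(latin1_bytes)
```

Only the Latin-1 result matches b'\xda\xad\x94' byte for byte.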

Bytes Objects

If you have a bytes object instead of a string, everything is basically the same, just skip step 1, because bytes objects are already encoded.

your_bytes_obj = b"\\xda\\xad\\x94"
your_bytes_obj.decode('unicode_escape').encode("raw_unicode_escape")

All of these examples should grant you a bytes object without escaped backslashes, which in the examples I have provided above is:

b'\xda\xad\x94' 

Explanation

The unicode_escape codec removes escapes when decoding (and, conversely, adds escapes when encoding), while the raw_unicode_escape codec does not escape backslashes when encoding. Both codecs therefore come in handy when handling escape characters in bytes objects.
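
The difference between the two codecs when encoding can be seen on a small sample. This is an illustrative sketch; the sample string (one non-ASCII character followed by a single backslash) is made up for the purpose.

```python
sample = "\xda\\"  # two characters: U+00DA and one backslash

# unicode_escape adds escapes: the non-ASCII character is spelled out
# as \xda and the backslash is doubled.
escaped = sample.encode("unicode_escape")

# raw_unicode_escape leaves the backslash alone and writes the
# character as a single Latin-1 byte.
raw = sample.encode("raw_unicode_escape")

print(escaped)
print(raw)
```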

raw_unicode_escape

Latin-1 encoding with \uXXXX and \UXXXXXXXX for other code points. Existing backslashes are not escaped in any way. It is used in the Python pickle protocol.

unicode_escape

Encoding suitable as the contents of a Unicode literal in ASCII-encoded Python source code, except that quotes are not escaped. Decode from Latin-1 source code. Beware that Python source code actually uses UTF-8 by default.

I would add that the str.encode() method isn't the only means of encoding a string. Alternatively, you can use the encode function from the codecs module, or even pass your string directly to the built-in bytes(str, encoding) constructor (just make sure to provide the correct argument for the encoding parameter).
I used the str.encode() method here because it seemed more straightforward.
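
As a quick check that the three spellings are interchangeable, the sketch below encodes the same sample string each way, with UTF-8 chosen explicitly as the encoding.

```python
import codecs

text = "\\xda\\xad\\x94"  # sample string with literal backslashes

via_method = text.encode("utf-8")             # str.encode()
via_codecs = codecs.encode(text, "utf-8")     # codecs.encode()
via_constructor = bytes(text, "utf-8")        # bytes(str, encoding)

print(via_method == via_codecs == via_constructor)
```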

For more information see:
Python 2 Library - Python Specific Encodings
Python 3 Library - Text Encodings
Python 3 Lexical Analysis - String & Bytes Literals and Escape Sequences


4

In Python 3.6 you can use codecs.escape_decode from the codecs library:

import codecs
data_bytes, _ = codecs.escape_decode(data)

In your case, data is the msg variable.

If you print the value of data_bytes, you will get your values as bytes.
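
A runnable version of this idea might look like the sketch below. Note that codecs.escape_decode is not listed in the official codecs documentation, so treat its behaviour as a CPython implementation detail; the sample value of msg is hypothetical.

```python
import codecs

msg = "\\xda\\xad\\x94"  # hypothetical sample with literal backslashes

# escape_decode returns a tuple: (decoded bytes, amount of input consumed).
# The text is converted to bytes first; Latin-1 keeps one byte per character.
data_bytes, consumed = codecs.escape_decode(msg.encode("latin-1"))

print(data_bytes)
```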


0

You can't do that because '\\' represents a single backslash, not a double backslash. For example, if you convert msg to a string and print it with the print function, you will see only one backslash.
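
The distinction between what a string contains and how its repr is displayed can be checked directly. A small sketch with a made-up sample value:

```python
s = "\\xda"  # written with two backslashes in source code

print(len(s))    # counts the actual characters in the string
print(s)         # print shows the characters themselves
print(repr(s))   # repr shows a source-style literal, so the backslash is doubled
```

The string holds one backslash followed by "xda"; only the repr form shows it doubled.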


-4

I wanted to place this as a comment on Adrian Gherasim's answer, but it got too long, so I put it as a separate "answer".

For normal symbols you can use the replace method:

In [1]: temp = 'aa1aa2aa3aa4aa5'
In [2]: temp
Out[2]: 'aa1aa2aa3aa4aa5'
In [3]: temp.replace('aa', 'a')
Out[3]: 'a1a2a3a4a5'

However, if you try to do the same with your double backslash, it gives a syntax error:

In [4]: temp2 = '\\1\\2\\3\\4'
In [5]: temp2
Out[5]: '\\1\\2\\3\\4'
In [6]: temp2.replace('\\', '\')
  File "<ipython-input-6-3973ee057a3e>", line 1
    temp2.replace('\\', '\')
                            ^
SyntaxError: EOL while scanning string literal

1 Comment

Ending this answer with how to overcome the syntax error would be much more useful.
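
One way around the syntax error, sketched here under the assumption that the goal is to collapse two real backslashes into one: a lone backslash has to be written as '\\' in a normal string literal, because '\' escapes the closing quote. The name doubled is introduced only for this illustration.

```python
temp2 = '\\1\\2\\3\\4'  # 8 characters: each '\\' is ONE backslash

# '\\\\' is a two-backslash search pattern, '\\' a one-backslash replacement.
# temp2 has no doubled backslashes, so nothing changes.
same = temp2.replace('\\\\', '\\')

# A raw literal that really does contain doubled backslashes.
doubled = r'\\1\\2'
collapsed = doubled.replace('\\\\', '\\')

print(same == temp2)
print(len(doubled), len(collapsed))
```

The doubled backslashes in Out[5] above are only how the repr displays single backslashes, which is why the first replace finds nothing to change.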
