I am trying to remove the characters \xef\xbb\xbf from my string; however, I am getting the following error.

Not quite sure how to resolve this.

>>> x = u'\xef\xbb\xbfHello'
>>> x
u'\xef\xbb\xbfHello'
>>> type(x)
<type 'unicode'>
>>> print x
Hello
>>> print x.replace('\xef\xbb\xbf', '')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xef in position 0: ordinal not in range(128)
>>>

3 Answers

3

You need to pass unicode objects to replace. Otherwise Python 2 will attempt to decode the str argument '\xef\xbb\xbf' with the ascii codec so that it can search for it in the unicode string x, and that decoding fails on the byte 0xef.

>>> x = u'\xef\xbb\xbfHello'
>>> x
u'\xef\xbb\xbfHello'
>>> print(x.replace(u'\xef\xbb\xbf', u''))
Hello

This only holds for Python 2. In Python 3 both versions of the replace call will work, because plain string literals are already Unicode.
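A minimal sketch of the Python 3 behaviour, assuming Python 3.3 or later (where the u prefix is accepted again); the variable names are only illustrative:

```python
# Python 3: str literals are Unicode, so the u prefix is optional.
x = '\xef\xbb\xbfHello'

with_prefix = x.replace(u'\xef\xbb\xbf', u'')
without_prefix = x.replace('\xef\xbb\xbf', '')

# Both calls compare text with text, so no implicit ascii decoding happens.
print(with_prefix)
print(without_prefix)
```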

0

If your data is still a byte string (str) rather than a unicode object, try decoding it with either the decode method or the unicode function, like so:

x.decode('utf-8') 

or

unicode(string, 'utf-8') 

Source: UnicodeDecodeError: 'ascii' codec can't decode byte 0xef in position 1
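As a rough sketch of what decoding does to these particular bytes (written for Python 3, where byte strings are bytes objects; the input value and variable names are illustrative assumptions):

```python
# UTF-8 encoded bytes that start with a byte order mark (illustrative input).
raw = b'\xef\xbb\xbfHello'

with_bom = raw.decode('utf-8')         # the BOM survives as the character U+FEFF
without_bom = raw.decode('utf-8-sig')  # the BOM is stripped while decoding

print(repr(with_bom))
print(repr(without_bom))
```

The utf-8-sig codec is part of the standard library and only removes a BOM at the very start of the data.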

0

The real problem is that your Unicode string was incorrectly decoded in the first place. Those three characters are the UTF-8 encoding of the byte order mark (BOM, U+FEFF), mis-decoded as (likely) latin-1 or cp1252.

Ideally, fix how they were decoded, but you can reverse the error by re-encoding as latin1 and decoding correctly:

>>> x = u'\xef\xbb\xbfHello'
>>> x.encode('latin1').decode('utf8')  # decode correctly, U+FEFF is a BOM.
u'\ufeffHello'
>>> x.encode('latin1').decode('utf-8-sig')  # decode and handle BOM.
u'Hello'
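If you cannot fix the original decoding, the round trip above could be wrapped in a small helper. This is only a sketch for Python 3: strip_misdecoded_bom is a hypothetical name, and it assumes the text was mis-decoded as latin-1, falling back to the input unchanged when that assumption does not hold.

```python
def strip_misdecoded_bom(text):
    """Hypothetical helper: undo a latin-1 mis-decoding of UTF-8 data and drop the BOM."""
    try:
        # Recover the original bytes, then decode them as UTF-8 with BOM handling.
        return text.encode('latin1').decode('utf-8-sig')
    except (UnicodeEncodeError, UnicodeDecodeError):
        # The text is not latin-1 mojibake of valid UTF-8, so leave it alone.
        return text


print(strip_misdecoded_bom('\xef\xbb\xbfHello'))
```

The fallback matters because correctly decoded text containing characters such as é would otherwise raise an error on the round trip.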
