I'm using Python 3.3. I've been trying to decode a certain byte string that looks like this:

b'\x1f\x8b\x08\x00\x00\x00\x00\x00\x00\x03\xed:\xf9w\xdaH\xd2?\xcf\xbc.... 

The data keeps going on from there. However, whenever I try to decode it using str.decode('utf-16'), I get an error saying:

'utf16' codec can't decode bytes in position 54-55: illegal UTF-16 surrogate 

I'm not exactly sure how to decode this string.

  • So that means it isn't really UTF16. Where did you get the string? Might it be UCS2? Commented Apr 27, 2016 at 18:11
  • Does the result look ok, if you only decode up to position 53? This may help to decide whether your assumption utf16 is correct. Commented Apr 27, 2016 at 18:16
  • I got it from Twisted. I went into twisted/web/proxy.py and, in the handleResponsePart(self, buffer) function, just injected print(buffer). So basically the encoded string you're looking at is supposed to be HTML that I receive from Twisted proxies. Commented Apr 27, 2016 at 18:17
  • So the actual string is huge; what I pasted is only a small part of the full string. Commented Apr 27, 2016 at 18:20
  • Try explicitly decoding to 'UTF-16BE' and 'UTF-16LE'; endianness might be the issue. Commented Apr 27, 2016 at 18:35

1 Answer

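Gzipped data begins with the bytes \x1f\x8b\x08 (the gzip magic number followed by the code for the deflate compression method), so my guess is that your data is gzipped rather than UTF-16 text. Try gunzipping the data before decoding.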

    import io
    import gzip

    # this raises IOError because `buf` is incomplete. It may work if you supply the complete buf
    buf = b'\x1f\x8b\x08\x00\x00\x00\x00\x00\x00\x03\xed:\xf9w\xdaH\xd2?\xcf\xbc'
    with gzip.GzipFile(fileobj=io.BytesIO(buf)) as f:
        content = f.read()
        print(content.decode('utf-16'))
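
Since the bytes in the question are only a fragment, here is a minimal round-trip sketch of the same idea on data I compress myself. The helper names (`looks_gzipped`, `decode_body`, `decode_chunks`) are hypothetical, and the `'utf-8'` default is an assumption: use whatever charset the server actually declares. The chunked variant may be useful if, as in the comments, the body arrives in several parts; it relies on `zlib.decompressobj` with `wbits = 16 + zlib.MAX_WBITS`, which tells zlib to expect a gzip header.

```python
import gzip
import zlib

GZIP_MAGIC = b'\x1f\x8b'  # first two bytes of a gzip stream


def looks_gzipped(data):
    """Return True if `data` starts with the gzip magic number."""
    return data[:2] == GZIP_MAGIC


def decode_body(data, encoding='utf-8'):
    """Gunzip `data` if it looks gzipped, then decode it to str.

    `encoding` is an assumption, not something taken from the question.
    """
    if looks_gzipped(data):
        data = gzip.decompress(data)
    return data.decode(encoding)


def decode_chunks(chunks, encoding='utf-8'):
    """Decompress a gzip stream that arrives in several pieces."""
    # 16 + MAX_WBITS makes zlib expect a gzip header and trailer.
    decompressor = zlib.decompressobj(16 + zlib.MAX_WBITS)
    raw = b''.join(decompressor.decompress(chunk) for chunk in chunks)
    raw += decompressor.flush()
    # Decode only after all bytes are joined, so a multi-byte character
    # split across two chunks is not broken.
    return raw.decode(encoding)


# Round trip on sample data (made up for this illustration).
html = '<html><body>héllo</body></html>'
compressed = gzip.compress(html.encode('utf-8'))
pieces = [compressed[i:i + 10] for i in range(0, len(compressed), 10)]

restored_whole = decode_body(compressed)
restored_chunks = decode_chunks(pieces)
```

Both `restored_whole` and `restored_chunks` should equal the original `html`. If the response really is gzipped HTML, the Content-Encoding and Content-Type headers are the place to confirm the compression and the charset instead of guessing.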