bz2 decompress with Python 3.4 - TypeError: 'str' does not support the buffer interface

Question

There are similar errors but I could not find a solution for bz2.

The following program fails on the decompress:

    import bz2

    un = 'BZh91AY&SYA\xaf\x82\r\x00\x00\x01\x01\x80\x02\xc0\x02\x00 \x00!\x9ah3M\x07<]\xc9\x14\xe1BA\x06\xbe\x084'
    pw = 'BZh91AY&SY\x94$|\x0e\x00\x00\x00\x81\x00\x03$ \x00!\x9ah3M\x13<]\xc9\x14\xe1BBP\x91\xf08'

    decoded_un = bz2.decompress(un)
    decoded_pw = bz2.decompress(pw)

    print(decoded_un)
    print(decoded_pw)

I tried using bytes(un, 'UTF-8') but that did not work. I don't think I had this problem in Python 3.3.

EDIT: This was for the Python Challenge. Thanks to Martijn, I now have two bits of code that work:

    import bz2

    un_saved = 'BZh91AY&SYA\xaf\x82\r\x00\x00\x01\x01\x80\x02\xc0\x02\x00 \x00!\x9ah3M\x07<]\xc9\x14\xe1BA\x06\xbe\x084'
    pw_saved = 'BZh91AY&SY\x94$|\x0e\x00\x00\x00\x81\x00\x03$ \x00!\x9ah3M\x13<]\xc9\x14\xe1BBP\x91\xf08'

    print(bz2.decompress(un_saved.encode('latin1')))
    print(bz2.decompress(pw_saved.encode('latin1')))

This one works from the webpage:

    # http://www.pythonchallenge.com/pc/def/integrity.html
    import urllib.request
    import re
    import os.path
    import bz2

    fname = "008.html"

    if not os.path.isfile(fname):
        url = 'http://www.pythonchallenge.com/pc/def/integrity.html'
        response = urllib.request.urlopen(url)
        webpage = response.read().decode("utf-8")
        with open(fname, "w") as fh:
            fh.write(webpage)

    with open(fname, "r") as fh:
        webpage = fh.read()

    re_un = '\\nun: \'(.*)\'\\n'
    m = re.search(re_un, webpage)
    un = m.group(1)
    print(un)

    pw_un = '\\npw: \'(.*)\'\\n'
    m = re.search(pw_un, webpage)
    pw = m.group(1)
    print(pw)

    unde = un.encode('latin-1').decode('unicode_escape').encode('latin1')
    pwde = pw.encode('latin-1').decode('unicode_escape').encode('latin1')

    decoded_un = bz2.decompress(unde)
    decoded_pw = bz2.decompress(pwde)

    print(decoded_un)
    print(decoded_pw)

Martijn Pieters · Accepted Answer · 2014-12-24 21:48:23Z

The bz2 library deals with bytes objects, not strings:

    un = b'BZh91AY&SYA\xaf\x82\r\x00\x00\x01\x01\x80\x02\xc0\x02\x00 \x00!\x9ah3M\x07<]\xc9\x14\xe1BA\x06\xbe\x084'
    pw = b'BZh91AY&SY\x94$|\x0e\x00\x00\x00\x81\x00\x03$ \x00!\x9ah3M\x13<]\xc9\x14\xe1BBP\x91\xf08'

In other words, using bytes() works just fine, as long as you use the correct encoding. UTF-8 is not that encoding; if you have bytes masquerading as string character code points, encode with Latin-1 instead. Latin-1 maps characters one-to-one to bytes:

un = un.encode('latin1')

or

un = bytes(un, 'latin1')

Also see the Python Unicode HOWTO:

Latin-1, also known as ISO-8859-1, is a similar encoding. Unicode code points 0–255 are identical to the Latin-1 values, so converting to this encoding simply requires converting code points to byte values; if a code point larger than 255 is encountered, the string can’t be encoded into Latin-1.
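To make the quoted point concrete, here is a small sketch comparing the two encodings on a couple of code points taken from the strings above. The variable names are only for illustration.

```python
# Two code points in the 128-255 range, as they appear in the question's strings.
text = '\xaf\x82'

# Latin-1 maps each code point 0-255 to the single byte with the same value.
latin1_bytes = text.encode('latin-1')
print(latin1_bytes)  # b'\xaf\x82'

# UTF-8 encodes code points above 127 as two bytes each, which changes the data.
utf8_bytes = text.encode('utf-8')
print(utf8_bytes)  # b'\xc2\xaf\xc2\x82'

# A code point above 255 cannot be encoded as Latin-1 at all.
try:
    '\u20ac'.encode('latin-1')
    euro_encodable = True
except UnicodeEncodeError:
    euro_encodable = False
print(euro_encodable)  # False
```

This is why encoding the question's strings as UTF-8 hands bz2 a different byte stream than the one that was compressed.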

I'll leave the decoding to you. Have fun with the Python Challenge!

Note that if you loaded these characters as they are from a webpage, they will not be ready-made bytes! You'll have the characters '\', 'x', '8' and '2' rather than a code point with hex value 82. You'd need to interpret those sequences as a Python string literal first:

    >>> un = r'BZh91AY&SYA\xaf\x82\r\x00\x00\x01\x01\x80\x02\xc0\x02\x00 \x00!\x9ah3M\x07<]\xc9\x14\xe1BA\x06\xbe\x084'
    >>> un
    'BZh91AY&SYA\\xaf\\x82\\r\\x00\\x00\\x01\\x01\\x80\\x02\\xc0\\x02\\x00 \\x00!\\x9ah3M\\x07<]\\xc9\\x14\\xe1BA\\x06\\xbe\\x084'
    >>> un.encode('latin-1').decode('unicode_escape')
    'BZh91AY&SYA¯\x82\r\x00\x00\x01\x01\x80\x02À\x02\x00 \x00!\x9ah3M\x07<]É\x14áBA\x06¾\x084'
    >>> un.encode('latin-1').decode('unicode_escape').encode('latin1')
    b'BZh91AY&SYA\xaf\x82\r\x00\x00\x01\x01\x80\x02\xc0\x02\x00 \x00!\x9ah3M\x07<]\xc9\x14\xe1BA\x06\xbe\x084'

Note the double backslashes in the representation of un. Only the final bytes result can then be decompressed!
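The three-step conversion can be wrapped in a small helper, sketched below. The name escaped_text_to_bytes is hypothetical, and the payload is a made-up round trip through bz2.compress rather than the challenge data, so the decoding is still left to you.

```python
import bz2


def escaped_text_to_bytes(text):
    """Turn text containing literal escapes such as \\x82 into the bytes they spell out.

    Hypothetical helper for illustration; not part of the bz2 module.
    """
    # 1. encode('latin-1'): str -> bytes, so the unicode_escape codec can read it
    # 2. decode('unicode_escape'): interpret \xNN, \r, ... like a Python string literal
    # 3. encode('latin-1'): map each code point 0-255 to the byte with the same value
    return text.encode('latin-1').decode('unicode_escape').encode('latin-1')


# Simulate text scraped from a page: every byte written out as a literal \xNN escape.
payload = bz2.compress(b'example payload')
scraped = ''.join('\\x{:02x}'.format(byte) for byte in payload)

recovered = escaped_text_to_bytes(scraped)
print(bz2.decompress(recovered))  # b'example payload'
```

The helper assumes the scraped text only contains characters in the Latin-1 range; anything above code point 255 would raise UnicodeEncodeError in the first step.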

Do you have a link I could use to understand the different encodings? I am really struggling with this. This was for the Python Challenge.
@BlueTrin: yeah, I recognised the strings. Not sure if I have a link for you, but take into account that the challenge was written for Python 2, before Python 3 was mainstream.
joelonsoftware.com/articles/Unicode.html farmdev.com/talks/unicode
@IgnacioVazquez-Abrams: less helpful in this context, where the bytes are taken from a webpage but whatever extraction method the OP used gave them a string rather than bytes.
@BlueTrin: I suspect there was something else wrong here, I updated the answer. In future, can you make sure you always include the full error you got with the code you tried?
