I have recently started parsing binary data with Python, but I am confused by the way Python treats individual bytes. Take, for example, the following interpreter session:
>>> f = open('somefile.gz', 'rb')
>>> f
<open file 'somefile.gz', mode 'rb' at 0xb77f4d88>
>>> bytes = f.read()
>>> bytes[0]
'\x1f'
>>> len(bytes[0])
1
>>> int(bytes[0])    <---- calling __str__ automatically on bytes[0] ?
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: invalid literal for int() with base 10: '\x1f'

The above session shows that bytes[0] has a length of 1 byte, but its displayed representation is a hexadecimal escape. That is fine, but when I try to treat bytes[0] as a single numeric byte, I get unexpected behaviour.
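To make the failure above concrete: int() parses the text of a string as a decimal number, while the numeric value of a one-character byte string comes from ord(). The following is a minimal sketch, not a definitive approach. It assumes the Python 2 behaviour shown in the session (in Python 3, indexing a bytes object already yields an integer), and the sample data is a hypothetical stand-in for f.read() that starts with the gzip magic bytes 0x1f 0x8b.

```python
# Hypothetical sample standing in for f.read(); 0x1f 0x8b is the gzip magic number.
data = b'\x1f\x8b\x08\x00'

# Indexing a bytearray yields integers in the range 0..255,
# on both Python 2 and Python 3.
values = bytearray(data)
first = values[0]

# The integer can be compared directly against a hexadecimal literal.
is_gzip_start = (first == 0x1f)

# ord() converts a length-1 byte string to its integer value.
# Slicing (data[0:1]) returns a length-1 byte string on both Python versions.
first_via_ord = ord(data[0:1])

print(first)
print(is_gzip_start)
print(first_via_ord)
```

The name bytes is avoided here because it shadows the built-in type of the same name.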
If I want to parse or interpret a binary stream according to some specification, where the specification describes values in hexadecimal, binary and decimal, how would I go about doing that?
An example would be: "the first two bytes are 0xbeef, the next is a decimal 8, followed by a packed bit field where each of the 8 bits of the byte represents some flag". I guess there are a few modules out there that make this task easy, but I would like to do it from scratch.
I have seen references to the struct module, but is there no way of checking the bytes read directly, without introducing a new module? Something like bytes[0] == 0xbeef?
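One thing to note about that comparison: a single byte can only hold values from 0 to 255, so a two-byte value such as 0xbeef has to be assembled from two bytes before it can be compared. Here is a rough sketch of parsing the example layout described above without the struct module. The function name parse_header, the field name value, big-endian byte order, and numbering flags from the least significant bit are all assumptions made for illustration; the actual specification would dictate these.

```python
def parse_header(data):
    """Parse a hypothetical 4-byte header: 2-byte magic, 1-byte value, 1-byte flags."""
    raw = bytearray(data)  # indexing a bytearray yields integers 0..255
    if len(raw) < 4:
        raise ValueError('need at least 4 bytes, got %d' % len(raw))

    # Combine two bytes into one 16-bit integer (big-endian order assumed).
    magic = (raw[0] << 8) | raw[1]
    if magic != 0xBEEF:
        raise ValueError('bad magic number: 0x%04x' % magic)

    # A single byte is already a plain integer; decimal 8 is simply 8.
    value = raw[2]

    # Unpack the bit field: flags[0] is the least significant bit (assumed).
    flags = [bool(raw[3] & (1 << bit)) for bit in range(8)]

    return magic, value, flags


# Hypothetical input: magic 0xbeef, value 8, flags byte 0b00000101.
magic, value, flags = parse_header(b'\xbe\xef\x08\x05')
print(hex(magic))
print(value)
print(flags)
```

Shifting and masking like this works for small fixed layouts; for longer or mixed-endianness records the struct module expresses the same thing more compactly.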
Can someone please help me understand how people normally parse binary data conforming to a specification using Python? Thanks.