Is there a faster way to convert an arbitrary large integer to a big endian sequence of bytes?

Question

I have this Python code to do this:

from struct import pack as _pack def packl(lnum, pad = 1): if lnum < 0: raise RangeError("Cannot use packl to convert a negative integer " "to a string.") count = 0 l = [] while lnum > 0: l.append(lnum & 0xffffffffffffffffL) count += 1 lnum >>= 64 if count <= 0: return '\0' * pad elif pad >= 8: lens = 8 * count % pad pad = ((lens != 0) and (pad - lens)) or 0 l.append('>' + 'x' * pad + 'Q' * count) l.reverse() return _pack(*l) else: l.append('>' + 'Q' * count) l.reverse() s = _pack(*l).lstrip('\0') lens = len(s) if (lens % pad) != 0: return '\0' * (pad - lens % pad) + s else: return s

This takes approximately 174 usec to convert 2**9700 - 1 to a string of bytes on my machine. If I'm willing to use the Python 2.7 and Python 3.x specific bit_length method, I can shorten that to 159 usecs by pre-allocating the l array to be the exact right size at the very beginning and using l[something] = syntax instead of l.append.

Is there anything I can do that will make this faster? This will be used to convert large prime numbers used in cryptography as well as some (but not many) smaller numbers.

Edit

This is currently the fastest option in Python < 3.2, it takes about half the time either direction as the accepted answer:

def packl(lnum, padmultiple=1): """Packs the lnum (which must be convertable to a long) into a byte string 0 padded to a multiple of padmultiple bytes in size. 0 means no padding whatsoever, so that packing 0 result in an empty string. The resulting byte string is the big-endian two's complement representation of the passed in long.""" if lnum == 0: return b'\0' * padmultiple elif lnum < 0: raise ValueError("Can only convert non-negative numbers.") s = hex(lnum)[2:] s = s.rstrip('L') if len(s) & 1: s = '0' + s s = binascii.unhexlify(s) if (padmultiple != 1) and (padmultiple != 0): filled_so_far = len(s) % padmultiple if filled_so_far != 0: s = b'\0' * (padmultiple - filled_so_far) + s return s def unpackl(bytestr): """Treats a byte string as a sequence of base 256 digits representing an unsigned integer in big-endian format and converts that representation into a Python integer.""" return int(binascii.hexlify(bytestr), 16) if len(bytestr) > 0 else 0

In Python 3.2 the int class has to_bytes and from_bytes functions that can accomplish this much more quickly that the method given above.

What does pad do? A docstring would be handy to understand the usage. — Scott Griffiths
– Scott Griffiths, Commented Dec 5, 2010 at 10:33
@Scott As far as I can tell, the output is zero-padded at the front to the next multiple-of-pad number of bytes. — Karl Knechtel
– Karl Knechtel, Commented Dec 5, 2010 at 10:42
Reagrdelss of eing a localy used variable, you shoul avoid using a variable name sucha s "l" - it looks too much like "1" on most fonts to keep readability. — jsbueno
– jsbueno, Commented Dec 5, 2010 at 13:12
@Karl Knechtel - That's precisely right. I wanted to use this in situations where I wanted to dump it into a slot that was exactly 64 bits long, or 128 bits long or something like that. — Omnifarious
– Omnifarious, Commented Dec 5, 2010 at 16:56
@Scott Griffiths - You're right, it does need a docstring. embarrassed look — Omnifarious
– Omnifarious, Commented Dec 5, 2010 at 16:57

Sven Marnach · Accepted Answer · 2010-12-05 12:17:34Z

Here is a solution calling the Python/C API via ctypes. Currently, it uses NumPy, but if NumPy is not an option, it could be done purely with ctypes.

import numpy import ctypes PyLong_AsByteArray = ctypes.pythonapi._PyLong_AsByteArray PyLong_AsByteArray.argtypes = [ctypes.py_object, numpy.ctypeslib.ndpointer(numpy.uint8), ctypes.c_size_t, ctypes.c_int, ctypes.c_int] def packl_ctypes_numpy(lnum): a = numpy.zeros(lnum.bit_length()//8 + 1, dtype=numpy.uint8) PyLong_AsByteArray(lnum, a, a.size, 0, 1) return a

On my machine, this is 15 times faster than your approach.

Edit: Here is the same code using ctypes only and returning a string instead of a NumPy array:

import ctypes PyLong_AsByteArray = ctypes.pythonapi._PyLong_AsByteArray PyLong_AsByteArray.argtypes = [ctypes.py_object, ctypes.c_char_p, ctypes.c_size_t, ctypes.c_int, ctypes.c_int] def packl_ctypes(lnum): a = ctypes.create_string_buffer(lnum.bit_length()//8 + 1) PyLong_AsByteArray(lnum, a, len(a), 0, 1) return a.raw

This is another two times faster, totalling to a speed-up factor of 30 on my machine.

@Karl: No, it won't. The fourth parameter to PyLong_AsByteArray() indicates which endianness to use: 0 means big endian, anything else means little endian.
Has this API changed much over different versions of Python?
int(binascii.hexlify(stringbytes), 16) is faster than ctypes.pythonapi._PyLong_FromByteArray. Whoulda thunk it?

martineau · Accepted Answer · 2016-09-02 14:08:20Z

5

For completeness and for future readers of this question:

Starting in Python 3.2, there are functions int.from_bytes() and int.to_bytes() that perform the conversion between bytes and int objects in a choice of byte orders.

edited Sep 2, 2016 at 14:08

martineau

124k29 gold badges181 silver badges319 bronze badges

answered Jun 21, 2011 at 23:15

Sven Marnach

608k123 gold badges966 silver badges865 bronze badges

3 Comments

Omnifarious Over a year ago

Thanks! I'm wondering though if the endian flags will slow it down or not. We'll see.

Omnifarious Over a year ago

Even with the endian flag, it still takes 1/3rd the time (or less) of the fastest method I'd found so far.

milahu Over a year ago

int.to_bytes() requires a length parameter: n.to_bytes((n.bit_length() + 7) // 8, 'big') or b'\0' (source)

Karl Knechtel · Accepted Answer · 2010-12-05 10:49:02Z

I suppose you really should just be using numpy, which I'm sure has something or other built in for this. It might also be faster to hack around with the array module. But I'll take a stab at it anyway.

IMX, creating a generator and using a list comprehension and/or built-in summation is faster than a loop that appends to a list, because the appending can be done internally. Oh, and 'lstrip' on a large string has got to be costly.

Also, some style points: special cases aren't special enough; and you appear not to have gotten the memo about the new x if y else z construct. :) Although we don't need it anyway. ;)

from struct import pack as _pack Q_size = 64 Q_bitmask = (1L << Q_size) - 1L def quads_gen(a_long): while a_long: yield a_long & Q_bitmask a_long >>= Q_size def pack_long_big_endian(a_long, pad = 1): if lnum < 0: raise RangeError("Cannot use packl to convert a negative integer " "to a string.") qs = list(reversed(quads_gen(a_long))) # Pack the first one separately so we can lstrip nicely. first = _pack('>Q', qs[0]).lstrip('\x00') rest = _pack('>%sQ' % len(qs) - 1, *qs[1:]) count = len(first) + len(rest) # A little math trick that depends on Python's behaviour of modulus # for negative numbers - but it's well-defined and documented return '\x00' * (-count % pad) + first + rest

I shouldn't have voted you up. Your code has numerous errors.

bk0 · Accepted Answer · 2011-02-11 15:55:51Z

3

Just wanted to post a follow-up to Sven's answer (which works great). The opposite operation - going from arbitrarily long bytes object to Python Integer object requires the following (because there is no PyLong_FromByteArray() C API function that I can find):

import binascii def unpack_bytes(stringbytes): #binascii.hexlify will be obsolete in python3 soon #They will add a .tohex() method to bytes class #Issue 3532 bugs.python.org return int(binascii.hexlify(stringbytes), 16)

answered Feb 11, 2011 at 15:55

bk0

1,4001 gold badge10 silver badges12 bronze badges

4 Comments

Omnifarious Over a year ago

There is in fact a _PyLong_FromByteArray function (at least in Python 2.7 and Python 3). I'm using it. But your method would probably be pretty fast too.

Omnifarious Over a year ago

This is, in fact, faster than calling _PyLong_FromByteArray using ctypes. How bizarre. Even better, I don't have to check if the input is a memoryview because hexlify handles those, and I don't have to convert to an int in Python 2.7 in order to make the value into a straight int if it's small enough to not need to be a long.

Omnifarious Over a year ago

Also, using hex(lnum)' and binascii.unhexlify` (with a little extra glue) is also faster than the ctypes option.

bk0 Over a year ago

Weird. I looked through the Python 3.1.x C API reference and couldn't find PyLong_FromByteArray().

Collectives™ on Stack Overflow

Is there a faster way to convert an arbitrary large integer to a big endian sequence of bytes?

4 Answers 4

17 Comments

3 Comments

1 Comment

4 Comments

Linked

Hot Network Questions

Collectives™ on Stack Overflow

4 Answers 4

17 Comments

3 Comments

1 Comment

4 Comments

Linked

Related