8

I have this Python code to do this:

from struct import pack as _pack def packl(lnum, pad = 1): if lnum < 0: raise RangeError("Cannot use packl to convert a negative integer " "to a string.") count = 0 l = [] while lnum > 0: l.append(lnum & 0xffffffffffffffffL) count += 1 lnum >>= 64 if count <= 0: return '\0' * pad elif pad >= 8: lens = 8 * count % pad pad = ((lens != 0) and (pad - lens)) or 0 l.append('>' + 'x' * pad + 'Q' * count) l.reverse() return _pack(*l) else: l.append('>' + 'Q' * count) l.reverse() s = _pack(*l).lstrip('\0') lens = len(s) if (lens % pad) != 0: return '\0' * (pad - lens % pad) + s else: return s 

This takes approximately 174 usec to convert 2**9700 - 1 to a string of bytes on my machine. If I'm willing to use the Python 2.7 and Python 3.x specific bit_length method, I can shorten that to 159 usecs by pre-allocating the l array to be the exact right size at the very beginning and using l[something] = syntax instead of l.append.

Is there anything I can do that will make this faster? This will be used to convert large prime numbers used in cryptography as well as some (but not many) smaller numbers.

Edit

This is currently the fastest option in Python < 3.2, it takes about half the time either direction as the accepted answer:

def packl(lnum, padmultiple=1): """Packs the lnum (which must be convertable to a long) into a byte string 0 padded to a multiple of padmultiple bytes in size. 0 means no padding whatsoever, so that packing 0 result in an empty string. The resulting byte string is the big-endian two's complement representation of the passed in long.""" if lnum == 0: return b'\0' * padmultiple elif lnum < 0: raise ValueError("Can only convert non-negative numbers.") s = hex(lnum)[2:] s = s.rstrip('L') if len(s) & 1: s = '0' + s s = binascii.unhexlify(s) if (padmultiple != 1) and (padmultiple != 0): filled_so_far = len(s) % padmultiple if filled_so_far != 0: s = b'\0' * (padmultiple - filled_so_far) + s return s def unpackl(bytestr): """Treats a byte string as a sequence of base 256 digits representing an unsigned integer in big-endian format and converts that representation into a Python integer.""" return int(binascii.hexlify(bytestr), 16) if len(bytestr) > 0 else 0 

In Python 3.2 the int class has to_bytes and from_bytes functions that can accomplish this much more quickly that the method given above.

9
  • 2
    What does pad do? A docstring would be handy to understand the usage. Commented Dec 5, 2010 at 10:33
  • 1
    @Scott As far as I can tell, the output is zero-padded at the front to the next multiple-of-pad number of bytes. Commented Dec 5, 2010 at 10:42
  • Reagrdelss of eing a localy used variable, you shoul avoid using a variable name sucha s "l" - it looks too much like "1" on most fonts to keep readability. Commented Dec 5, 2010 at 13:12
  • @Karl Knechtel - That's precisely right. I wanted to use this in situations where I wanted to dump it into a slot that was exactly 64 bits long, or 128 bits long or something like that. Commented Dec 5, 2010 at 16:56
  • @Scott Griffiths - You're right, it does need a docstring. embarrassed look Commented Dec 5, 2010 at 16:57

4 Answers 4

10

Here is a solution calling the Python/C API via ctypes. Currently, it uses NumPy, but if NumPy is not an option, it could be done purely with ctypes.

import numpy import ctypes PyLong_AsByteArray = ctypes.pythonapi._PyLong_AsByteArray PyLong_AsByteArray.argtypes = [ctypes.py_object, numpy.ctypeslib.ndpointer(numpy.uint8), ctypes.c_size_t, ctypes.c_int, ctypes.c_int] def packl_ctypes_numpy(lnum): a = numpy.zeros(lnum.bit_length()//8 + 1, dtype=numpy.uint8) PyLong_AsByteArray(lnum, a, a.size, 0, 1) return a 

On my machine, this is 15 times faster than your approach.

Edit: Here is the same code using ctypes only and returning a string instead of a NumPy array:

import ctypes PyLong_AsByteArray = ctypes.pythonapi._PyLong_AsByteArray PyLong_AsByteArray.argtypes = [ctypes.py_object, ctypes.c_char_p, ctypes.c_size_t, ctypes.c_int, ctypes.c_int] def packl_ctypes(lnum): a = ctypes.create_string_buffer(lnum.bit_length()//8 + 1) PyLong_AsByteArray(lnum, a, len(a), 0, 1) return a.raw 

This is another two times faster, totalling to a speed-up factor of 30 on my machine.

Sign up to request clarification or add additional context in comments.

17 Comments

Won't that use the system native endianness though?
@Karl: No, it won't. The fourth parameter to PyLong_AsByteArray() indicates which endianness to use: 0 means big endian, anything else means little endian.
Awesome. Now I wish this were exposed directly... :/
Has this API changed much over different versions of Python?
int(binascii.hexlify(stringbytes), 16) is faster than ctypes.pythonapi._PyLong_FromByteArray. Whoulda thunk it?
|
5

For completeness and for future readers of this question:

Starting in Python 3.2, there are functions int.from_bytes() and int.to_bytes() that perform the conversion between bytes and int objects in a choice of byte orders.

3 Comments

Thanks! I'm wondering though if the endian flags will slow it down or not. We'll see.
Even with the endian flag, it still takes 1/3rd the time (or less) of the fastest method I'd found so far.
int.to_bytes() requires a length parameter: n.to_bytes((n.bit_length() + 7) // 8, 'big') or b'\0' (source)
3

I suppose you really should just be using numpy, which I'm sure has something or other built in for this. It might also be faster to hack around with the array module. But I'll take a stab at it anyway.

IMX, creating a generator and using a list comprehension and/or built-in summation is faster than a loop that appends to a list, because the appending can be done internally. Oh, and 'lstrip' on a large string has got to be costly.

Also, some style points: special cases aren't special enough; and you appear not to have gotten the memo about the new x if y else z construct. :) Although we don't need it anyway. ;)

from struct import pack as _pack Q_size = 64 Q_bitmask = (1L << Q_size) - 1L def quads_gen(a_long): while a_long: yield a_long & Q_bitmask a_long >>= Q_size def pack_long_big_endian(a_long, pad = 1): if lnum < 0: raise RangeError("Cannot use packl to convert a negative integer " "to a string.") qs = list(reversed(quads_gen(a_long))) # Pack the first one separately so we can lstrip nicely. first = _pack('>Q', qs[0]).lstrip('\x00') rest = _pack('>%sQ' % len(qs) - 1, *qs[1:]) count = len(first) + len(rest) # A little math trick that depends on Python's behaviour of modulus # for negative numbers - but it's well-defined and documented return '\x00' * (-count % pad) + first + rest 

1 Comment

I shouldn't have voted you up. Your code has numerous errors.
3

Just wanted to post a follow-up to Sven's answer (which works great). The opposite operation - going from arbitrarily long bytes object to Python Integer object requires the following (because there is no PyLong_FromByteArray() C API function that I can find):

import binascii def unpack_bytes(stringbytes): #binascii.hexlify will be obsolete in python3 soon #They will add a .tohex() method to bytes class #Issue 3532 bugs.python.org return int(binascii.hexlify(stringbytes), 16) 

4 Comments

There is in fact a _PyLong_FromByteArray function (at least in Python 2.7 and Python 3). I'm using it. But your method would probably be pretty fast too.
This is, in fact, faster than calling _PyLong_FromByteArray using ctypes. How bizarre. Even better, I don't have to check if the input is a memoryview because hexlify handles those, and I don't have to convert to an int in Python 2.7 in order to make the value into a straight int if it's small enough to not need to be a long.
Also, using hex(lnum)' and binascii.unhexlify` (with a little extra glue) is also faster than the ctypes option.
Weird. I looked through the Python 3.1.x C API reference and couldn't find PyLong_FromByteArray().

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.