We can take advantage of the fact that IntegerDigits is very fast when the base is large. But not too large: no bigger than $2^{63}-1$ on a 64-bit system or $2^{31}-1$ on a 32-bit one, because Mathematica's machine integers are signed. Additionally, non-power-of-two bases require more work to get the result than just partitioning a bit-string, and are correspondingly slower. So, we choose the greatest allowable power of two, i.e. $2^{62}$. (Here we assume a 64-bit-capable computer.)
We also take advantage of the POPCNT x86 instruction and its implementation as a compiler builtin. A simplified version of this answer provides the necessary LibraryLink function:
#include "WolframLibrary.h"

DLLEXPORT mint WolframLibrary_getVersion() {
    return WolframLibraryVersion;
}

DLLEXPORT int WolframLibrary_initialize(WolframLibraryData libData) {
    return 0;
}

DLLEXPORT void WolframLibrary_uninitialize() {
    return;
}

DLLEXPORT int hammingWeight_T_I(WolframLibraryData libData,
                                mint argc, MArgument *args, MArgument res) {
    MTensor in;
    const mint *dims;
    mint *indata, i, total;

    in = MArgument_getMTensor(args[0]);
    if (libData->MTensor_getRank(in) != 1) return LIBRARY_DIMENSION_ERROR;
    if (libData->MTensor_getType(in) != MType_Integer) return LIBRARY_TYPE_ERROR;

    dims = libData->MTensor_getDimensions(in);
    indata = libData->MTensor_getIntegerData(in);

    total = 0;
    #pragma omp parallel for schedule(static) reduction(+:total)
    for (i = 0; i < dims[0]; i++) {
        total += (mint)__builtin_popcountll((unsigned long long)indata[i]);
    }

    MArgument_setInteger(res, total);
    return LIBRARY_NO_ERROR;
}
This function takes a list of integers and produces the total of their Hamming weights, using OpenMP for parallelization. Here we use __builtin_popcountll, which is a GCC builtin, but other compilers have their own equivalents, such as __popcnt64 for Microsoft C++. If you use a compiler other than GCC, you can substitute the appropriate function.
Compile it:
gcc -Wall -fopenmp -O3 -march=native -I. -shared -o hammingWeight.dll hammingWeight.c
(Adjust the include path so that the compiler can find WolframLibrary.h.)
Now we can define our function:
hammingWeightC = LibraryFunctionLoad[
    "hammingWeight.dll", "hammingWeight_T_I",
    {{Integer, 1, "Constant"}}, {Integer, 0, Automatic}
];

hammingWeight[num_Integer] := hammingWeightC@IntegerDigits[num, 2^62];
Let's create an obnoxiously large integer to test it with:
num = RandomInteger[10^(5*^7)];

hammingWeight[num] === Tr@IntegerDigits[num, 2]
(* -> True *)
So, it works. How does it do for speed?
AbsoluteTiming[
    Do[Tr@IntegerDigits[num, 2], {10}]
]
(* -> 11.594 seconds *)

AbsoluteTiming[
    Do[hammingWeight[num], {10}]
]
(* -> 0.297 seconds *)
As we see, on my computer it is about 40 times faster than the next best approach. About 85% of the runtime is spent in IntegerDigits rather than in the Hamming-weight calculation itself, so the relative speedup should be broadly similar on other computers.
N.B.: The same IntegerDigits method can also be adapted to the linked question, giving a fast way to compute the Hamming distance of big integers from the already established machine-integer method.