We can take advantage of the fact that IntegerDigits is very fast when the base is large. But not too large: no bigger than $2^{63}-1$ on a 64-bit system or $2^{31}-1$ on a 32-bit one, because Mathematica's machine integers are signed. Additionally, non-power-of-two bases require more work to get the result than just partitioning a bit-string, and are correspondingly slower. So, we choose the greatest allowable power of two, i.e. $2^{62}$. (Here we assume a 64-bit-capable computer.)
We also take advantage of the POPCNT x86 instruction and its implementation as a compiler builtin. A simplified version of this answer provides the necessary LibraryLink function:
#include "WolframLibrary.h"

DLLEXPORT mint WolframLibrary_getVersion() {
    return WolframLibraryVersion;
}

DLLEXPORT int WolframLibrary_initialize(WolframLibraryData libData) {
    return 0;
}

DLLEXPORT void WolframLibrary_uninitialize() {
    return;
}

DLLEXPORT int hammingWeight_T_I(WolframLibraryData libData,
                                mint argc, MArgument *args, MArgument res) {
    MTensor in;
    const mint *dims;
    mint *indata, i, total;

    in = MArgument_getMTensor(args[0]);
    if (libData->MTensor_getRank(in) != 1) return LIBRARY_DIMENSION_ERROR;
    if (libData->MTensor_getType(in) != MType_Integer) return LIBRARY_TYPE_ERROR;

    dims = libData->MTensor_getDimensions(in);
    indata = libData->MTensor_getIntegerData(in);

    total = 0;
    #pragma omp parallel for schedule(static) reduction(+:total)
    for (i = 0; i < dims[0]; i++) {
        total += (mint)__builtin_popcountll((unsigned long long)indata[i]);
    }

    MArgument_setInteger(res, total);
    return LIBRARY_NO_ERROR;
}
This function takes a list of integers and produces the total of their Hamming weights, using OpenMP for parallelization. Here we use __builtin_popcountll, which is a GCC builtin, but other compilers have their own equivalents, such as __popcnt64 for Microsoft C++. If you use a compiler other than GCC, you can substitute the appropriate function.
Compile it:
gcc -Wall -fopenmp -O3 -march=native -I. -shared -o hammingWeight.dll hammingWeight.c
(Adjust the include path so that the compiler can find WolframLibrary.h.)
Now we can define our function:
hammingWeightC = LibraryFunctionLoad[
    "hammingWeight.dll", "hammingWeight_T_I",
    {{Integer, 1, "Constant"}}, {Integer, 0, Automatic}
];

hammingWeight[num_Integer] := hammingWeightC@IntegerDigits[num, 2^62];
Let's create an obnoxiously large integer to test it with:
num = RandomInteger[10^(5*^7)];

hammingWeight[num] === Tr@IntegerDigits[num, 2]
(* -> True *)
So, it works. How does it do for speed?
AbsoluteTiming[
    Do[Tr@IntegerDigits[num, 2], {10}]
]
(* -> 11.594 seconds *)

AbsoluteTiming[
    Do[hammingWeight[num], {10}]
]
(* -> 0.297 seconds *)
As we see, on my computer it is about 40 times faster than the next best approach. About 85% of the runtime is spent in IntegerDigits rather than in the Hamming-weight calculation itself, so the relative speedup should be broadly similar on other computers.
N.B.: The same IntegerDigits method can also be adapted to the linked question, giving a fast way to compute the Hamming distance of big integers from the already established machine-integer method.