8 bits representing the number 7 look like this:
```
00000111
```

Three bits are set.
What are the algorithms to determine the number of set bits in a 32-bit integer?
Here is a portable module (ANSI C) which can benchmark each of your algorithms on any architecture.

Your CPU has 9-bit bytes? No problem :-) At the moment it implements two algorithms, the K&R algorithm and a byte-wise lookup table. The lookup table is on average 3 times faster than the K&R algorithm. If someone can figure out a way to make the "Hacker's Delight" algorithm portable, feel free to add it in.
```c
#ifndef _BITCOUNT_H_
#define _BITCOUNT_H_

/* Return the Hamming Weight of val, i.e. the number of 'on' bits. */
int bitcount( unsigned int );

/* List of available bitcount algorithms.
 * onTheFly:    Calculate the bitcount on demand.
 *
 * lookupTable: Uses a small lookup table to determine the bitcount.  This
 * method is on average 3 times as fast as onTheFly, but incurs a small
 * upfront cost to initialize the lookup table on the first call.
 *
 * strategyCount is just a placeholder.
 */
enum strategy { onTheFly, lookupTable, strategyCount };

/* String representations of the algorithm names */
extern const char *strategyNames[];

/* Choose which bitcount algorithm to use. */
void setStrategy( enum strategy );

#endif
```
```c
#include <limits.h>

#include "bitcount.h"

/* The number of entries needed in the table is equal to the number of
 * unique values a char can represent, which is always UCHAR_MAX + 1 */
static unsigned char _bitCountTable[UCHAR_MAX + 1];
static unsigned int _lookupTableInitialized = 0;

static int _defaultBitCount( unsigned int val ) {
    int count;

    /* Starting with:
     * 1100 - 1 == 1011,  1100 & 1011 == 1000
     * 1000 - 1 == 0111,  1000 & 0111 == 0000
     */
    for ( count = 0; val; ++count )
        val &= val - 1;

    return count;
}

/* Looks up each byte of the integer in a lookup table.
 *
 * The first time the function is called it initializes the lookup table.
 */
static int _tableBitCount( unsigned int val ) {
    int bCount = 0;

    if ( !_lookupTableInitialized ) {
        unsigned int i;
        for ( i = 0; i != UCHAR_MAX + 1; ++i )
            _bitCountTable[i] = ( unsigned char )_defaultBitCount( i );

        _lookupTableInitialized = 1;
    }

    for ( ; val; val >>= CHAR_BIT )
        bCount += _bitCountTable[val & UCHAR_MAX];

    return bCount;
}

static int ( *_bitcount ) ( unsigned int ) = _defaultBitCount;

const char *strategyNames[] = { "onTheFly", "lookupTable" };

void setStrategy( enum strategy s ) {
    switch ( s ) {
    case onTheFly:
        _bitcount = _defaultBitCount;
        break;
    case lookupTable:
        _bitcount = _tableBitCount;
        break;
    case strategyCount:
        break;
    }
}

/* Just a forwarding function which will call whichever version of the
 * algorithm has been selected by the client
 */
int bitcount( unsigned int val ) {
    return _bitcount( val );
}

#ifdef _BITCOUNT_EXE_

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Use the same sequence of pseudo random numbers to benchmark each
 * Hamming Weight algorithm.
 */
void benchmark( int reps ) {
    clock_t start, stop;
    int i, j;
    static const int iterations = 1000000;

    for ( j = 0; j != strategyCount; ++j ) {
        setStrategy( j );

        srand( 257 );

        start = clock( );
        for ( i = 0; i != reps * iterations; ++i )
            bitcount( rand( ) );
        stop = clock( );

        printf( "\n\t%d pseudo-random integers using %s: %f seconds\n\n",
                reps * iterations, strategyNames[j],
                ( double )( stop - start ) / CLOCKS_PER_SEC );
    }
}

int main( void ) {
    int option;

    while ( 1 ) {
        printf( "Menu Options\n"
                "\t1.\tPrint the Hamming Weight of an Integer\n"
                "\t2.\tBenchmark Hamming Weight implementations\n"
                "\t3.\tExit ( or cntl-d )\n\n\t" );

        if ( scanf( "%d", &option ) == EOF )
            break;

        switch ( option ) {
        case 1:
            printf( "Please enter the integer: " );
            if ( scanf( "%d", &option ) != EOF )
                printf( "The Hamming Weight of %d ( 0x%X ) is %d\n\n",
                        option, option, bitcount( option ) );
            break;
        case 2:
            printf( "Please select number of reps ( in millions ): " );
            if ( scanf( "%d", &option ) != EOF )
                benchmark( option );
            break;
        case 3:
            goto EXIT;
            break;
        default:
            printf( "Invalid option\n" );
        }
    }

 EXIT:
    printf( "\n" );
    return 0;
}

#endif
```

32-bit or not? I just came up with this method in Java after reading "Cracking the Coding Interview", 4th edition, exercise 5.5 (chapter 5: Bit Manipulation). If the least significant bit is 1, increment count, then right-shift the integer.
```java
public static int bitCount( int n ) {
    int count = 0;
    for ( int i = n; i != 0; i = i >> 1 ) {
        count += i & 1;
    }
    return count;
}
```

I think this one is more intuitive than the solutions with the constant 0x33333333, no matter how fast they are. It depends on your definition of "best algorithm".
Note: in bitCount(), the for loop never terminates when n < 0, because >> is an arithmetic shift in Java and keeps the sign bit set.

Naive Solution
Time Complexity is O(no. of bits in n)
```c
int countSet(unsigned int n)
{
    int res = 0;
    while (n != 0) {
        res += (n & 1);
        n >>= 1;   // logical right shift, like C unsigned or Java >>>
    }
    return res;
}
```

Brian Kernighan's algorithm
Time Complexity is O(no of set bits in n)
```c
int countSet(unsigned int n)
{
    int res = 0;
    while (n != 0) {
        n = (n & (n - 1));
        res++;
    }
    return res;
}
```

Lookup table method for a 32-bit number. In this method we break the 32-bit number into four 8-bit chunks.
Time Complexity is O(1)
```c
static unsigned char table[256]; /* the table size is 256, the number of
                                    values i & 0xFF (8 bits) can have */

void initialize() // holds the number of set bits from 0 to 255
{
    table[0] = 0;
    for (unsigned int i = 1; i < 256; i++)
        table[i] = (i & 1) + table[i >> 1];
}

int countSet(unsigned int n)
{
    // 0xff is the hexadecimal representation of 8 set bits.
    int res = table[n & 0xff];
    n = n >> 8;
    res = res + table[n & 0xff];
    n = n >> 8;
    res = res + table[n & 0xff];
    n = n >> 8;
    res = res + table[n & 0xff];
    return res;
}
```

```python
def hammingWeight(n: int) -> int:
    sums = 0
    while n != 0:
        sums += 1
        n = n & (n - 1)
    return sums
```

In the binary representation, the least significant 1-bit in n always corresponds to a 0-bit in n - 1. Therefore, ANDing the two numbers n and n - 1 always flips the least significant 1-bit in n to 0 and keeps all other bits the same.
Personally I use this:
```java
public static int myBitCount(long L) {
    int count = 0;
    while (L != 0) {
        count++;
        L ^= L & -L;
    }
    return count;
}
```

```c
int countBits(int x)
{
    int n = 0;
    if (x) do n++;
           while(x=x&(x-1));
    return n;
}
```

Or also:
```c
int countBits(int x) { return (x)? 1+countBits(x&(x-1)): 0; }
```

7½ years after my original answer, @PeterMortensen questioned if this was even valid C syntax. I posted a link to an online compiler showing that it is in fact perfectly valid syntax (code below).
```c
#include <stdio.h>

int countBits(int x)
{
    int n = 0;
    if (x) do n++;           /* Totally Normal Valid code. */
           while(x=x&(x-1)); /* Nothing to see here. */
    return n;
}

int main(void)
{
    printf("%d\n", countBits(25));
    return 0;
}
```

If you want to re-write it for clarity, it would look like:
```c
if (x) {
    do {
        n++;
    } while(x=x&(x-1));
}
```

But that seems excessive to my eye.
However, I've also realized the function can be made shorter, but perhaps more cryptic, written as:
```c
int countBits(int x)
{
    int n = 0;
    while (x) x=(n++,x&(x-1));
    return n;
}
```

Here is a solution that has not been mentioned so far, using bitfields. The following program counts the set bits in an array of 100000000 16-bit integers using 4 different methods. Timing results are given in parentheses (on Mac OS X, with gcc -O3):
```c
#include <stdio.h>
#include <stdlib.h>

#define LENGTH 100000000

typedef struct {
    unsigned char bit0 : 1;
    unsigned char bit1 : 1;
    unsigned char bit2 : 1;
    unsigned char bit3 : 1;
    unsigned char bit4 : 1;
    unsigned char bit5 : 1;
    unsigned char bit6 : 1;
    unsigned char bit7 : 1;
} bits;

unsigned char sum_bits(const unsigned char x)
{
    const bits *b = (const bits*) &x;
    return b->bit0 + b->bit1 + b->bit2 + b->bit3 \
         + b->bit4 + b->bit5 + b->bit6 + b->bit7;
}

int NumberOfSetBits(int i)
{
    i = i - ((i >> 1) & 0x55555555);
    i = (i & 0x33333333) + ((i >> 2) & 0x33333333);
    return (((i + (i >> 4)) & 0x0F0F0F0F) * 0x01010101) >> 24;
}

#define out(s) \
    printf("bits set: %lu\nbits counted: %lu\n", 8*LENGTH*sizeof(short)*3/4, s);

int main(int argc, char **argv)
{
    unsigned long i, s;
    unsigned short *x = malloc(LENGTH*sizeof(short));
    unsigned char lut[65536], *p;
    unsigned short *ps;
    int *pi;

    /* set 3/4 of the bits */
    for (i=0; i<LENGTH; ++i)
        x[i] = 0xFFF0;

    /* sum_bits (1.772s) */
    for (i=LENGTH*sizeof(short), p=(unsigned char*) x, s=0; i--; s+=sum_bits(*p++));
    out(s);

    /* NumberOfSetBits (0.404s) */
    for (i=LENGTH*sizeof(short)/sizeof(int), pi=(int*)x, s=0; i--; s+=NumberOfSetBits(*pi++));
    out(s);

    /* populate lookup table */
    for (i=0, p=(unsigned char*) &i; i<sizeof(lut); ++i)
        lut[i] = sum_bits(p[0]) + sum_bits(p[1]);

    /* 256-bytes lookup table (0.317s) */
    for (i=LENGTH*sizeof(short), p=(unsigned char*) x, s=0; i--; s+=lut[*p++]);
    out(s);

    /* 65536-bytes lookup table (0.250s) */
    for (i=LENGTH, ps=x, s=0; i--; s+=lut[*ps++]);
    out(s);

    free(x);
    return 0;
}
```

While the bitfield version is very readable, the timing results show that it is over 4x slower than NumberOfSetBits(). The lookup-table based implementations are still quite a bit faster, in particular with a 65 kB table.
```c
int bitcount(unsigned int n)
{
    int count = 0;
    while (n) {
        count += n & 0x1u;
        n >>= 1;
    }
    return count;
}
```

This iterated count runs in time proportional to the total number of bits. It simply loops through all the bits, terminating slightly early because of the while condition. Useful if the set bits are sparse and among the least significant bits.
In Java 8 or 9, just invoke Integer.bitCount.
You can use the built-in function named __builtin_popcount(). There is no __builtin_popcount in standard C++, but it is a built-in function of the GCC compiler. This function returns the number of set bits in an integer.

```c
int __builtin_popcount (unsigned int x);
```

Reference: Bit Twiddling Hacks
Another Hamming weight algorithm if you're on a BMI2-capable CPU:
```c
the_weight = __tzcnt_u64(~_pext_u64(data[i], data[i]));
```

pext same,same to pack the bits could be an interesting building-block for something else, but tzcnt and pext both run on the same port as popcnt on Intel CPUs, and pext is very slow on AMD (agner.org/optimize). You can sort of emulate pext x,x with (1ULL << popcnt(x)) - 1, except for the x == 0 case. x86 shifts can't shift out all the bits, because they mask the shift count, and you have to watch out for C undefined behaviour with out-of-range counts.

I am providing one more unmentioned algorithm, called Parallel, taken from here. The nice point about it is that it is generic, meaning that the code is the same for bit sizes 8, 16, 32, 64, and 128.
I checked the correctness of its values and timings on 2^26 numbers for bit sizes 8, 16, 32, and 64. See the timings below.

This algorithm is the first code snippet below. The other two are mentioned here just for reference, because I tested and compared against them.

The algorithms are coded in C++ to be generic, but they can be easily adapted to old C.
```cpp
#include <type_traits>
#include <cstdint>

template <typename IntT>
inline size_t PopCntParallel(IntT n) {
    // https://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetParallel
    using T = std::make_unsigned_t<IntT>;

    T v = T(n);
    v = v - ((v >> 1) & (T)~(T)0/3);                               // temp
    v = (v & (T)~(T)0/15*3) + ((v >> 2) & (T)~(T)0/15*3);          // temp
    v = (v + (v >> 4)) & (T)~(T)0/255*15;                          // temp
    return size_t((T)(v * ((T)~(T)0/255)) >> (sizeof(T) - 1) * 8); // count
}
```

Below are the two algorithms that I compared with. One is the Kernighan simple method with a loop, taken from here.
```cpp
template <typename IntT>
inline size_t PopCntKernighan(IntT n) {
    // http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetKernighan
    using T = std::make_unsigned_t<IntT>;

    T v = T(n);
    size_t c;
    for (c = 0; v; ++c)
        v &= v - 1; // Clear the least significant bit set
    return c;
}
```

The other uses the built-in __popcnt16()/__popcnt()/__popcnt64() MSVC intrinsics (doc here), or __builtin_popcount of Clang/GCC (doc here). These intrinsics should provide a very optimized version, possibly hardware:
```cpp
#ifdef _MSC_VER
    // https://learn.microsoft.com/en-us/cpp/intrinsics/popcnt16-popcnt-popcnt64?view=msvc-160
    #include <intrin.h>
    #define popcnt16 __popcnt16
    #define popcnt32 __popcnt
    #define popcnt64 __popcnt64
#else
    // https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html
    #define popcnt16 __builtin_popcount
    #define popcnt32 __builtin_popcount
    #define popcnt64 __builtin_popcountll
#endif

template <typename IntT>
inline size_t PopCntBuiltin(IntT n) {
    using T = std::make_unsigned_t<IntT>;
    T v = T(n);
    if constexpr(sizeof(IntT) <= 2)
        return popcnt16(uint16_t(v));
    else if constexpr(sizeof(IntT) <= 4)
        return popcnt32(uint32_t(v));
    else if constexpr(sizeof(IntT) <= 8)
        return popcnt64(uint64_t(v));
    else
        static_assert([]{ return false; }());
}
```

Below are the timings, in nanoseconds per number. All timings were done for 2^26 random numbers. Timings are compared for all three algorithms and all bit sizes among 8, 16, 32, and 64. In sum, all tests took 16 seconds on my machine. The high-resolution clock was used.
```
08 bit  Builtin     8.2 ns
08 bit  Parallel    8.2 ns
08 bit  Kernighan  26.7 ns
16 bit  Builtin     7.7 ns
16 bit  Parallel    7.7 ns
16 bit  Kernighan  39.7 ns
32 bit  Builtin     7.0 ns
32 bit  Parallel    7.0 ns
32 bit  Kernighan  47.9 ns
64 bit  Builtin     7.5 ns
64 bit  Parallel    7.5 ns
64 bit  Kernighan  59.4 ns
128 bit Builtin     7.8 ns
128 bit Parallel   13.8 ns
128 bit Kernighan 127.6 ns
```

As one can see, the provided Parallel algorithm (the first of the three) is as good as MSVC's/Clang's intrinsic.
For reference, below is full code that I used to test speed/time/correctness of all functions.
As a bonus, this code (unlike the short code snippets above) also tests the 128-bit size, but only under Clang/GCC (not MSVC), as they have unsigned __int128.
```cpp
#include <type_traits>
#include <cstdint>

using std::size_t;

#if defined(_MSC_VER) && !defined(__clang__)
    #define IS_MSVC 1
#else
    #define IS_MSVC 0
#endif

#if IS_MSVC
    #define HAS128 false
#else
    using int128_t = __int128;
    using uint128_t = unsigned __int128;
    #define HAS128 true
#endif

template <typename T> struct UnSignedT { using type = std::make_unsigned_t<T>; };
#if HAS128
    template <> struct UnSignedT<int128_t> { using type = uint128_t; };
    template <> struct UnSignedT<uint128_t> { using type = uint128_t; };
#endif
template <typename T> using UnSigned = typename UnSignedT<T>::type;

template <typename IntT>
inline size_t PopCntParallel(IntT n) {
    // https://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetParallel
    using T = UnSigned<IntT>;

    T v = T(n);
    v = v - ((v >> 1) & (T)~(T)0/3);                               // temp
    v = (v & (T)~(T)0/15*3) + ((v >> 2) & (T)~(T)0/15*3);          // temp
    v = (v + (v >> 4)) & (T)~(T)0/255*15;                          // temp
    return size_t((T)(v * ((T)~(T)0/255)) >> (sizeof(T) - 1) * 8); // count
}

template <typename IntT>
inline size_t PopCntKernighan(IntT n) {
    // http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetKernighan
    using T = UnSigned<IntT>;

    T v = T(n);
    size_t c;
    for (c = 0; v; ++c)
        v &= v - 1; // Clear the least significant bit set
    return c;
}

#if IS_MSVC
    // https://learn.microsoft.com/en-us/cpp/intrinsics/popcnt16-popcnt-popcnt64?view=msvc-160
    #include <intrin.h>
    #define popcnt16 __popcnt16
    #define popcnt32 __popcnt
    #define popcnt64 __popcnt64
#else
    // https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html
    #define popcnt16 __builtin_popcount
    #define popcnt32 __builtin_popcount
    #define popcnt64 __builtin_popcountll
#endif

#define popcnt128(x) (popcnt64(uint64_t(x)) + popcnt64(uint64_t(x >> 64)))

template <typename IntT>
inline size_t PopCntBuiltin(IntT n) {
    using T = UnSigned<IntT>;
    T v = T(n);
    if constexpr(sizeof(IntT) <= 2)
        return popcnt16(uint16_t(v));
    else if constexpr(sizeof(IntT) <= 4)
        return popcnt32(uint32_t(v));
    else if constexpr(sizeof(IntT) <= 8)
        return popcnt64(uint64_t(v));
    else if constexpr(sizeof(IntT) <= 16)
        return popcnt128(uint128_t(v));
    else
        static_assert([]{ return false; }());
}

#include <random>
#include <vector>
#include <chrono>
#include <string>
#include <iostream>
#include <iomanip>
#include <map>

inline double Time() {
    static auto const gtb = std::chrono::high_resolution_clock::now();
    return std::chrono::duration_cast<std::chrono::duration<double>>(
        std::chrono::high_resolution_clock::now() - gtb).count();
}

template <typename T, typename F>
void Test(std::string const & name, F f) {
    std::mt19937_64 rng{123};
    size_t constexpr bit_size = sizeof(T) * 8, ntests = 1 << 6, nnums = 1 << 14;
    std::vector<T> nums(nnums);
    for (size_t i = 0; i < nnums; ++i)
        nums[i] = T(rng() % ~T(0));
    static std::map<size_t, size_t> times;
    double min_time = 1000;
    for (size_t i = 0; i < ntests; ++i) {
        double timer = Time();
        size_t sum = 0;
        for (size_t j = 0; j < nnums; j += 4)
            sum += f(nums[j + 0]) + f(nums[j + 1]) + f(nums[j + 2]) + f(nums[j + 3]);
        auto volatile vsum = sum;
        min_time = std::min(min_time, (Time() - timer) / nnums);
        if (times.count(bit_size) && times.at(bit_size) != sum)
            std::cout << "Wrong bit cnt checksum!" << std::endl;
        times[bit_size] = sum;
    }
    std::cout << std::setw(2) << std::setfill('0') << bit_size << " bit "
              << name << " " << std::fixed << std::setprecision(1)
              << min_time * 1000000000 << " ns" << std::endl;
}

int main() {
    #define TEST(T) \
        Test<T>("Builtin", PopCntBuiltin<T>); \
        Test<T>("Parallel", PopCntParallel<T>); \
        Test<T>("Kernighan", PopCntKernighan<T>); \
        std::cout << std::endl;

    TEST(uint8_t); TEST(uint16_t); TEST(uint32_t); TEST(uint64_t);
    #if HAS128
        TEST(uint128_t);
    #endif
    #undef TEST
}
```

I've done it through a Perl script. The number taken is $c = 0x11BBBBAB; B = 3 ones, A = 2 ones, so in total 1+1+3+3+3+2+3+3 = 19.

```perl
#!/usr/local/bin/perl
$c = 0x11BBBBAB;
$count = 0;
$m = 0x00000001;
for ($i = 0; $i < 32; $i++) {
    $f = $c & $m;
    if ($f == 1) {
        $count++;
    }
    $c = $c >> 1;
}
printf("%d", $count);
```

Here is the sample code, which might be useful.
```java
private static final int[] bitCountArr = new int[]{
    0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4,
    1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5,
    1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5,
    2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
    1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5,
    2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
    2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
    3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7,
    1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5,
    2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
    2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
    3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7,
    2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
    3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7,
    3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7,
    4, 5, 5, 6, 5, 6, 6, 7, 5, 6, 6, 7, 6, 7, 7, 8};

private static final int firstByteFF = 255;

public static final int getCountOfSetBits(int value) {
    int count = 0;
    for (int i = 0; i < 4; i++) {
        if (value == 0) break;
        count += bitCountArr[value & firstByteFF];
        value >>>= 8;
    }
    return count;
}
```

For Java, there is java.util.BitSet. https://docs.oracle.com/javase/8/docs/api/java/util/BitSet.html
cardinality(): Returns the number of bits set to true in this BitSet.
The BitSet is memory efficient, since the bits are stored packed in longs.
For those who want it in C++11 for any unsigned integer type as a constexpr function (tacklelib/include/tacklelib/utility/math.hpp):
```cpp
#include <stdint.h>
#include <limits>
#include <type_traits>

const constexpr uint32_t uint32_max = (std::numeric_limits<uint32_t>::max)();

namespace detail
{
    template <typename T>
    inline constexpr T _count_bits_0(const T & v)
    {
        return v - ((v >> 1) & 0x55555555);
    }

    template <typename T>
    inline constexpr T _count_bits_1(const T & v)
    {
        return (v & 0x33333333) + ((v >> 2) & 0x33333333);
    }

    template <typename T>
    inline constexpr T _count_bits_2(const T & v)
    {
        return (v + (v >> 4)) & 0x0F0F0F0F;
    }

    template <typename T>
    inline constexpr T _count_bits_3(const T & v)
    {
        return v + (v >> 8);
    }

    template <typename T>
    inline constexpr T _count_bits_4(const T & v)
    {
        return v + (v >> 16);
    }

    template <typename T>
    inline constexpr T _count_bits_5(const T & v)
    {
        return v & 0x0000003F;
    }

    template <typename T, bool greater_than_uint32>
    struct _impl
    {
        static inline constexpr T _count_bits_with_shift(const T & v)
        {
            return
                detail::_count_bits_5(
                    detail::_count_bits_4(
                        detail::_count_bits_3(
                            detail::_count_bits_2(
                                detail::_count_bits_1(
                                    detail::_count_bits_0(v)))))) +
                count_bits(v >> 32);
        }
    };

    template <typename T>
    struct _impl<T, false>
    {
        static inline constexpr T _count_bits_with_shift(const T & v)
        {
            return 0;
        }
    };
}

template <typename T>
inline constexpr T count_bits(const T & v)
{
    static_assert(std::is_integral<T>::value, "type T must be an integer");
    static_assert(!std::is_signed<T>::value, "type T must be not signed");

    return uint32_max >= v ?
        detail::_count_bits_5(
            detail::_count_bits_4(
                detail::_count_bits_3(
                    detail::_count_bits_2(
                        detail::_count_bits_1(
                            detail::_count_bits_0(v)))))) :
        detail::_impl<T, sizeof(uint32_t) < sizeof(v)>::_count_bits_with_shift(v);
}
```

Plus tests in the Google Test library:
```cpp
#include <stdlib.h>
#include <time.h>

namespace {
    template <typename T>
    inline uint32_t _test_count_bits(const T & v)
    {
        uint32_t count = 0;
        T n = v;
        while (n > 0) {
            if (n % 2) {
                count += 1;
            }
            n /= 2;
        }
        return count;
    }
}

TEST(FunctionsTest, random_count_bits_uint32_100K)
{
    srand(uint_t(time(NULL)));

    for (uint32_t i = 0; i < 100000; i++) {
        const uint32_t r = uint32_t(rand()) + (uint32_t(rand()) << 16);
        ASSERT_EQ(_test_count_bits(r), count_bits(r));
    }
}

TEST(FunctionsTest, random_count_bits_uint64_100K)
{
    srand(uint_t(time(NULL)));

    for (uint32_t i = 0; i < 100000; i++) {
        const uint64_t r = uint64_t(rand()) + (uint64_t(rand()) << 16) +
                           (uint64_t(rand()) << 32) + (uint64_t(rand()) << 48);
        ASSERT_EQ(_test_count_bits(r), count_bits(r));
    }
}
```

Convert the integer to a binary string and count the ones.
PHP solution:
```php
substr_count(decbin($integer), '1');
```

A simple algorithm to count the number of set bits:
```c
int countbits(unsigned int n)
{
    int count = 0;
    while (n != 0) {
        n = n & (n - 1);
        count++;
    }
    return count;
}
```

Take the example of 11 (1011) and try manually running through the algorithm. It should help you a lot!
Here is the functional master race recursive solution, and it is by far the purest one (and can be used with any bit length!):
```cpp
template<typename T>
int popcnt(T n)
{
    if (n > 0)
        return (n & 1) + popcnt(n >> 1);
    return 0;
}
```

Note the parentheses around n & 1: without them, + binds tighter than &, so the original n&1 + popcnt(n>>1) would compute n & (1 + popcnt(n >> 1)), which is wrong.

Kotlin pre 1.4
```kotlin
fun NumberOfSetBits(i: Int): Int {
    var i = i
    i -= (i ushr 1 and 0x55555555)
    i = (i and 0x33333333) + (i ushr 2 and 0x33333333)
    return (i + (i ushr 4) and 0x0F0F0F0F) * 0x01010101 ushr 24
}
```

This is more or less a copy of the answer seen in the top answer.
It is with the Java fixes and is then converted using the converter in the IntelliJ IDEA Community Edition
1.4 and beyond (as of 2021-05-05 - it could change in the future).
```kotlin
fun NumberOfSetBits(i: Int): Int {
    return i.countOneBits()
}
```

Under the hood it uses Integer.bitCount, as seen here:

```kotlin
@SinceKotlin("1.4")
@WasExperimental(ExperimentalStdlibApi::class)
@kotlin.internal.InlineOnly
public actual inline fun Int.countOneBits(): Int = Integer.bitCount(this)
```

I'll contribute to @Arty's answer:
__popcnt16()/__popcnt()/__popcnt64() MSVC intrinsics (doc here)

The popcnt instruction, as noted in the "Remarks" section, is available as part of the SSE4 instruction set, and there is a relatively high chance of it not being available:
If you run code that uses these intrinsics on hardware that doesn't support the popcnt instruction, the results are unpredictable.
So, you need to implement a check as per "Remarks" section:
To determine hardware support for the popcnt instruction, call the __cpuid intrinsic with InfoType=0x00000001 and check bit 23 of CPUInfo[2] (ECX). This bit is 1 if the instruction is supported, and 0 otherwise.
Here's how you do it:
```cpp
unsigned popcnt(const unsigned input)
{
    struct cpuinfo_t
    {
        union
        {
            int regs[4];
            struct
            {
                long eax, ebx, ecx, edx;
            };
        };
        cpuinfo_t() noexcept : regs() {}
    } cpuinfo;

    // EAX=1: Processor Info and Feature Bits
    __cpuid(cpuinfo.regs, 1);

    // ECX bit 23: popcnt
    if (_bittest(&cpuinfo.ecx, 23))
    {
        return __popcnt(input);
    }

    // Choose any fallback implementation you like, there's already a ton of them
    unsigned num = input;
    num = (num & 0x55555555) + (num >> 1 & 0x55555555);
    num = (num & 0x33333333) + (num >> 2 & 0x33333333);
    num = (num & 0x0F0F0F0F) + (num >> 4 & 0x0F0F0F0F);
    num = (num & 0x00FF00FF) + (num >> 8 & 0x00FF00FF);
    num = (num & 0x0000FFFF) + (num >> 16 & 0x0000FFFF);
    return num;
}
```

In C#, what about a one-liner:
```csharp
BitOperations.PopCount(Mask);
```

Returns the population count (number of bits set) of a mask. Similar in behavior to the x86 instruction POPCNT. Compatible with x64! It uses an intrinsic (a built-in instruction of the x86 architecture) to count the number of bits very fast in a 32-bit or 64-bit value.
NOTE: BitOperations.PopCount() is not CLS-compliant. Take this into consideration.
Cheers
I have not seen this approach anywhere:
```c
int nbits(unsigned char v) {
    return ((((v - ((v >> 1) & 0x55)) * 0x1010101) & 0x30c00c03) * 0x10040041) >> 28;
}
```
It works per byte, so it would have to be called four times for a 32-bit integer. It is derived from the sideways addition, but it uses two 32-bit multiplications to reduce the number of instructions to only seven.
Most current C compilers will optimize this function using SIMD (SSE2) instructions when it is clear that the number of requests is a multiple of 4, and it becomes quite competitive.
It is portable, can be defined as a macro or inline function and does not need data tables.
This approach can be extended to work on 16 bits at a time, using 64-bit multiplications. However, it fails when all 16 bits are set, returning zero, so it can be used only when the 0xFFFF input value is not present.
It is also slower due to the 64-bit operations and does not optimize well.
Turns out that Hacker's Delight (Anderson) includes a solution with the same features, using only five instructions.
```c
static uint32_t hd8(uint8_t v) {
    return ((((v * 0x8040201u) >> 3u) & 0x11111111u) * 0x11111111u) >> 28u;
}
```

Set counter = 0.
repeat counting while N is not zero.
```cpp
#include <iostream>
using namespace std;

int countSetBits(unsigned int n) {
    int count = 0;
    while (n != 0) {
        count += n & 1;
        n = n >> 1;
    }
    return count;
}
```

Let's use this function:

```cpp
int main() {
    int x = 5;
    cout << countSetBits(x);
    return 0;
}
```

Output: 2
Because 5 has 2 bits set in binary representation (101).
You can do something like:
```c
#include <stdio.h>

int countSetBits(int n)
{
    n = ((n & 0xAAAAAAAA) >> 1) + (n & 0x55555555);
    n = ((n & 0xCCCCCCCC) >> 2) + (n & 0x33333333);
    n = ((n & 0xF0F0F0F0) >> 4) + (n & 0x0F0F0F0F);
    n = ((n & 0xFF00FF00) >> 8) + (n & 0x00FF00FF);
    return n;
}

int main()
{
    int n = 10;
    printf("Number of set bits: %d", countSetBits(n));
    return 0;
}
```

See here: http://ideone.com/JhwcX
The working can be explained as follows:
First, all the even bits are shifted right and added to the odd bits to count the number of bits in groups of two. Then we work in groups of two, then four, and so on.
Try n = 1234667: this returns the wrong result, because you still need to combine the two chunks that the last step leaves and clear the high bits. Also, it's apparently possible to be more efficient than this SWAR 1-bit, 2-bit, 4-bit sequence. I haven't grokked the magic of the top answer's bit-twiddling hack, though. stackoverflow.com/questions/109023/…