7

Maybe you can help me with the following problem that can help me speed a memory manager I am thinking of (I am not sure a solution exists – I did not find one).

I have a 32 bits register and I need to find if there are n consecutive set bits in it, and if so what is their offset. For example if the register holds the following value 111100000000000000000001111111000 and n equals to 4 – any of the following answer is accepted (offsets starts from 0):

3, 4, 5, 6, 28

The atomic operations I have are all the regular bitwise operations (&, |, ~, …) and also finding the least significant bit offset (3 in the register above). The algorithm (assuming one exists) – should take no more than 5 atomic operations.

8
  • 1
    Make a mask that has the last n bits set, then shift it and match it. Easy. Commented Aug 21, 2012 at 11:13
  • @daa - I didnt know how to accept answers until recently. I tried only loops , but have no idea how to do it without it. Commented Aug 21, 2012 at 11:16
  • @Qnan - it is much more than 5 operation.. it's better to loop Commented Aug 21, 2012 at 11:16
  • I know a better way, that only looks at runs of 1's, but it's still more than 5 operations and it behaves terribly when the bits alternate between 0 and 1. Commented Aug 21, 2012 at 11:18
  • It's not homework , I can do a simple loop and find the needed offset. But I truly have no idea if it can be done in O(1) Commented Aug 21, 2012 at 11:34

5 Answers 5

5

If there is an algorithm that does that, then the worst case complexity is at least O(m-n), where m is a the number of bits in the register and n is the number of consecutive set bits you are looking for. This is easy to see because if all bits are set, your algorithm will have to output exactly m-n items, so it's complexity cannot be any lower.

EDIT

There's an elegant solution to a similar problem here Looping through bits in an integer, ruby, finding the length of the longes 1 sequence.

If you know the length n of the run you're looking for in advance, this algorithm will require only n steps. The offset can then be recovered from the number of trailing zeroes in the pre-last step of the algorithm in about 5 more steps. That's not extremely efficient, but probably better than the loop-through solution, especially for a small n.

EDIT 2

If the n is known in advance, we can figure out a sequence of necesary shifts for it. E.g. if we are looking for 7 bit runs, then we'd have to do

x &= x >> 1 x &= x >> 3 x &= x >> 1 x &= x >> 1 

The point is that we shift right n/2 bits if n is even or by 1 if n is odd, then update n accordingly (either n = n - 1 or n = n / 2), as @harold suggests. Estimating these values on the fly would be expensive, but if we pre-calculate them then it's going to be pretty efficient.

EDIT 3

Even better, for any n, exactly ceil(log(2,n)) steps would be required, no matter which shift we take, as long as it is between floor(n/2) and 2^floor(log(2,n-1)). See comments below.

Sign up to request clarification or add additional context in comments.

9 Comments

It only has to output one item though - the question says any of those offsets are ok, not that it has to find all of them.
but it's enough to find one offset - I don't need all of them - the minute it finds one - it will print it out and that's it..
the sample for n==7 is broken, try it on 63 for instance. The lengths that can be optimized are those that are not prime numbers.
@salva Check the update. I think it's about odd vs even numbers, rather than prime or composite ones. Think about 9. It's going to be 1,4,2,1, I think. Or 11, which corresponds to 1,5,1,2,1.
@Qnan, I think the optimum relation is both more complex than looking for prime numbers or for odds and evens. I think it goes more with finding the biggest power of two p = (1<<k) smaller than n and then calculating x & (x >> (n - p)). For instance in the n=7 case, the optimum is x &= x >> 1; x &= x >> 2; x &= x >> 3. For n=9, it is x &= x >> 1; x &= x >> 2; x &= x >> 4; x &= x >> 1
|
2

Apologies for raising this from the dead, but I actually needed the generic algo where you know the number of bits, M, ahead of time.

The comments in a very good answer: (https://stackoverflow.com/a/12053749/2963099) led me to realize that a C++ recursive solution would be sufficient to calculate a O(log(M)) where M is the number of consecutive bits being searched as follows:

#include <bit> #include <cstdint> template <int BITS> struct consecutive_bits { static int ones(uint64_t b) { b &= b >> BITS/2; return consecutive_bits<BITS-BITS/2>::ones(b); } static int zeros(uint64_t b) { return ones(~b); } }; template <> struct consecutive_bits<1> { static int ones(uint64_t b) { return std::countr_zero(b); } static int zeros(uint64_t b) { return ones(~b); } }; 

(You can see it on Compiler Explorer: https://godbolt.org/z/9PWbYWYoW )

The asm is quite simple, using 7 as an example (and setting -march=haswell):

mov rax, rdi shr rax, 3 and rax, rdi mov rdx, rax shr rdx, 2 and rdx, rax mov rax, rdx shr rax and rax, rdx tzcnt rax, rax ret 

From the line breaks I added, it seems clear that it is O(M) with K being 3 asm lines: MOV, SHR, AND

Comments

1

for every possible byte value (0-255) calculate the number of bits at the beginning, the number of bits at the end and the longest number of consecutive bits inside the byte and the offset of this sequence. For instance, for 0b11011101, there are 2 bits at the beginning, 1 bit at the end and a sequence of 3 consecutive bits in it.

Store this values in 4 arrays, for instance start, end, longest, longest_offset.

Then, consider the 32bit number as a 4 bytes array and iterate over these bytes as follows:

int search_bit_sequence(uint32 num, int desired) { unsigned char *bytes = (unsigned char *)&num; int i, acu; for (acu = i = 0; i < 4; i++) { int byte = bytes[i]; acu += start[byte]; if (acu >= desired) return (i * 8 - (acu - start[byte])); if (longest[byte] >= desired) return ( i * 8 + longest_offset[byte]); if (longest[byte] < 8) acu = end[byte]; } return -1; /* not found */ } 

update: notice that the endianess of your CPU may require changing the loop direction.

3 Comments

this would be ok for searching a longer bit array, but i think it's a real overshoot if you're only concerned with one register
Did you maybe mean 0b11011101? Hex 0x11011101 doesn't have any consecutive set bits, just some groups of 0001 nibbles (0x1) and some groups of 0000 (0x0).
@PeterCordes, yes, you are right!
1

The link posted by Qnan shows an elegant solution to the general case.

For particular values of m it could be further optimized.

For instance, for m == 4, you can just do:

x &= (x >> 1); x &= (x >> 2); // at this point, the first bit set in x indicates a 4 bit set sequence. 

For m == 6 :

x &= (x >> 1); x &= (x >> 1); x &= (x >> 3); 

In the end, that just reduces to factoring m.

update

Note also, that for high values of, it may actually be cheaper to just check for the bit sequence at every possible position.

For instance, for m = 23, the pattern can only start at positions from 0 to 9.

2 Comments

I think you mixed up the order, should be x >> 2 first and then x >> 1 for a 4.
@Qnan: I think order doesn't matter at all.
1

I checked this question and answers and came up with following idea.

int i = n-1; uint32_t y = x; while(y && i--) { y = y & (y << 1); }; 

After above operation y is nonzero if there is n consecutive set bits. The next thing to do is to find the least significant value set there. The following code will strip all the set bits except the least significant.

z = y - (y & (y-1)); 

Now that we have only one bit set, we need to find the position of the bit. We can use switch statement with 32 cases.

static inline int get_set_position(const uint32_t z) { switch(z) { case 0x1: return 0; case 0x2: return 1; .... .... // upto (1<<31) total 32 times. } return -1; } 

Finally to get the result we need to reduce n-1. So the total procedure is the following.

static inline int get_set_n_position(const uint32_t x, const uint8_t n) { if(!n) return -1; int i = n-1; uint32_t y = x; while(y && i--) { y = y & (y << 1); }; if(!y) return -1; uint32_t z = y - (y & (y-1)); if(!z) return -1; int pos = get_set_position(z); if(pos < 0) return -1; assert(pos >= (n-1)); return pos - (n-1); } 

Now there is concern for big-endian. I think I just need to change the get_set_position() for big-endian to make it work(assuming the consecutive set bits definition changes based on endian-ness).

Let me share a tested code that uses builtin_ctzl provided by gcc.

OPP_INLINE int get_set_n_position(BITSTRING_TYPE x, const uint8_t n) { if(!n || n > BIT_PER_STRING) return -1; int i = n-1; while(x && i--) { x = x & (x << 1); }; if(!x) return -1; int pos = __builtin_ctzl(x); return pos - (n-1); } 

The code works in O(1) time because 32 is constant(as @Qnan noticed). Again it works in O(n) if the size of register vary.

Note: I fixed the bugs, thanks to comments and unit-testing.

4 Comments

Your initial assertion is false. If x = 11011011 and n=3 Then x << (n-1) is 01101100. 11011011 & 01101100 = 01001000. The result is non-zero and yet there are not 3 consecutive set bits.
@JimMischel Thanks, I fixed the bug and updated the answer now.
That doesn't work, either. x = 11011011. x << n = 11011000 The result of a bitwise AND is 11011000.
@JimMischel thanks, It was a mistake. Now I fixed and tested the code.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.