5

OK, guys, I know what I want to do, but I don't know if it already exists (as a function or theoretically) or how to phrase it, so I need your help :

  • Let's say we've got a binary number : (msb)10101110(lsb)
  • Starting from bit X, I want to zero-out all other bits (going left), as soon as the first zero bit is encountered.
  • Do that as fast as possible, with the absolute minimum number of operations and CPU cycles needed

An example :

  • Number = 10101110, Starting position = 1 (bit at place 1 = 1)
  • position++ - bit at place 2 = 1, keep going
  • position++ - bit at place 3 = 1, keep going
  • position++ - bit at place 4 = 0, oops... zero encountered... now, everything has to be zero-ed out.

So, the final result of our imaginary function CROPLEFT(X,POS), where X=10101110, and POS=1, would return 00001110.


Any ideas?

5
  • @MitchWheat Trust me, I've covered a whole notebook full of sketches for a wide variety of extreme bitmap manipulations. The thing is all I can think of this one includes a loop (which I most definitely would prefer to avoid). Commented Dec 15, 2012 at 5:16
  • @MitchWheat He says it's probably not the fastest way. And I agree with him. Commented Dec 15, 2012 at 5:18
  • @MitchWheat This is going to be performed some million times, so it'll definitely kill speed, that's why. Commented Dec 15, 2012 at 5:19
  • @MitchWheat Per second. (along with a dozen other calculations) :-) Commented Dec 15, 2012 at 5:20
  • 1
    @MitchWheat OK, no need to be secretive: It's part of my move-generation algorithm for a chess engine project of mine. So, if you've ever played with chess programming and bitmaps, you know what I'm talking about... ;-) Commented Dec 15, 2012 at 5:23

4 Answers 4

12

Piece of cake.

y = ~x; // We like working with 1's, not 0's. y &= -y; // Mask off all but the lowest-set bit x &= y-1; // Make a mask for the bits below that and apply it. 

and with the position parameter added:

y = ~x & -1U<<pos; // Change 1U to a larger type if needed. y &= -y; x &= y-1; 

The key ingredient is the second line and the fact that you can replace a value y with just its lowest set bit by applying logical and against -y. Sadly there's no such luck for getting the highest-set bit, unless you have a special cpu instruction for it, so you're lucky that your problem called for lowest.

Sign up to request clarification or add additional context in comments.

16 Comments

I think you're missing the "starting at position X" part. Although it's a trivial change.
Crazy! Was that so simple? Yep, an explanation would be ideal; I'll test it and I'll accept it is asap.
@rici: Yes, I missed that detail. Is it okay to leave it as an exercise for OP? :-)
Yeah, need an extra shift somewhere in there. :P
I went ahead and just added it.
|
3

OK, what the heck:

return x & ((x ^ (x + (1UL << POS))) | ((1UL << POS) - 1)) 

For what it's worth, both of them compiled with gcc-4.7 -O3. R..'s on the left, mine on the right: (using unsigned long and 1UL in both of them)

 .p2align 4,,15 .p2align 4,,15 .globl zapleft .globl zapleft2 .type zapleft, @function .type zapleft2, @function zapleft: zapleft2: .LFB0: .LFB1: .cfi_startproc .cfi_startproc movl %esi, %ecx movl %esi, %ecx movq %rdi, %rax movl $1, %edx movq $-1, %rdx salq %cl, %rdx salq %cl, %rdx leaq (%rdx,%rdi), %rax notq %rax subq $1, %rdx andq %rax, %rdx xorq %rdi, %rax movq %rdx, %rax orq %rdx, %rax negq %rax andq %rdi, %rax andq %rdx, %rax ret subq $1, %rax .cfi_endproc andq %rdi, %rax .LFE1: ret .size zapleft2, .-zapleft2 .cfi_endproc .LFE0: .size zapleft, .-zapleft 

4 Comments

I kinda cheated scooping you by only reading half the problem before I answered... ;-)
for x=10101110 and POS=1, this returns 10100000 (actually the opposite thing)
OK, just re-tested it. And I confirm : it works. Guess I'll have to run some profiling tests to see which version runs faster... Thanks a million for the all the effort! :-)
@Dr.Kameleon: I compared my version with R..'s, as a sanity check, using pos=2 and pos=31, for all 32-bit integers (so a total of 2^34 calls, 2^33 to each function). It took 17 seconds, presumably split roughly evenly between the two. So that's about a thousand million (inlined) function calls per second. That should be fast enough for you, no?
1
CROPLEFT(int X,int POS) { int mask = 1 << POS; while (X & mask) mask <<= 1; return (X & (mask - 1)); } 

1 Comment

This is pretty much identical to what I had written myself, but decided to avoid, because of that loop thing. Thanks a lot buddy, anyway... :-)
0

replace trailing zeroes by ones:

x = x | (x-1); 

replace trailing ones by zeroes:

x = x & (x+1); 

EDIT: Oups, it seems I misread the question, above code zeroes the right bits, not the left bits !

To zero the left bit, we would need a final XOR operation:

y = x | (x-1); y = y & (y+1); y = x ^ y; 

EDIT 2 About the start position POS

We just have to zero out the rightmost POS bits in a first step.

y = x & (-1U<<pos); y = y | (y-1); y = y & (y+1); y = x ^ y; 

EDIT 3 Above solution ignore the first group of zeroes if they are encountered at POS.
If this does not answer the question, then the code will be shorter, but very much like the one of rci now:

y = x | ((1U<<pos)-1); // fill trailing positions with ones y = y & (y+1); // replace trailing ones by zeroes y = x ^ y; // modify leading bits rather than trailing ones 

2 Comments

Since I've got the checker from yesterday, I checked yours, too. It works :) It's ten instructions, one longer than mine but two shorter than @R's
@rici our code are almost the same, my additional step is probably a miss-interpretation of the question because i ignore first group zeroes encountered at POS

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.