169

While there are multiple ways to reverse bit order in a byte, I'm curious as to what is the "simplest" for a developer to implement. And by reversing I mean:

1110 -> 0111 0010 -> 0100 

This is similar to, but not a duplicate of this PHP question.

This is similar to, but not a duplicate of this C question. This question is asking for the easiest method to implement by a developer. The "Best Algorithm" is concerned with memory and cpu performance.

7
  • Use inline assembly. Better, put the function into a separate translation unit. Have one assembly language module for each target platform. Let build process choose the modules. Commented Apr 8, 2010 at 19:47
  • @Andreas Simplest implementation Commented Apr 8, 2010 at 20:12
  • 2
    Fun fact: ARM / AArch64 have an instruction for this, rbit. I don't know of a standard way to use it, though, not even a GCC or clang intrinsic. See How can I elegantly take advantage of ARM instructions like REV and RBIT when writing C code? in case future answers have a better way. Commented Jan 10, 2023 at 8:24
  • 2
    A lookup table would probably be the simplest. Commented May 30, 2023 at 1:31
  • 3
    @NateEldredge P3104R2: Bit permutations Commented Feb 4 at 6:43

41 Answers 41

323

This should work:

unsigned char reverse(unsigned char b) { b = (b & 0xF0) >> 4 | (b & 0x0F) << 4; b = (b & 0xCC) >> 2 | (b & 0x33) << 2; b = (b & 0xAA) >> 1 | (b & 0x55) << 1; return b; } 

First the left four bits are swapped with the right four bits. Then all adjacent pairs are swapped and then all adjacent single bits. This results in a reversed order.

Sign up to request clarification or add additional context in comments.

14 Comments

Reasonably short and quick, but not simple.
This approach also cleanly generalizes to perform byte swapping for endianness.
Not the simplest approach, but I like it +1.
Yes, it is simple. It's a kind of divide and conquer algorithm. Excellent!
@PiotrSiupa: Yes, Clang 13 and later recognizes this algorithms SWAR divide-and-conquer as an rbit idiom for ARM and AArch64: godbolt.org/z/14MM3cExW . Also for bit-at-a-time loops, although -O3 is required for 32-bit integers, but -O2 handles 8-bit (fewer iterations to unroll and analyze.)
|
164

I think a lookup table has to be one of the simplest methods. However, you don't need a full lookup table.

//Index 1==0b0001 => 0b1000 //Index 7==0b0111 => 0b1110 //etc static unsigned char lookup[16] = { 0x0, 0x8, 0x4, 0xc, 0x2, 0xa, 0x6, 0xe, 0x1, 0x9, 0x5, 0xd, 0x3, 0xb, 0x7, 0xf, }; uint8_t reverse(uint8_t n) { // Reverse the top and bottom nibble then swap them. return (lookup[n&0b1111] << 4) | lookup[n>>4]; } // Detailed breakdown of the math // + lookup reverse of bottom nibble // | + grab bottom nibble // | | + move bottom result into top nibble // | | | + combine the bottom and top results // | | | | + lookup reverse of top nibble // | | | | | + grab top nibble // V V V V V V // (lookup[n&0b1111] << 4) | lookup[n>>4] 

This fairly simple to code and verify visually.
Ultimately this might even be faster than a full table. The bit arith is cheap and the table easily fits on a cache line.

12 Comments

Nice, but will give you a cache miss.
@kotlinski: what will cause a cache miss? I think the small table version may be more cache efficient than the large one. On my Core2 a cache line is 64 bytes wide, the full table would span multiple lines, whereas the smaller table easily fits one a single line.
@kotlinski, I suppose that could be a problem, the table is small, but if it's called infrequently it'll have to cache load the table every time. An alternative implementation would be to use a switch to encode the table instead an array. Either way I don't think it will measurably effect performance.
@kotlinski: Temporal locality is more important for cache hits or replacement strategies, than address locality
@Harshdeep: Consider the binary encoded indexes of the table entries. index b0000(0) -> b0000(0x0) boring; b0001(1) -> b1000(0x8), b0010(2) -> b0100(0x4), b1010(10) -> b0101(0x5). See the pattern? It is simple enough that you can calculate it in your head (if you can read binary, otherwise you'll need paper to work it out). As for the leap that reversing an 8 bit integer is the same as reversing 4 bit parts then swapping them; I claim experience and intuition (or magic).
|
112

If you are talking about a single byte, a table-lookup is probably the best bet, unless for some reason you don't have 256 bytes available.

15 Comments

If we're talking about something that's simple to implement without copying a ready-made solution, creating the lookup table does still require another solution. (Of course one might do it by hand, but that's error-prone and time-consuming…)
You can squeeze the array into somewhat fewer than 256 bytes if you ignore palindromes.
@wilhelmtell - you'd need a table to know which ones are the palindromes.
@wilhelmtell: Well, to write the script one still needs another solution, which was my point – a lookup table is simple to use but not simple to create. (Except by copying a ready-made lookup table, but then one might just as well copy any solution.) For example, if the “simplest” solution is considered one that could be written on paper in an exam or interview, I would not start making a lookup table by hand and making the program to do it would already include a different solution (which would be simpler alone than the one including both it and the table).
@Arkku what I meant is write a script which outputs the table of the first 256 bytes and their reverse mapping. Yes, you're back to writing the reverse function, but now in your favourite scripting language, and it can be as nasty as you want -- you're going to throw it away as soon as it's done and you ran it once. Have the script's output as C code, even: unsigned int rtable[] = {0x800, 0x4000, ...};. Then throw away the script and forget you ever had it. It's much faster to write than the equivalent C++ code, and it will only ever run once, so you get O(1) runtime in your C++ code.
|
63

Since nobody posted a complete table lookup solution, here is mine:

unsigned char reverse_byte(unsigned char x) { static const unsigned char table[] = { 0x00, 0x80, 0x40, 0xc0, 0x20, 0xa0, 0x60, 0xe0, 0x10, 0x90, 0x50, 0xd0, 0x30, 0xb0, 0x70, 0xf0, 0x08, 0x88, 0x48, 0xc8, 0x28, 0xa8, 0x68, 0xe8, 0x18, 0x98, 0x58, 0xd8, 0x38, 0xb8, 0x78, 0xf8, 0x04, 0x84, 0x44, 0xc4, 0x24, 0xa4, 0x64, 0xe4, 0x14, 0x94, 0x54, 0xd4, 0x34, 0xb4, 0x74, 0xf4, 0x0c, 0x8c, 0x4c, 0xcc, 0x2c, 0xac, 0x6c, 0xec, 0x1c, 0x9c, 0x5c, 0xdc, 0x3c, 0xbc, 0x7c, 0xfc, 0x02, 0x82, 0x42, 0xc2, 0x22, 0xa2, 0x62, 0xe2, 0x12, 0x92, 0x52, 0xd2, 0x32, 0xb2, 0x72, 0xf2, 0x0a, 0x8a, 0x4a, 0xca, 0x2a, 0xaa, 0x6a, 0xea, 0x1a, 0x9a, 0x5a, 0xda, 0x3a, 0xba, 0x7a, 0xfa, 0x06, 0x86, 0x46, 0xc6, 0x26, 0xa6, 0x66, 0xe6, 0x16, 0x96, 0x56, 0xd6, 0x36, 0xb6, 0x76, 0xf6, 0x0e, 0x8e, 0x4e, 0xce, 0x2e, 0xae, 0x6e, 0xee, 0x1e, 0x9e, 0x5e, 0xde, 0x3e, 0xbe, 0x7e, 0xfe, 0x01, 0x81, 0x41, 0xc1, 0x21, 0xa1, 0x61, 0xe1, 0x11, 0x91, 0x51, 0xd1, 0x31, 0xb1, 0x71, 0xf1, 0x09, 0x89, 0x49, 0xc9, 0x29, 0xa9, 0x69, 0xe9, 0x19, 0x99, 0x59, 0xd9, 0x39, 0xb9, 0x79, 0xf9, 0x05, 0x85, 0x45, 0xc5, 0x25, 0xa5, 0x65, 0xe5, 0x15, 0x95, 0x55, 0xd5, 0x35, 0xb5, 0x75, 0xf5, 0x0d, 0x8d, 0x4d, 0xcd, 0x2d, 0xad, 0x6d, 0xed, 0x1d, 0x9d, 0x5d, 0xdd, 0x3d, 0xbd, 0x7d, 0xfd, 0x03, 0x83, 0x43, 0xc3, 0x23, 0xa3, 0x63, 0xe3, 0x13, 0x93, 0x53, 0xd3, 0x33, 0xb3, 0x73, 0xf3, 0x0b, 0x8b, 0x4b, 0xcb, 0x2b, 0xab, 0x6b, 0xeb, 0x1b, 0x9b, 0x5b, 0xdb, 0x3b, 0xbb, 0x7b, 0xfb, 0x07, 0x87, 0x47, 0xc7, 0x27, 0xa7, 0x67, 0xe7, 0x17, 0x97, 0x57, 0xd7, 0x37, 0xb7, 0x77, 0xf7, 0x0f, 0x8f, 0x4f, 0xcf, 0x2f, 0xaf, 0x6f, 0xef, 0x1f, 0x9f, 0x5f, 0xdf, 0x3f, 0xbf, 0x7f, 0xff, }; return table[x]; } 

4 Comments

Useful, thanks. Seems that my slower shifting method was limiting performance in an embedded app. Placed table in ROM on a PIC (with addition of rom keyword).
How did you get this table? What is the algorithm to generate this table?
@Lance Just count from 0 to 255 and reverse each byte by any known method. For example, isolate each bit, move it into its destination, and then combine the moved bits back together. That's a classic bitwise-and, bit-shift, bitwise-or application.
51

See the bit twiddling hacks for many solutions. Copypasting from there is obviously simple to implement. =)

For example (on a 32-bit CPU):

uint8_t b = byte_to_reverse; b = ((b * 0x0802LU & 0x22110LU) | (b * 0x8020LU & 0x88440LU)) * 0x10101LU >> 16; 

If by “simple to implement” one means something that can be done without a reference in an exam or job interview, then the safest bet is probably the inefficient copying of bits one by one into another variable in reverse order (already shown in other answers).

3 Comments

From your URL: 32 bit CPU: b = ((b * 0x0802LU & 0x22110LU) | (b * 0x8020LU & 0x88440LU)) * 0x10101LU >> 16;
@Joshua: That's my personal favourite as well. The caveat (as stated on the linked page) is that it needs to be assigned or cast into an uint8_t or there will be garbage in the upper bits.
The x86 instructions tzcnt and bzhi can be used to first compute floor(log_2(x + 2)) and zero the bits above the MSB computed from tzcnt respectively, so a cast isn't too expensive in the one-off case. On AMD Ryzen Family 17h, tzcnt and bzhi are both one cycle. They are supported on both AMD and Intel platforms that have the x86 BMI extensions released in 2013 CPUs.
44

There are many ways to reverse bits depending on what you mean the "simplest way".


Reverse by Rotation

Probably the most logical, consists in rotating the byte while applying a mask on the first bit (n & 1):

unsigned char reverse_bits(unsigned char b) { unsigned char r = 0; unsigned byte_len = 8; while (byte_len--) { r = (r << 1) | (b & 1); b >>= 1; } return r; } 
  1. As the length of an unsigner char is 1 byte, which is equal to 8 bits, it means we will scan each bit while (byte_len--)

  2. We first check if b as a bit on the extreme right with (b & 1); if so we set bit 1 on r with | and move it just 1 bit to the left by multiplying r by 2 with (r << 1)

  3. Then we divide our unsigned char b by 2 with b >>=1 to erase the bit located at the extreme right of variable b. As a reminder, b >>= 1; is equivalent to b /= 2;


Reverse in One Line

This solution is attributed to Rich Schroeppel in the Programming Hacks section

unsigned char reverse_bits3(unsigned char b) { return (b * 0x0202020202ULL & 0x010884422010ULL) % 0x3ff; } 
  1. The multiply operation (b * 0x0202020202ULL) creates five separate copies of the 8-bit byte pattern to fan-out into a 64-bit value.

  2. The AND operation (& 0x010884422010ULL) selects the bits that are in the correct (reversed) positions, relative to each 10-bit groups of bits.

  3. Together the multiply and the AND operations copy the bits from the original byte so they each appear in only one of the 10-bit sets. The reversed positions of the bits from the original byte coincide with their relative positions within any 10-bit set.

  4. The last step (% 0x3ff), which involves modulus division by 2^10 - 1 has the effect of merging together each set of 10 bits (from positions 0-9, 10-19, 20-29, ...) in the 64-bit value. They do not overlap, so the addition steps underlying the modulus division behave like OR operations.


Divide and Conquer Solution

unsigned char reverse(unsigned char b) { b = (b & 0xF0) >> 4 | (b & 0x0F) << 4; b = (b & 0xCC) >> 2 | (b & 0x33) << 2; b = (b & 0xAA) >> 1 | (b & 0x55) << 1; return b; } 

This is the most upvoted answer and despite some explanations, I think that for most people it feels difficult to visualize whats 0xF0, 0xCC, 0xAA, 0x0F, 0x33 and 0x55 truly means.

It does not take advantage of '0b' which is a GCC extension and is included since the C++14 standard, release in December 2014, so a while after this answer dating from April 2010

Integer constants can be written as binary constants, consisting of a sequence of ‘0’ and ‘1’ digits, prefixed by ‘0b’ or ‘0B’. This is particularly useful in environments that operate a lot on the bit level (like microcontrollers).

Please check below code snippets to remember and understand even better this solution where we move half by half:

unsigned char reverse(unsigned char b) { b = (b & 0b11110000) >> 4 | (b & 0b00001111) << 4; b = (b & 0b11001100) >> 2 | (b & 0b00110011) << 2; b = (b & 0b10101010) >> 1 | (b & 0b01010101) << 1; return b; } 

NB: The >> 4 is because there are 8 bits in 1 byte, which is an unsigned char so we want to take the other half, and so on.

We could easily apply this solution to 4 bytes with only two additional lines and following the same logic. Since both mask complement each other we can even use ~ in order to switch bits and saving some ink:

uint32_t reverse_integer_bits(uint32_t b) { uint32_t mask = 0b11111111111111110000000000000000; b = (b & mask) >> 16 | (b & ~mask) << 16; mask = 0b11111111000000001111111100000000; b = (b & mask) >> 8 | (b & ~mask) << 8; mask = 0b11110000111100001111000011110000; b = (b & mask) >> 4 | (b & ~mask) << 4; mask = 0b11001100110011001100110011001100; b = (b & mask) >> 2 | (b & ~mask) << 2; mask = 0b10101010101010101010101010101010; b = (b & mask) >> 1 | (b & ~mask) << 1; return b; } 

[C++ Only] Reverse Any Unsigned (Template)

The above logic can be summarized with a loop that would work on any type of unsigned:

template <class T> T reverse_bits(T n) { short bits = sizeof(n) * 8; T mask = ~T(0); // equivalent to uint32_t mask = 0b11111111111111111111111111111111; while (bits >>= 1) { mask ^= mask << (bits); // will convert mask to 0b00000000000000001111111111111111; n = (n & ~mask) >> bits | (n & mask) << bits; // divide and conquer } return n; } 

C++ 17 only

You may use a table that store the reverse value of each byte with (i * 0x0202020202ULL & 0x010884422010ULL) % 0x3ff, initialized through a lambda (you will need to compile it with g++ -std=c++1z since it only works since C++17), and then return the value in the table will give you the accordingly reversed bit:

#include <cstdint> #include <array> uint8_t reverse_bits(uint8_t n) { static constexpr array<uint8_t, 256> table{[]() constexpr{ constexpr size_t SIZE = 256; array<uint8_t, SIZE> result{}; for (size_t i = 0; i < SIZE; ++i) result[i] = (i * 0x0202020202ULL & 0x010884422010ULL) % 0x3ff; return result; }()}; return table[n]; } 

main.cpp

Try it yourself with inclusion of above function:

#include <stdint.h> #include <stdio.h> #include <stdlib.h> template <class T> void print_binary(T n) { T mask = 1ULL << ((sizeof(n) * 8) - 1); // will set the most significant bit for(; mask != 0; mask >>= 1) putchar('0' | !!(n & mask)); putchar('\n'); } int main() { uint32_t n = 12; print_binary(n); n = reverse_bits(n); print_binary(n); unsigned char c = 'a'; print_binary(c); c = reverse_bits(c); print_binary(c); uint16_t s = 12; print_binary(s); s = reverse_bits(s); print_binary(s); uint64_t l = 12; print_binary(l); l = reverse_bits(l); print_binary(l); return 0; } 

Reverse with asm volatile

Last but not least, if simplest means fewer lines, why not give a try to inline assembly?

You can test below code snippet by adding -masm=intel when compiling:

unsigned char reverse_bits(unsigned char c) { __asm__ __volatile__ (R"( mov cx, 8 daloop: ror di adc ax, ax dec cx jnz short daloop ;)"); } 

Explanations line by line:

 mov cx, 8 ; we will reverse the 8 bits contained in one byte daloop: ; while loop shr di ; Shift Register `di` (containing value of the first argument of callee function) to the Right rcl ax ; Rotate Carry Left: rotate ax left and add the carry from shr di, the carry is equal to 1 if one bit was "lost" from previous operation dec cl ; Decrement cx jnz short daloop; Jump if cx register is Not equal to Zero, else end loop and return value contained in ax register 

5 Comments

Section [C++ Only] Reverse Any Unsigned (Template): Note that if the type template parameter T is signed, mask will represent a negative number and applying << on a negative signed integer yields undefined-behavior. Even if T is unsigned the result of the first shift operation on mask is implementation-defined
Your x86-64 inline asm is unsafe; missing clobbers and input/output operands to tell the compiler which registers are inputs and outputs. Written that way, you need __attribute__((naked)), and a ret instruction at the bottom. And BTW, this is x86-64 as for the AMD64 System V calling convention. You just said "assembly". In ARM or AArch64 assembly you'd use rbit. Clang 13 and later does recognize some algorithms (loop or SWAR divide-and-conquer) as rbit idioms: godbolt.org/z/14MM3cExW at least with -O3 for 32-bit.
It's strange to use 16-bit operand size for inline asm that looks to be assuming the x86-64 System V calling convention. shr edx, 1 is a slightly more efficient way to shift a bit into CF; ror has weird partial-FLAGS semantics, and adc eax, eax would be better than 16-bit. And dec ecx/jnz is more compact than dec cx/jnz. For performance, you might want to xor eax,eax before the loop to avoid a false dependency, or to be able to return a zero-extended 32-bit value. (An actual unsigned char ret val can have garbage in the high 24 / 56 bits of a reg, that's not a bug.)
It's pure arithmetic, there's no need for the asm to be volatile.
That "Reverse in One Line" solution is just beautiful. Very creative use of the instructions
30
template <typename T> T reverse(T n, size_t b = sizeof(T) * CHAR_BIT) { assert(b <= std::numeric_limits<T>::digits); T rv = 0; for (size_t i = 0; i < b; ++i, n >>= 1) { rv = (rv << 1) | (n & 0x01); } return rv; } 

EDIT:

Converted it to a template with the optional bitcount

5 Comments

For extra pedenatry, replace sizeof(T)*8 with sizeof(T)*CHAR_BITS.
@Pillsy - Sure, why not... it's not like anybody could actually be too pedantic, now could they?
@andand For extra extra pendantry, replace sizeof(T)*CHAR_BIT by std::numeric_limits<T>::digits (almost 4 years of pedantry later).
It should be CHAR_BIT, not CHAR_BITS.
it should be rv = (rv << 1) | (n & 0x01);
18

Two lines:

for(i=0;i<8;i++) reversed |= ((original>>i) & 0b1)<<(7-i); 

or in case you have issues with the "0b1" part:

for(i=0;i<8;i++) reversed |= ((original>>i) & 1)<<(7-i); 

"original" is the byte you want to reverse. "reversed" is the result, initialized to 0.

Comments

16

Although probably not portable, I would use assembly language.
Many assembly languages have instructions to rotate a bit into the carry flag and to rotate the carry flag into the word (or byte).

The algorithm is:

for each bit in the data type: rotate bit into carry flag rotate carry flag into destination. end-for 

The high level language code for this is much more complicated, because C and C++ do not support rotating to carry and rotating from carry. The carry flag has to modeled.

Edit: Assembly language for example

; Enter with value to reverse in R0. ; Assume 8 bits per byte and byte is the native processor type. LODI, R2 8 ; Set up the bit counter Loop: RRC, R0 ; Rotate R0 right into the carry bit. RLC, R1 ; Rotate R1 left, then append carry bit. DJNZ, R2 Loop ; Decrement R2 and jump if non-zero to "loop" LODR, R0 R1 ; Move result into R0. 

3 Comments

I think this answer is the opposite of simple. Non-portable, assembly, and complex enough to be written in pseudo-code instead of the actual assembly.
It is quite simple. I put it into pseudo-code because assembly mnemonics are specific to a breed of processor and there are a lot of breeds out there. If you would like, I can edit this to show the simple assembly language.
One could see if a compiler optimization simplifies into a suitable assembly instruction.
15

I find the following solution simpler than the other bit fiddling algorithms I've seen in here.

unsigned char reverse_byte(char a) { return ((a & 0x1) << 7) | ((a & 0x2) << 5) | ((a & 0x4) << 3) | ((a & 0x8) << 1) | ((a & 0x10) >> 1) | ((a & 0x20) >> 3) | ((a & 0x40) >> 5) | ((a & 0x80) >> 7); } 

It gets every bit in the byte, and shifts it accordingly, starting from the first to the last.

Explanation:

 ((a & 0x1) << 7) //get first bit on the right and shift it into the first left position | ((a & 0x2) << 5) //add it to the second bit and shift it into the second left position //and so on 

1 Comment

This is certainly simple, but it should be pointed out that the execution time is O(n) rather than O(log₂ n), where n is the number of bits (8, 16, 32, 64, etc.).
10

The simplest way is probably to iterate over the bit positions in a loop:

unsigned char reverse(unsigned char c) { int shift; unsigned char result = 0; for (shift = 0; shift < CHAR_BIT; shift++) { if (c & (0x01 << shift)) result |= (0x80 >> shift); } return result; } 

2 Comments

it's CHAR_BIT, without an 's'
Why use CHAR_BIT when you assume char to have 8 bits?
7

For the very limited case of constant, 8-bit input, this method costs no memory or CPU at run-time:

#define MSB2LSB(b) (((b)&1?128:0)|((b)&2?64:0)|((b)&4?32:0)|((b)&8?16:0)|((b)&16?8:0)|((b)&32?4:0)|((b)&64?2:0)|((b)&128?1:0)) 

I used this for ARINC-429 where the bit order (endianness) of the label is opposite the rest of the word. The label is often a constant, and conventionally in octal.

Here's how I used it to define a constant, because the spec defines this label as big-endian 205 octal.

#define LABEL_HF_COMM MSB2LSB(0205) 

More examples:

assert(0b00000000 == MSB2LSB(0b00000000)); assert(0b10000000 == MSB2LSB(0b00000001)); assert(0b11000000 == MSB2LSB(0b00000011)); assert(0b11100000 == MSB2LSB(0b00000111)); assert(0b11110000 == MSB2LSB(0b00001111)); assert(0b11111000 == MSB2LSB(0b00011111)); assert(0b11111100 == MSB2LSB(0b00111111)); assert(0b11111110 == MSB2LSB(0b01111111)); assert(0b11111111 == MSB2LSB(0b11111111)); assert(0b10101010 == MSB2LSB(0b01010101)); 

Comments

6

You may be interested in std::vector<bool> (that is bit-packed) and std::bitset

It should be the simplest as requested.

#include <iostream> #include <bitset> using namespace std; int main() { bitset<8> bs = 5; bitset<8> rev; for(int ii=0; ii!= bs.size(); ++ii) rev[bs.size()-ii-1] = bs[ii]; cerr << bs << " " << rev << endl; } 

Other options may be faster.

EDIT: I owe you a solution using std::vector<bool>

#include <algorithm> #include <iterator> #include <iostream> #include <vector> using namespace std; int main() { vector<bool> b{0,0,0,0,0,1,0,1}; reverse(b.begin(), b.end()); copy(b.begin(), b.end(), ostream_iterator<int>(cerr)); cerr << endl; } 

The second example requires c++0x extension (to initialize the array with {...}). The advantage of using a bitset or a std::vector<bool> (or a boost::dynamic_bitset) is that you are not limited to bytes or words but can reverse an arbitrary number of bits.

HTH

5 Comments

How is bitset any simpler than a pod here? Show the code, or it isn't.
Actu ally, I think that code will reverse the bitset, and then reverse it back to its original. Change ii != size(); to ii < size()/2; and it'll do a better job =)
(@viktor-sehr no, it will not, rev is different from bs). Anyway I don't like the answer myself: I think this is a case where binary arithmetic and shift operators are better suited. It still remains the simplest to understand.
How about std::vector<bool> b = { ... }; std::vector<bool> rb ( b.rbegin(), b.rend()); - using reverse iterators directly?
@MSalters I like the immutability of it.
3

Table lookup or

uint8_t rev_byte(uint8_t x) { uint8_t y; uint8_t m = 1; while (m) { y >>= 1; if (m&x) { y |= 0x80; } m <<=1; } return y; } 

edit

Look here for other solutions that might work better for you

Comments

3

a slower but simpler implementation:

static int swap_bit(unsigned char unit) { /* * swap bit[7] and bit[0] */ unit = (((((unit & 0x80) >> 7) ^ (unit & 0x01)) << 7) | (unit & 0x7f)); unit = (((((unit & 0x80) >> 7) ^ (unit & 0x01))) | (unit & 0xfe)); unit = (((((unit & 0x80) >> 7) ^ (unit & 0x01)) << 7) | (unit & 0x7f)); /* * swap bit[6] and bit[1] */ unit = (((((unit & 0x40) >> 5) ^ (unit & 0x02)) << 5) | (unit & 0xbf)); unit = (((((unit & 0x40) >> 5) ^ (unit & 0x02))) | (unit & 0xfd)); unit = (((((unit & 0x40) >> 5) ^ (unit & 0x02)) << 5) | (unit & 0xbf)); /* * swap bit[5] and bit[2] */ unit = (((((unit & 0x20) >> 3) ^ (unit & 0x04)) << 3) | (unit & 0xdf)); unit = (((((unit & 0x20) >> 3) ^ (unit & 0x04))) | (unit & 0xfb)); unit = (((((unit & 0x20) >> 3) ^ (unit & 0x04)) << 3) | (unit & 0xdf)); /* * swap bit[4] and bit[3] */ unit = (((((unit & 0x10) >> 1) ^ (unit & 0x08)) << 1) | (unit & 0xef)); unit = (((((unit & 0x10) >> 1) ^ (unit & 0x08))) | (unit & 0xf7)); unit = (((((unit & 0x10) >> 1) ^ (unit & 0x08)) << 1) | (unit & 0xef)); return unit; } 

Comments

3

Can this be fast solution?

int byte_to_be_reversed = ((byte_to_be_reversed>>7)&0x01)|((byte_to_be_reversed>>5)&0x02)| ((byte_to_be_reversed>>3)&0x04)|((byte_to_be_reversed>>1)&0x08)| ((byte_to_be_reversed<<7)&0x80)|((byte_to_be_reversed<<5)&0x40)| ((byte_to_be_reversed<<3)&0x20)|((byte_to_be_reversed<<1)&0x10); 

Gets rid of the hustle of using a for loop! but experts please tell me if this is efficient and faster?

1 Comment

This has execution time is O(n) rather than O(log₂ n), where n is the number of bits (8, 16, 32, 64, etc.). See elsewhere for answers that execute in O(log₂ n) time.
3

Here is a simple and readable solution, portable to all conformant platforms, including those with sizeof(char) == sizeof(int):

#include <limits.h> unsigned char reverse(unsigned char c) { int shift; unsigned char result = 0; for (shift = 0; shift < CHAR_BIT; shift++) { result <<= 1; result |= c & 1; c >>= 1; } return result; } 

Comments

2

Before implementing any algorithmic solution, check the assembly language for whatever CPU architecture you are using. Your architecture may include instructions which handle bitwise manipulations like this (and what could be simpler than a single assembly instruction?).

If such an instruction is not available, then I would suggest going with the lookup table route. You can write a script/program to generate the table for you, and the lookup operations would be faster than any of the bit-reversing algorithms here (at the cost of having to store the lookup table somewhere).

1 Comment

Notably, modern ARM and AArch64 have rbit. (And yes, if supported, it's efficient, not microcoded). x86 doesn't, although you can use SSSE3 pshufb as a lookup table for nibbles, bit-reversing up to 16 bytes in parallel. I forget if RISC-V extension-b (bit-manipulation) has bit-reverse.
2

This one is based on the one BobStein-VisiBone provided

#define reverse_1byte(b) ( ((uint8_t)b & 0b00000001) ? 0b10000000 : 0 ) | \ ( ((uint8_t)b & 0b00000010) ? 0b01000000 : 0 ) | \ ( ((uint8_t)b & 0b00000100) ? 0b00100000 : 0 ) | \ ( ((uint8_t)b & 0b00001000) ? 0b00010000 : 0 ) | \ ( ((uint8_t)b & 0b00010000) ? 0b00001000 : 0 ) | \ ( ((uint8_t)b & 0b00100000) ? 0b00000100 : 0 ) | \ ( ((uint8_t)b & 0b01000000) ? 0b00000010 : 0 ) | \ ( ((uint8_t)b & 0b10000000) ? 0b00000001 : 0 ) 

I really like this one a lot because the compiler automatically handle the work for you, thus require no further resources.

this can also be extended to 16-Bits...

#define reverse_2byte(b) ( ((uint16_t)b & 0b0000000000000001) ? 0b1000000000000000 : 0 ) | \ ( ((uint16_t)b & 0b0000000000000010) ? 0b0100000000000000 : 0 ) | \ ( ((uint16_t)b & 0b0000000000000100) ? 0b0010000000000000 : 0 ) | \ ( ((uint16_t)b & 0b0000000000001000) ? 0b0001000000000000 : 0 ) | \ ( ((uint16_t)b & 0b0000000000010000) ? 0b0000100000000000 : 0 ) | \ ( ((uint16_t)b & 0b0000000000100000) ? 0b0000010000000000 : 0 ) | \ ( ((uint16_t)b & 0b0000000001000000) ? 0b0000001000000000 : 0 ) | \ ( ((uint16_t)b & 0b0000000010000000) ? 0b0000000100000000 : 0 ) | \ ( ((uint16_t)b & 0b0000000100000000) ? 0b0000000010000000 : 0 ) | \ ( ((uint16_t)b & 0b0000001000000000) ? 0b0000000001000000 : 0 ) | \ ( ((uint16_t)b & 0b0000010000000000) ? 0b0000000000100000 : 0 ) | \ ( ((uint16_t)b & 0b0000100000000000) ? 0b0000000000010000 : 0 ) | \ ( ((uint16_t)b & 0b0001000000000000) ? 0b0000000000001000 : 0 ) | \ ( ((uint16_t)b & 0b0010000000000000) ? 0b0000000000000100 : 0 ) | \ ( ((uint16_t)b & 0b0100000000000000) ? 0b0000000000000010 : 0 ) | \ ( ((uint16_t)b & 0b1000000000000000) ? 0b0000000000000001 : 0 ) 

1 Comment

I would put the b in parentheses in case it is a more complex expression than a single number, and perhaps also rename the macro to REVERSE_BYTE as a hint that you probably don't want to have a more complex (runtime) expression there. Or make it an inline function. (But overall I like this as being simple enough that you could do it from memory easily with very little chance of error.)
2

This simple function uses a mask to test each bit in the input byte and transfer it into a shifting output:

char Reverse_Bits(char input) { char output = 0; for (unsigned char mask = 1; mask > 0; mask <<= 1) { output <<= 1; if (input & mask) output |= 1; } return output; } 

1 Comment

Mask should be unsigned sorry.
2

Assuming that your compiler allows unsigned long long:

unsigned char reverse(unsigned char b) { return (b * 0x0202020202ULL & 0x010884422010ULL) % 1023; } 

Discovered here

Comments

2

This is a similar method to sth's excellent answer, but with optimizations, support for up to 64-bit integers, and other small improvements.

I utilize a C++ template function reverse_bits() to let the compiler optimize for various word sizes of integers which might be passed to the function. The function should work correctly with any word size that is a multiple of 8 bits, up to a maximum of 64 bits. If your compiler supports words longer than 64 bits, the method is straightforward to extend.

This a complete, ready-to-compile example with the requisite headers. There is a convenient template function to_binary_str() for creating a std::string representation of binary numbers, along with a few calls with various word sizes to demonstrate everything.

If you remove the comments and blank lines, the function is quite compact and visually pleasing.

You can try out it on labstack here.

// this is the only header used by the reverse_bits() function #include <type_traits> // these headers are only used by demonstration code #include <string> #include <iostream> #include <cstdint> template<typename T> T reverse_bits( T n ) { // we force the passed-in type to its unsigned equivalent, because C++ may // perform arithmetic right shift instead of logical right shift, depending // on the compiler implementation. typedef typename std::make_unsigned<T>::type unsigned_T; unsigned_T v = (unsigned_T)n; // swap every bit with its neighbor v = ((v & 0xAAAAAAAAAAAAAAAA) >> 1) | ((v & 0x5555555555555555) << 1); // swap every pair of bits v = ((v & 0xCCCCCCCCCCCCCCCC) >> 2) | ((v & 0x3333333333333333) << 2); // swap every nybble v = ((v & 0xF0F0F0F0F0F0F0F0) >> 4) | ((v & 0x0F0F0F0F0F0F0F0F) << 4); // bail out if we've covered the word size already if( sizeof(T) == 1 ) return v; // swap every byte v = ((v & 0xFF00FF00FF00FF00) >> 8) | ((v & 0x00FF00FF00FF00FF) << 8); if( sizeof(T) == 2 ) return v; // etc... v = ((v & 0xFFFF0000FFFF0000) >> 16) | ((v & 0x0000FFFF0000FFFF) << 16); if( sizeof(T) <= 4 ) return v; v = ((v & 0xFFFFFFFF00000000) >> 32) | ((v & 0x00000000FFFFFFFF) << 32); // explictly cast back to the original type just to be pedantic return (T)v; } template<typename T> std::string to_binary_str( T n ) { const unsigned int bit_count = sizeof(T)*8; char s[bit_count+1]; typedef typename std::make_unsigned<T>::type unsigned_T; unsigned_T v = (unsigned_T)n; for( int i = bit_count - 1; i >= 0; --i ) { if( v & 1 ) s[i] = '1'; else s[i] = '0'; v >>= 1; } s[bit_count] = 0; // string null terminator return s; } int main() { { char x = 0xBA; std::cout << to_binary_str( x ) << std::endl; char y = reverse_bits( x ); std::cout << to_binary_str( y ) << std::endl; } { short x = 0xAB94; std::cout << to_binary_str( x ) << std::endl; short y = reverse_bits( x ); std::cout << to_binary_str( y ) << std::endl; } { uint64_t x = 0xFEDCBA9876543210; std::cout << to_binary_str( x ) << std::endl; uint64_t y = reverse_bits( x ); std::cout << to_binary_str( y ) << std::endl; } return 0; } 

Comments

2

If there is one wondering how to get uint16_t or uint32_t bit reversed by using the solution from Rich Schroeppel in the Programming Hacks section

unsigned char reverse_bits3(unsigned char b) { return (b * 0x0202020202ULL & 0x010884422010ULL) % 0x3ff; } 

It can be done by following divide-and-conquer code on X86/X64 platform (little-endian):

uint16_t reverse_bits3(uint16_t s) { uint8_t &b0 = ((uint8_t*)&s)[0]; uint8_t &b1 = ((uint8_t*)&s)[1]; return (((uint16_t)reverse_bits3(b0))<<8) + reverse_bits3(b1); } uint32_t reverse_bits3(uint32_t u) { uint16_t &s0 = ((uint16_t*)&u)[0]; uint16_t &s1 = ((uint16_t*)&u)[1]; return (((uint16_t)reverse_bits3(s0))<<16) + reverse_bits3(s1); } 

Comments

1
typedef struct { uint8_t b0:1; uint8_t b1:1; uint8_t b2:1; uint8_t b3:1; uint8_t b4:1; uint8_t b5:1; uint8_t b6:1; uint8_t b7:1; } bits_t; uint8_t reverse_bits(uint8_t src) { uint8_t dst = 0x0; bits_t *src_bits = (bits_t *)&src; bits_t *dst_bits = (bits_t *)&dst; dst_bits->b0 = src_bits->b7; dst_bits->b1 = src_bits->b6; dst_bits->b2 = src_bits->b5; dst_bits->b3 = src_bits->b4; dst_bits->b4 = src_bits->b3; dst_bits->b5 = src_bits->b2; dst_bits->b6 = src_bits->b1; dst_bits->b7 = src_bits->b0; return dst; } 

1 Comment

As a stylistic note, I find the use of uint8_t for the 1-bit fields a bit ugly, since it seems to first say that it will take 8 bits but then at the end of the line defines it as only a single bit. I would use unsigned b0:1 etc.
1

If you using small microcontroller and need high speed solution with small footprint, this could be solutions. It is possible to use it for C project, but you need to add this file as assembler file *.asm, to your C project. Instructions: In C project add this declaration:

extern uint8_t byte_mirror(uint8_t); 

Call this function from C

byteOutput= byte_mirror(byteInput); 

This is the code, it is only suitable for 8051 core. In the CPU register r0 is data from byteInput. Code rotate right r0 cross carry and then rotate carry left to r1. Repeat this procedure 8 times, for every bit. Then the register r1 is returned to c function as byteOutput. In 8051 core is only posibble to rotate acumulator a.

NAME BYTE_MIRROR RSEG RCODE PUBLIC byte_mirror //8051 core byte_mirror mov r3,#8; loop: mov a,r0; rrc a; mov r0,a; mov a,r1; rlc a; mov r1,a; djnz r3,loop mov r0,a ret 

PROS: It is small footprint, it is high speed CONS: It is not reusable code, it is only for 8051

011101101->carry

101101110<-carry

1 Comment

While this code may answer the question, it would be better to include some context, explaining how it works and when to use it. Code-only answers are not useful in the long run.
1

With the help of various online resources, i jotted these for myself (not sure if they're 100% accurate) :

# octal hex # bit-orig : 01234567 01234567:89ABCDEF # bit-invert : 76543210 FEDCBA98:76543210 # # clz : 32110000 43221111:00000000 # clo/ffs : 00001123 00000000:11112234 

bit-reverse : [ 0 4 2 6 1 5 3 7 ] [ 0 8 4 C 2 A 6 E 1 9 5 D 3 B 7 F ]

# cto : 01020103 01020103:01020104 # ctz : 30102010 40102010:30102010 

but this is mostly only convenient if your input is already either hex or octal.

In both formats (8 or 16), you'll notice that after the bit-reflections, all the even number indices are all on the first half. I've also highlighted the same 0-7 on the hex side to help with the visualization of it.

in fact, one doesn't even have to do a double substring. The lookup string can be either used as seeking the letter needed, or simply use it as an index lookup. this is how i reflect the CRC32 polynomial myself :

(z is the input polynomial (or just any hex string) xn = 0 ^ (x = length(z)); # initialize to numeric 0, # foo^bar in awk means # foo-to-bar-th-power. # same as foo**bar in other langs y = substr(_REF_bitREV_hex, 2); # by pre-trimming the lookup str, # it allows skipping the + 1 at # every cycle of the loop do { xn *= 16 xn += index(y, substr(z,x,1)) # keep in mind that this is awk syntax, # where strings start at index-1, not zero. } while ( 1 < x—- ); 

One advantage of using a hex- or octal- based approach is that it allows for inputs of any length, enabling arbitrary precision operation without having to use a proper BigInteger or BigFloat library. For that, you'll have to substring out the new digit/letter and do string concats instead of simply adding each time.

Comments

1

This is the easiest approach to remember to me as a developer:

unsigned char reverse_bits(unsigned char octet) { return (((octet >> 0) & 1) << 7) | \ (((octet >> 1) & 1) << 6) | \ (((octet >> 2) & 1) << 5) | \ (((octet >> 3) & 1) << 4) | \ (((octet >> 4) & 1) << 3) | \ (((octet >> 5) & 1) << 2) | \ (((octet >> 6) & 1) << 1) | \ (((octet >> 7) & 1) << 0); } 

Comments

0

I'll chip in my solution, since i can't find anything like this in the answers so far. It is a bit overengineered maybe, but it generates the lookup table using C++14 std::index_sequence in compile time.

#include <array> #include <utility> constexpr unsigned long reverse(uint8_t value) { uint8_t result = 0; for (std::size_t i = 0, j = 7; i < 8; ++i, --j) { result |= ((value & (1 << j)) >> j) << i; } return result; } template<size_t... I> constexpr auto make_lookup_table(std::index_sequence<I...>) { return std::array<uint8_t, sizeof...(I)>{reverse(I)...}; } template<typename Indices = std::make_index_sequence<256>> constexpr auto bit_reverse_lookup_table() { return make_lookup_table(Indices{}); } constexpr auto lookup = bit_reverse_lookup_table(); int main(int argc) { return lookup[argc]; } 

https://godbolt.org/z/cSuWhF

Comments

0

I know that this question is dated but I still think that the topic is relevant for some purposes, and here is a version that works very well and is readable. I can not say that it is the fastest or the most efficient, but it ought to be one of the cleanest. I have also included a helper function for easily displaying the bit patterns. This function uses some of the standard library functions instead of writing your own bit manipulator.

#include <algorithm> #include <bitset> #include <exception> #include <iostream> #include <limits> #include <string> // helper lambda function template template<typename T> auto getBits = [](T value) { return std::bitset<sizeof(T) * CHAR_BIT>{value}; }; // Function template to flip the bits // This will work on integral types such as int, unsigned int, // std::uint8_t, 16_t etc. I did not test this with floating // point types. I chose to use the `bitset` here to convert // from T to string as I find it easier to use than some of the // string to type or type to string conversion functions, // especially when the bitset has a function to return a string. template<typename T> T reverseBits(T& value) { static constexpr std::uint16_t bit_count = sizeof(T) * CHAR_BIT; // Do not use the helper function in this function! auto bits = std::bitset<bit_count>{value}; auto str = bits.to_string(); std::reverse(str.begin(), str.end()); bits = std::bitset<bit_count>(str); return static_cast<T>( bits.to_ullong() ); } // main program int main() { try { std::uint8_t value = 0xE0; // 1110 0000; std::cout << +value << '\n'; // don't forget to promote unsigned char // Here is where I use the helper function to display the bit pattern auto bits = getBits<std::uint8_t>(value); std::cout << bits.to_string() << '\n'; value = reverseBits(value); std::cout << +value << '\n'; // + for integer promotion // using helper function again... bits = getBits<std::uint8_t>(value); std::cout << bits.to_string() << '\n'; } catch(const std::exception& e) { std::cerr << e.what(); return EXIT_FAILURE; } return EXIT_SUCCESS; } 

And it gives the following output.

224 11100000 7 00000111 

Comments

0

This one helped me with 8x8 dot matrix set of arrays.

uint8_t mirror_bits(uint8_t var) { uint8_t temp = 0; if ((var & 0x01))temp |= 0x80; if ((var & 0x02))temp |= 0x40; if ((var & 0x04))temp |= 0x20; if ((var & 0x08))temp |= 0x10; if ((var & 0x10))temp |= 0x08; if ((var & 0x20))temp |= 0x04; if ((var & 0x40))temp |= 0x02; if ((var & 0x80))temp |= 0x01; return temp; } 

2 Comments

This function doesn't actually work, the reverse of 0b11001111 should be 0b11110011, but fails with this function. The same testing method works for many of the other functions listed here.
Yep, thanks I corrected my answer. Thanks for the letting me know about my mistake :)

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.