- 06 Nov, 2025 *
In this blogpost we make a connection between the operation of bit counting and a nice properties of unsigned binary numbers.
An identity
Consider x to be an unsigned strictly positive number. Let j be the first entry of our binary number that is unequal to 0, i.e. we can write it in binary notation as x=xn−1...xj+110...0.
Consider
x−1=∑j≤i≤n−1xi2i−1=2j+∑j<i≤n−1xi2i−1
We have the geometric series:
∑i∈{0,...,j−1}2i=1−2j1−2=2j−1
Rearranging the terms yields:
2j−1=∑i∈{0,...,j−1}2i
Plugging that into above equation we obtain:
x−1=∑i∈{0,...,n−1}yi2i
where yi=1 for i<j , yj=0 and yi=xi else, i.e. we can write as binary number x−1=xn−1...xj+101...1.
If we perform an bitwise AND of this number with x we will obtain
x∧x−1=xn−…
- 06 Nov, 2025 *
In this blogpost we make a connection between the operation of bit counting and a nice properties of unsigned binary numbers.
An identity
Consider x to be an unsigned strictly positive number. Let j be the first entry of our binary number that is unequal to 0, i.e. we can write it in binary notation as x=xn−1...xj+110...0.
Consider
x−1=∑j≤i≤n−1xi2i−1=2j+∑j<i≤n−1xi2i−1
We have the geometric series:
∑i∈{0,...,j−1}2i=1−2j1−2=2j−1
Rearranging the terms yields:
2j−1=∑i∈{0,...,j−1}2i
Plugging that into above equation we obtain:
x−1=∑i∈{0,...,n−1}yi2i
where yi=1 for i<j , yj=0 and yi=xi else, i.e. we can write as binary number x−1=xn−1...xj+101...1.
If we perform an bitwise AND of this number with x we will obtain
x∧x−1=xn−1...xj+10...0. which is simply x with the rightmost one removed. As we will see below this is a useful identity when one wants to count the number of one bits in a number.
Naive Code
A simple way to count the number of bits is simply to always extract the rightmost bit by performing bitwise and with 1. We’ll than right-shift the bits and continue until only 0 bits are left.
#include <assert.h>
#include <stdint.h>
#define uint uint32_t
int bitcount(uint x);
int main(void)
{
uint x = 10; // 2^3 + 2^1 -> 0b0...01010
assert(bitcount(x) == 2);
}
int bitcount(uint x)
{
int b;
for(b = 0; x != 0; x >>= 1) {
if(x & 01)
b++;
}
return b;
}
The corresponding assembly can be obtained by inputting the code into godbolt.
main:
xor eax,eax
ret
data16 data16 data16 cs nop WORD PTR [rax+rax*1+0x0]
bitcount:
xor eax,eax
test edi,edi
je 1160 <bitcount+0x20>
mov ecx,edi
nop DWORD PTR [rax+rax*1+0x0]
mov edx,edi
and edx,0x1
add eax,edx
shr ecx,1
cmp edi,0x1
mov edi,ecx
ja 1150 <bitcount+0x10>
ret
Note that the above algorithm has obvious weakness because it runs as many times as the position we have the leftmost one bit at. That means a number like 0b10...0 will make 32 comparisons and 31 shifts.
Using the identity
Using above identity we can do better: In each round we simply delete the rightmost one. We are done once we have only zero bits left.
#include <assert.h>
#include <stdio.h>
#include <stdint.h>
#define uint uint32_t
int bitcount(uint x);
int main(void)
{
uint x = 10; // 2^3 + 2^1 -> 0b0...01010
assert(bitcount(x) == 2);
}
int bitcount(uint x)
{
int b;
b = 0;
for(b = 0; x != 0; x &= (x-1))
b++;
return b;
}
We can see that we need less instructions.
main:
xor eax,eax
ret
data16 data16 data16 cs nop WORD PTR [rax+rax*1+0x0]
bitcount:
xor eax,eax
test edi,edi
je 115b <bitcount+0x1b>
cs nop WORD PTR [rax+rax*1+0x0]
inc eax
lea ecx,[rdi-0x1]
and ecx,edi
mov edi,ecx
jne 1150 <bitcount+0x10>
ret
Note that not only have we better looking assembly but also we perform less loops. For example in the same case like above we will only perform one loop. That means on sparse bit sequences the second algorithm will perform much better.
Popcount
Note that depending on our processor we may leverage popcnt instruction like so:
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#define uint uint32_t
int bitcount(uint x);
int main(){
uint x = 10;
assert(bitcount(x) == 2);
return 0;
}
int bitcount(uint x)
{
return __builtin_popcount(x);
}
The assembly looks as follows:
main:
xor eax,eax
ret
data16 data16 data16 cs nop WORD PTR [rax+rax*1+0x0]
bitcount:
popcnt eax,edi
ret
where we need to provide -mpopcnt to compiler to let the compiler know that popcnt instruction is available. This way we don’t need any bit hacks and have a single instruction to perform our operation. See here for corresponding godbolt link.
Conclusion
I hope this blogpost served as a good introduction to bit counting and how we can use elementary results from Math to improve the naive algorithm. If you like to exchange ideas on GPU programming or low level programming in general you can contact me on my Linkedin. The above observation is inspired by an exercise of the K&R book on C programming. I intend to solve a large portion of the K&R exercises in the next weeks and you can check out the solutions on my Github.