Accelerating Copy_if Using SIMD
Introduction

I have a Zen 4 CPU with a bunch of AVX512 feature flags. So I thought - let’s try and use it to implement something, even if it’s in the realm of wheel-reinvention. I started with the following goals:

1. Implement an algorithm that cannot be vectorized by my optimizing compiler, even with a polyhedral loop model.
2. Systematically analyze its performance and answer the questions: Is it as fast as it can be? If not, why? And how can we fix it?
3. Start simple, make it work. Which means that...