The billion row challenge: do we have a bug?
A couple of people contacted me with feedback about the SIMD implementation we used to find newlines in the billion-row file. One pointed out a possible bug (shock!), and the other suggested an approach that might be more efficient. We'll look at both, starting, of course, by writing a test that checks for the bug. After the stream I made another attempt at using the information about all newlines in a 64-byte chunk, instead of just the first one. I did it with no Vecs at all, unifying the two function...
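To illustrate what "using the information about all newlines in a 64-byte chunk" can mean, here is a minimal sketch, not the author's code. It assumes a 64-bit mask with one bit per byte, the kind a SIMD byte compare followed by a movemask produces. The helpers `newline_mask` and `newline_offsets` are hypothetical, and `newline_mask` is a plain scalar loop standing in for the SIMD step. The iterator visits every set bit with `trailing_zeros` and then clears the lowest set bit, so every newline in the chunk is reported without allocating a Vec.

```rust
/// Hypothetical helper: bit i of the result is set if chunk[i] == b'\n'.
/// This scalar loop stands in for a SIMD compare + movemask (an assumption).
fn newline_mask(chunk: &[u8; 64]) -> u64 {
    let mut mask = 0u64;
    for (i, &b) in chunk.iter().enumerate() {
        if b == b'\n' {
            mask |= 1u64 << i;
        }
    }
    mask
}

/// Hypothetical helper: yields the offset of every newline in the chunk,
/// in ascending order, not just the first one. No heap allocation.
fn newline_offsets(chunk: &[u8; 64]) -> impl Iterator<Item = usize> {
    let mut mask = newline_mask(chunk);
    std::iter::from_fn(move || {
        if mask == 0 {
            None
        } else {
            // Position of the lowest set bit is the next newline offset.
            let pos = mask.trailing_zeros() as usize;
            // Clear the lowest set bit so the next call finds the following newline.
            mask &= mask - 1;
            Some(pos)
        }
    })
}

fn main() {
    let mut chunk = [b'x'; 64];
    chunk[3] = b'\n';
    chunk[17] = b'\n';
    chunk[63] = b'\n';
    for offset in newline_offsets(&chunk) {
        println!("newline at offset {}", offset);
    }
}
```

Processing the whole mask per chunk avoids re-scanning the same 64 bytes once for each line they contain. Only the first newline would be read if the loop stopped after a single `trailing_zeros`.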