It’s coroutines with a thread pool. The thread pool is a fairly straightforward work-stealing queue. The coroutines reduce busy waiting (polling). Mailbox communication uses an SPSC lock-free bounded queue with a power-of-two capacity. There is a single consumer thread and a single IO thread. A buffer pool holds the free buffers and the full (ready-to-write) buffers, plus one active buffer.
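For anyone who wants to picture the mailbox, here is a minimal sketch of an SPSC bounded ring with a power-of-two capacity. It assumes C++, which may not be the language actually used. The names `SpscQueue`, `try_push` and `try_pop` and the 64-byte cache-line padding are made up for illustration, so this is not the actual implementation.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical single-producer/single-consumer ring buffer.
// T must be default-constructible because the slots are preallocated.
template <typename T>
class SpscQueue {
public:
    explicit SpscQueue(std::size_t capacity_pow2)
        : slots_(capacity_pow2), mask_(capacity_pow2 - 1) {
        // A power-of-two capacity lets us wrap indices with a bit mask.
        assert(capacity_pow2 != 0 && (capacity_pow2 & (capacity_pow2 - 1)) == 0);
    }

    // Producer thread only. Returns false when the queue is full.
    bool try_push(T value) {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        const std::size_t head = head_.load(std::memory_order_acquire);
        if (tail - head == slots_.size()) return false;  // full
        slots_[tail & mask_] = std::move(value);
        tail_.store(tail + 1, std::memory_order_release);  // publish the slot
        return true;
    }

    // Consumer thread only. Returns std::nullopt when the queue is empty.
    std::optional<T> try_pop() {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        const std::size_t tail = tail_.load(std::memory_order_acquire);
        if (head == tail) return std::nullopt;  // empty
        T value = std::move(slots_[head & mask_]);
        head_.store(head + 1, std::memory_order_release);  // free the slot
        return value;
    }

private:
    std::vector<T> slots_;
    const std::size_t mask_;
    // Separate cache lines so producer and consumer do not false-share.
    alignas(64) std::atomic<std::size_t> head_{0};
    alignas(64) std::atomic<std::size_t> tail_{0};
};
```

Each index has exactly one writer, so plain acquire/release loads and stores are enough and no compare-and-swap loop is needed.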
In previous versions I was using an MPMC lock-free bounded queue with a power-of-two capacity. The busy waiting and the cache ping-pong caused by the atomics in the queue implementation were challenging to control.
With coroutines and the thread pool (TP), the producers are mapped to fewer threads on the TP, and they yield instead of polling with exponential backoff etc. The CPU utilization now goes to real work.
The throughput should now be limited by IO; earlier it was limited at the CPU level.
Next, I want to add fsync/fdatasync to simulate real WAL loads. The throughput will come down significantly, and that is expected: you can’t write faster than the disk specification. What was bothering me was that I couldn’t reach the disk write speed even before writing to the disk.
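For reference, the durable-write step could look roughly like the sketch below. It assumes C++ on Linux with the POSIX calls `open`, `write`, `fdatasync` and `close`. The helper names `open_log` and `append_durable` are hypothetical, and a real WAL would likely batch several records per sync rather than syncing each one.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <fcntl.h>
#include <unistd.h>

// Opens (or creates) an append-only log file. Returns -1 on failure.
int open_log(const char* path) {
    return ::open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
}

// Writes `len` bytes and forces them to stable storage.
// Returns true only if every byte was written and fdatasync succeeded.
bool append_durable(int fd, const void* data, std::size_t len) {
    assert(fd >= 0);
    const char* p = static_cast<const char*>(data);
    std::size_t left = len;
    while (left > 0) {
        const ssize_t n = ::write(fd, p, left);
        if (n < 0) {
            if (errno == EINTR) continue;  // interrupted, retry
            return false;
        }
        p += n;                            // handle short writes
        left -= static_cast<std::size_t>(n);
    }
    // fdatasync flushes the data and only the metadata needed to read it
    // back; fsync would also flush the remaining metadata (e.g. mtime).
    return ::fdatasync(fd) == 0;
}
```

With the sync in place, each call returns only after the device acknowledges the data, which is why throughput is expected to drop to whatever the disk can sustain.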