A 100-line async request coalescer for batched embedding inference (opens in new tab)
Embedding inference is several times faster in batches, but callers want to send one item at a time. The bridge is a small async primitive that holds requests for a few milliseconds and dispatches them together.
Read the original article