A 100-line async request coalescer for batched embedding inference (opens in new tab)

Discussed on r/Python

Embedding inference is several times faster in batches, but callers want to send one item at a time. The bridge is a small async primitive that holds requests for a few milliseconds and dispatches them together.

Read the original article