Fergus's blog fergusfinn.com
Large-Scale Semantic Search Without Embeddings
fergusfinn.com·1w
Parallel Primitives for Multi-Agent Workflows
fergusfinn.com·1w
How fast can an LLM go?
fergusfinn.com·11w
Control Layer Benchmarking
fergusfinn.com·12w
The Doubleword Control Layer
fergusfinn.com·12w
LLM guided scheduling
fergusfinn.com·16w
Scheduling in inference engines
fergusfinn.com·16w
Using caching for fast speculative decoding
fergusfinn.com·16w
Paged attention
fergusfinn.com·16w