Crushing ML Latency: The (Un)Official Best Practices for Systems Optimisation
pub.towardsai.net·5h
🚀Performance
H-FA: A Hybrid Floating-Point and Logarithmic Approach to Hardware Accelerated FlashAttention
arxiv.org·1d
⚡Hardware Acceleration
Inside Pinecone: Slab Architecture
📋Columnar Storage
Moving past speculation: How deterministic CPUs deliver predictable AI performance
venturebeat.com·3d
🏗Computer Architecture
TIL: For long-lived LLM sessions, swapping KV Cache to RAM is ~10x faster than recalculating it. Why isn't this a standard feature?
🔁Cache Coherence
No Free Lunch: Deconstruct Efficient Attention with MiniMax M2
lmsys.org·1d
📱Edge AI
Low-Level Hacks
🦀Rust
Radar Trends to Watch: November 2025
oreilly.com·22h
🎭Program Synthesis
Why stop at 1 million tokens when you can have 10? My journey to extreme context on a gaming GPU. [P]
📱Edge AI
How to build a Heapless Vector using `MaybeUninit<T>` for Better Performance.
⚠️Rust Unsafe
On Designing Low-Latency Systems for High-Traffic Environments
hackernoon.com·1d
⚖️Load Balancing
Geonum – geometric number library for unlimited dimensions with O(1) complexity
📏Linear Types