Memory Architecture, Performance, CPU Topology, Cache Locality

Attention Is All You Need for KV Cache in Diffusion LLMs
paperium.net·1d·
Discuss: DEV
🔁Cache Coherence
A C example with objects and an arena for allocations, what do you think?
reddit.com·4h
🦀Rust
Enabling Trillion-Parameter Models on AWS EFA
research.perplexity.ai·16h·
Discuss: Hacker News
Hardware Acceleration
Building a highly-available web service without a database
screenshotbot.io·7h·
Discuss: r/programming
🦀Rust
Crushing ML Latency: The (Un)Official Best Practices for Systems Optimisation
pub.towardsai.net·10h
🚀Performance
H-FA: A Hybrid Floating-Point and Logarithmic Approach to Hardware Accelerated FlashAttention
arxiv.org·1d
Hardware Acceleration
The state of SIMD in Rust in 2025
shnatsel.medium.com·59m·
Discuss: r/rust
🔀SIMD Programming
Inside Pinecone: Slab Architecture
pinecone.io·23h·
Discuss: Hacker News
📋Columnar Storage
Cutting LLM Batch Inference Time in Half: Dynamic Prefix Bucketing at Scale
daft.ai·22h·
Discuss: Hacker News
🎴TAO
Moving past speculation: How deterministic CPUs deliver predictable AI performance
venturebeat.com·3d
🏗Computer Architecture
'No Free Lunch: Deconstruct Efficient Attention with MiniMax M2'
lmsys.org·1d
📱Edge AI
Low-Level Hacks
blog.raycursive.com·1d·
Discuss: Hacker News
🦀Rust
Radar Trends to Watch: November 2025
oreilly.com·1d
🎭Program Synthesis
Why stop at 1 million tokens when you can have 10? My journey to extreme context on a gaming GPU. [P]
reddit.com·1d
📱Edge AI
An introduction to program synthesis (Part II) - Automatically generating features for machine learning
mchav.github.io·5h·
Discuss: r/programming
🎭Program Synthesis
On Designing Low-Latency Systems for High-Traffic Environments
hackernoon.com·2d
⚖️Load Balancing
Geonum – geometric number library for unlimited dimensions with O(1) complexity
github.com·2d·
Discuss: Hacker News
📏Linear Types
How to build a Heapless Vector using `MaybeUninit<T>` for Better Performance.
dev.to·1d·
Discuss: DEV
⚠️Rust Unsafe
Detailed Technical Documentation on AI Implementation Logic (Taking Large Language Models as an Example)
nbtab.com·1d·
Discuss: DEV
📱Edge AI