Memory Architecture, Performance, CPU Topology, Cache Locality

Attention Is All You Need for KV Cache in Diffusion LLMs
paperium.net·1d·
Discuss: DEV
🔁Cache Coherence
Enabling Trillion-Parameter Models on AWS EFA
research.perplexity.ai·10h·
Discuss: Hacker News
Hardware Acceleration
Building a highly-available web service without a database
screenshotbot.io·1h·
Discuss: r/programming
🦀Rust
Crushing ML Latency: The (Un)Official Best Practices for Systems Optimisation
pub.towardsai.net·5h
🚀Performance
H-FA: A Hybrid Floating-Point and Logarithmic Approach to Hardware Accelerated FlashAttention
arxiv.org·1d
Hardware Acceleration
Inside Pinecone: Slab Architecture
pinecone.io·17h·
Discuss: Hacker News
📋Columnar Storage
Cutting LLM Batch Inference Time in Half: Dynamic Prefix Bucketing at Scale
daft.ai·17h·
Discuss: Hacker News
🎴TAO
Moving past speculation: How deterministic CPUs deliver predictable AI performance
venturebeat.com·3d
🏗Computer Architecture
No Free Lunch: Deconstruct Efficient Attention with MiniMax M2
lmsys.org·1d
📱Edge AI
Low-Level Hacks
blog.raycursive.com·1d·
Discuss: Hacker News
🦀Rust
Radar Trends to Watch: November 2025
oreilly.com·22h
🎭Program Synthesis
Why stop at 1 million tokens when you can have 10? My journey to extreme context on a gaming GPU. [P]
reddit.com·22h·
📱Edge AI
How to build a Heapless Vector using `MaybeUninit<T>` for Better Performance.
dev.to·21h·
Discuss: DEV
⚠️Rust Unsafe
On Designing Low-Latency Systems for High-Traffic Environments
hackernoon.com·1d
⚖️Load Balancing
Geonum – geometric number library for unlimited dimensions with O(1) complexity
github.com·1d·
Discuss: Hacker News
📏Linear Types
Detailed Technical Documentation on AI Implementation Logic (Taking Large Language Models as an Example)
nbtab.com·1d·
Discuss: DEV
📱Edge AI
Algorithmic Complexity Reduction via Quantized State Space Search
dev.to·16h·
Discuss: DEV
⚛️Quantum Computing
Why is AI Generated Rust slow when compared with Go/C#/Node/JavaScript
srid68.github.io·19h·
Discuss: Hacker News
🦀Rust
Disassembling Terabytes of Random Data with Zig and Capstone to Prove a Point
jstrieb.github.io·21m·
Discuss: Hacker News
🔓Binary Exploitation