Cutting LLM Batch Inference Time in Half: Dynamic Prefix Bucketing at Scale
daft.ai · 2h · Discuss: Hacker News
🔀 Parallel Computing
Don't let these 3 CPU specs trick you into paying more
xda-developers.com · 1d
🚀 Performance
On Designing Low-Latency Systems for High-Traffic Environments
hackernoon.com · 1d
🚀 Performance
Why is AI Generated Rust slow when compared with Go/C#/Node/JavaScript
srid68.github.io · 5h · Discuss: Hacker News
🚀 Performance
Inside Pinecone: Slab Architecture
pinecone.io · 3h · Discuss: Hacker News
🚀 Performance
1 billion JSON records, 1-second query response: Apache Doris vs. ClickHouse, Elasticsearch, and PostgreSQL
dev.to · 41m · Discuss: DEV
🚀 Performance
Attention Is All You Need for KV Cache in Diffusion LLMs
paperium.net · 15h · Discuss: DEV
🧠 Memory Management
Moving past speculation: How deterministic CPUs deliver predictable AI performance
venturebeat.com · 2d
🔀 Parallel Computing
DCcluster-Opt: Benchmarking Dynamic Multi-Objective Optimization for Geo-Distributed Data Center Workloads
arxiv.org · 15h
🚀 Performance
Low-Level Hacks
blog.raycursive.com · 17h · Discuss: Hacker News
🔧 Systems Programming
My First Multi-GPU Kernel: Writing All-to-All for AMD MI300X
gau-nernst.github.io · 1d · Discuss: Hacker News
🔀 Parallel Computing
Running MiniMax-M2 locally - Existing Hardware Advice
reddit.com · 3h · Discuss: r/LocalLLaMA
🚀 Performance
Parallel achieves 70% accuracy on SEAL, benchmark for hard web research
parallel.ai · 55m · Discuss: Hacker News
🚀 Performance
Free Functions Don't Change Performance (Much)
16bpp.net · 1d · Discuss: Hacker News, r/cpp
🚀 Performance
Show HN: Polyglot standard library HTTP client C/C++/Rust/Python and benchmarks
github.com · 15h · Discuss: Hacker News
🌐 Network Protocols
Benchmarking the cost of Java's EnumSet - A Second Look
kinnen.de · 42m · Discuss: r/programming
🚀 Performance
Algorithmic Complexity Reduction via Quantized State Space Search
dev.to · 2h · Discuss: DEV
🚀 Performance
H-FA: A Hybrid Floating-Point and Logarithmic Approach to Hardware Accelerated FlashAttention
arxiv.org · 15h
🚀 Performance
Lazy loading isn't the magic pill to fix AI Inference
tensorfuse-docs.mintlify.dev · 5h · Discuss: Hacker News
🚀 Performance
How Datadog Built a Custom Database to Ingest Billions of Metrics Per Second
blog.bytebytego.com · 4h
🚀 Performance