Cutting LLM Batch Inference Time in Half: Dynamic Prefix Bucketing at Scale
Parallel Computing
Don't let these 3 CPU specs trick you into paying more
xda-developers.com·1d
Performance
On Designing Low-Latency Systems for High-Traffic Environments
hackernoon.com·1d
Performance
Inside Pinecone: Slab Architecture
Performance
1 billion JSON records, 1-second query response: Apache Doris vs. ClickHouse, Elasticsearch, and PostgreSQL
Performance
Moving past speculation: How deterministic CPUs deliver predictable AI performance
venturebeat.com·2d
Parallel Computing
DCcluster-Opt: Benchmarking Dynamic Multi-Objective Optimization for Geo-Distributed Data Center Workloads
arxiv.org·15h
Performance
Low-Level Hacks
Systems Programming
Parallel achieves 70% accuracy on SEAL, benchmark for hard web research
Performance
Show HN: Polyglot standard library HTTP client C/C++/Rust/Python and benchmarks
Network Protocols
H-FA: A Hybrid Floating-Point and Logarithmic Approach to Hardware Accelerated FlashAttention
arxiv.org·15h
Performance
How Datadog Built a Custom Database to Ingest Billions of Metrics Per Second
blog.bytebytego.com·4h
Performance