⚡ Performance - sudorandom · Scour

SuperInfer: SLO-Aware Rotary Scheduling and Memory Management for LLM Inference on Superchips 🖥️Systems Programming

supercomputing-system-ai-lab.github.io·2d·Hacker News

vivek1504/serverless-runtime: Firecracker-based serverless runtime with snapshot optimization, achieving ~5K req/sec and ~1ms latency. 🖥️Systems Programming

github.com·11h·DEV

Deep dive: How ClickHouse handles async inserts (and why it matters for high-throughput pipelines) 🗄️Databases

glassflow.dev·6d·r/programming

A systematic approach to benchmarking SQL processing engines on AWS 🗄️Databases

aws.amazon.com·1d

Blazing fast on-device GenAI with LiteRT-LM 🔭Observability

developers.googleblog.com·1d·Hacker News

Go’s Concurrency Model vs. Java Virtual Threads: A Practical Comparison 🖥️Systems Programming

javacodegeeks.com·6d

BSC’s quantum defense works. The trade-off is 40% slower transaction throughput. 🕸️Distributed Systems

·2d

Smoothing Spiky LLM Traffic: Maximize Provisioned Throughput Utilization With a Queuing… 📡gRPC

·1d

New Power, Memory, Interconnect, and Thermal Architectures for AI Infrastructure at Scale 🖥️Systems Programming

eetimes.com·3d

soundcore To Introduce New Flagship Earbuds With ANKER Thus AI Chip On 22 May 🔭Observability

From beta to stable: Announcing the Azure SDK for Rust 🎉🦀 ⚙️Backend Dev

devblogs.microsoft.com·6d

Eliminate LLM Cold starts: Load models up to 6x Faster with Azure Blob Storage and Run:AI Model Streamer 🔭Observability

devblogs.microsoft.com·2d

Spend Your Compute on Correctness 🖥️Systems Programming

juanreyero.com·5d·Hacker News

Java: Rethink Domain Primitives with Valhalla 🖥️Systems Programming

dfa1.github.io·1d·Hacker News

Sort providers by cost, latency, or throughput on AI Gateway 🔀BGP

At Mythos Speed: A Defender's Playbook for the AI Vulnerability Surge in 2026 🔭Observability

recordedfuture.com·2d

Global Marine LNG Terminals, Tankers & Trade: A High-Resolution AIS-Based Dataset of LNG Trade (2020–2024) 📊Data Visualization

STREAM Benchmark Reference Information 🖥️Systems Programming

cs.virginia.edu·6d

E-ReCON: An Energy- and Resource-Efficient Precision-Configurable Sparse nvCIM Macro for Conventional and Spiking Neural Edge Inference 🖥️Systems Programming

Artain-AI/ignite-ms: Fast self-hosted embedding engine for search, RAG, and reindexing workloads on NVIDIA GPUs. Built in Rust + TensorRT for teams that care about scale, cost, and control. 📡gRPC

github.com·22h·Hacker News
