⚡ LLM Inference - cyberpsych12 · Scour

The Quantization Error of the Soul: Why Silicon Valley is Inverting the Promethean Fire

✍️Prompt Engineering Blog

China's Xiaomi MiMo Is Now 15X Faster Than ChatGPT and Claude (4 minute read)

🤖LLMs News

decrypt.co · Hacker News

Infrastructure Options for Scalable AI Inference

📈Performance Engineering Blog

harshuljain13/llm-inference-at-scale: A Practitioner handbook for production llm serving.

🤖LLMs Code

github.com · Hacker News, r/LLM

SPEAR: A System for Post-Quantization Error-Adaptive Recovery Enabling Efficient Low-Bit LLM Serving

✍️Prompt Engineering Academic

Xiaomi MiMo-V2.5-Pro Just Hit 1,000 Tokens Per Second!

📈Performance Engineering

Show HN: Run Llama.cpp In-Process from Java with Project Panama FFM

deemwar-products.github.io · Hacker News

Token4Token — pay-per-token inference on Gnosis + Swarm

t4t.eth.link · Hacker News

Running Qwen 35B MoE at 450k Context on a Single 32GB GPU

local-llm.utop.workers.dev · Hacker News

Your robot can’t be smart, fast, and free. Evolution solved that already.

📈Performance Engineering News

thenextweb.com

Apple WWDC On-Device AI Deep Dive - Google Docs

✍️Prompt Engineering

gist.is · Hacker News

Here's a llama.cpp CLI Command builder.

llamabuilding.com · r/LocalLLaMA

Google's new open-weights model brings image-generation tricks to AI text generation

🤖LLMs News

theregister.com

Apples to Apples: MLX vs. Llama.cpp for Gemma 4 12B on an M1 16GB

🤖LLMs Blog

ziraph.com · Hacker News

Minimax M3 sm_120

🎮GPU Computing Code

github.com · r/LocalLLaMA

gist:5b74b8c31e934ff50ce57aa653a343d5

✍️Prompt Engineering

gist.github.com · r/LocalLLaMA

Running LLM Inference on Kubernetes: What It Actually Takes

🤖LLMs Blog

fairwinds.com

DiffusionGemma 26B A4B results on my 5090

huggingface.co · r/LocalLLaMA

Alignment Collapse Under KV Cache Quantization: Diagnosis and Mitigation

✍️Prompt Engineering Academic

The Death of the Four Golden Signals: Designing Telemetry for Non-Deterministic Infrastructure

🔭Observability
