🔧 Systems-level optimizations for LLM serving - pleto · Scour

High-end Hitachi Vantara arrays and Nvidia AI support

🤖Agents using LLMs News

blocksandfiles.com·

Qwen 3.6 27B AutoRound GGUF, need your feedback

✨Model optimizations in LLMs

huggingface.co··r/LocalLLaMA

High Bandwidth Flash | A New Memory for AI Data Centers and Edge Computing | Sandisk

📊AI Performance Profiling

ncnonline.net·

1-bit and 1.58 bit LLM Benchmarking on Jetson Orin Nano Super | Bonsai LM

🧠Large Language Models (LLMs)

smolhub.com··r/LocalLLaMA

How to Measure Time To First Token (TTFT) in AI Systems

💬Prompt optimizations for LLM serving

qainsights.com··Hacker News

VIA-SD: Verification via Intra-Model Routing for Speculative Decoding

💬Prompt optimizations for LLM serving Academic

Gemma 4 QAT models: Optimizing model compression for mobile and laptop efficiency

✨Model optimizations in LLMs News Blog

blog.google··Hacker News

Machinic Psychopharmacology: Do LLMs Self-Medicate?

🚀LLM serving frameworks

lesswrong.com··Hacker News

China's Xiaomi MiMo Is Now 15X Faster Than ChatGPT and Claude (4 minute read)

🧠Large Language Models (LLMs) News

decrypt.co··Hacker News

Making Local LLM Fast

🧠Large Language Models (LLMs)

bogdan.nimblex.net··Hacker News

libertywing/FlashMemory-Deepseek-V4: FlashMemory DS-V4 Retriever: a lightweight retriever that sparsifies DeepSeek-V4 CSA KV-cache. Weights available on Hugging Face.

📊AI Performance Profiling Code

MiMo-v2.5-Pro-UltraSpeed: 1T model with 1000 TPS

✨Model optimizations in LLMs Blog

mimo.xiaomi.com··Hacker News, r/LocalLLaMA

Claude Fable 5 🚀, Gemini 3.5 Live Translate 📱, scaling test time compute 📈

🤖Agents using LLMs

Building & Benchmarking: LLMs on a 16GB Jetson Orin NX for Hermes Agent

🧠Large Language Models (LLMs) Blog

dnhkng.github.io·

Youssof Altoukhi (@Youssofal_)

🧠Large Language Models (LLMs)

xcancel.com··r/LocalLLaMA

RKSC: Reasoning-Aware KV Cache Sharing and Confident Early Exit for Multi-Step LLM Inference

🚀LLM serving frameworks Academic

Nvidia DGX Spark GB10 – AI Models and Guide with vLLM and Autonomous Script

🚀LLM serving frameworks Code

github.com··Hacker News

LLM Research Papers: The 2026 List (January to May)

🧠Large Language Models (LLMs) News

magazine.sebastianraschka.com··Hacker News

Rebellions Bets on Memory-Centric Architecture as it Weighs IPO Options

⚡Real-time AI Systems News

iOS Security SDKs & Audits for Production Teams

✨Model optimizations in LLMs Discussion

sentinelden.com··Hacker News
