💬 Prompt optimizations for LLM serving - pleto · Scour

CommBench: Can LLMs Write Correct and Efficient GPU Communication Code?

⚙️AI Infrastructure Automation

uccl-project.github.io··Hacker News

What Arm-based innovations happened in May 2026?

🔧Systems-level optimizations for LLM serving Blog

newsroom.arm.com·

SIFT: Selective-Index For Fast Compute of RAG Prefill by Exploiting Attention Invariance

🔍Retrieval-augmented generation Academic

MLPerf and the rise of latency-aware LLM benchmarking

🧠Large Language Models (LLMs)

For whom the door-bell tolls

🧠Large Language Models (LLMs)

Claude Fable 5 and Mythos 5 pricing: Anthropic's new $10/$50 top tier

🧠Large Language Models (LLMs)

aipricing.guru··Hacker News

"North Mini Code"; open weights, 30B param, Canadian coding model

🤖Agents using LLMs Blog

cohere.com··Hacker News

The Death of the Four Golden Signals: Designing Telemetry for Non-Deterministic Infrastructure

🔍Retrieval-augmented generation

Comparing Claude Fable 5's system prompt to Opus 4.8

🧠Large Language Models (LLMs) Blog

twelvetables.blog··Hacker News

Semantic Cache Distillation: Efficient State Transfer via Reuse and Selective Patching

🔧Systems-level optimizations for LLM serving Academic

Claude Mythos 5 / Fable 5

📊AI Performance Profiling Discussion

anthropic.com··Hacker News

Architecting the Control Plane for Intelligence: System Design of an Enterprise AI Gateway

🤖Agents using LLMs Blog

Build a local voice agent with Red Hat OpenShift AI

🧠Large Language Models (LLMs)

developers.redhat.com·

Why It’s So Hard for Older B2B Leaders to Compete in AI: Your Customers Can Do A Lot in Claude for $20-$200/Month. And You’re Paying $1.00 Per API Call For the Good Stuff.

⚙️AI Infrastructure Automation

The all-you-can-eat AI era is over. It's time to count calories.

🤖Agents using LLMs News

businessinsider.com·

What Should a Skill Remember? Quality-Cost Trade-offs in Cost-Aware Skill Rewriting for Language Model Agents

🧠Large Language Models (LLMs) Academic

How we fight GPU scarcity without compromise

🧠Large Language Models (LLMs) Blog

equixly.com··Hacker News

Issue #390 - The ML Engineer 🤖

✨Model optimizations in LLMs News Blog

machinelearning.substack.com··Substack

TjWheeler/deep-memory: A GraphRAG implementation with a Vocabulary system to optimise AI integration

🤖Agents using LLMs Code

github.com··Hacker News

Fairness-Aware and Latency-Controllable Scheduling for Chunked-Prefill LLM Serving

🔧Systems-level optimizations for LLM serving Academic
