Distributed LLM Systems


DiffusionGemma 26B A4B results on my 5090

 🧠Large Language Models (LLMs)

Beyond Per-Token Pricing: A Concurrency-Aware Methodology for LLM Infrastructure Cost Estimation

 📊AI Performance Profiling  Content type: Academic
arxiv.org·

[eCHO News] Episode #104: mTLS for Cilium. Lisp for eBPF

 🚀LLM serving frameworks

Florian Brand, Prime Intellect research engineer, adopts Gemma 4 E4B 6-bit quantized as his primary local Mac LLM

 🧠Large Language Models (LLMs)  Content type: News
digg.com··Hacker News

DiffusionGemma: 4x Faster Text Generation

 🧠Large Language Models (LLMs)  Content type: News  Content type: Blog

heterodoxin/graphkv: Graph-guided KV cache compression for memory-efficient LLM inference.

 🔧Systems-level optimizations for LLM serving  Content type: Code
github.com··r/LocalLLaMA

[AINews] FrontierCode: Benchmarking for Code Quality over Slop

 🧠Large Language Models (LLMs)  Content type: News
latent.space·

Does anyone know what PCIe mode was used for these benchmarks?

 🚀LLM serving frameworks  Content type: Code
github.com··r/LocalLLaMA

#070 - Anthropic walks back Fable 5's throttle, Claude Desktop hides a 1.8GB VM, HTML doubles signups

 🧠Large Language Models (LLMs)
indiehacker.news·

Breaking the Bubble: Asynchronous Pipeline Parallel Training with Bounded Weight Inconsistency

 📊AI Performance Profiling  Content type: Academic
arxiv.org·

KJLdefeated/RL.cu: RLVR training for LLM in CUDA/C++

 🔧Systems-level optimizations for LLM serving  Content type: Code
github.com··Hacker News

A drop-in replacement chat template for google/gemma-4-31B-it tuned for open-source agentic coding harnesses.

 🚀LLM serving frameworks

Fine-tuning Multi-modal LLMs with ART: Art-based Reinforcement Training

 🚀LLM serving frameworks  Content type: Academic
arxiv.org·

Integrate OpenShift AI and PG Airman MCP Server

 🚀LLM serving frameworks
developers.redhat.com·

dagploy/dax: AIOps Infra for deploy and manage self-hosted local AI in your own cloud. Vibe coding and AI agents compatible.

 🤖Agents using LLMs  Content type: Code
github.com··Hacker News

NVIDIA and LG Group Build an AI Factory to Advance Physical AI, Mobility and AI Infrastructure

 🚀LLM serving frameworks  Content type: Blog

Five labs, five minds: building a multi-model finance drama on small models

 Model optimizations in LLMs  Content type: Blog
huggingface.co·

APEX4: Efficient Pure W4A4 LLM Inference via Intra-SM Compute Rebalancing

 🔧Systems-level optimizations for LLM serving  Content type: Academic
arxiv.org·

Breaking free of a single datacenter: Practical geo-distributed AI operations with the k0smos platforms

 ⚙️AI Infrastructure Automation  Content type: Blog
cncf.io·