🌐 Distributed LLM Systems - pleto · Scour

DiffusionGemma 26B A4B results on my 5090

🧠Large Language Models (LLMs)

huggingface.co · r/LocalLLaMA

Beyond Per-Token Pricing: A Concurrency-Aware Methodology for LLM Infrastructure Cost Estimation

📊AI Performance Profiling Academic

[eCHO News] Episode #104: mTLS for Cilium. Lisp for eBPF

🚀LLM serving frameworks

isovalent-9197153.hs-sites.com

Florian Brand, Prime Intellect research engineer, adopts Gemma 4 E4B 6-bit quantized as his primary local Mac LLM

🧠Large Language Models (LLMs) News

digg.com · Hacker News

DiffusionGemma: 4x Faster Text Generation

🧠Large Language Models (LLMs) News Blog

blog.google · Hacker News, r/LocalLLaMA, r/singularity

heterodoxin/graphkv: Graph-guided KV cache compression for memory-efficient LLM inference.

🔧Systems-level optimizations for LLM serving Code

github.com · r/LocalLLaMA

[AINews] FrontierCode: Benchmarking for Code Quality over Slop

🧠Large Language Models (LLMs) News

Machinic Psychopharmacology: Do LLMs Self-Medicate?

🔧Systems-level optimizations for LLM serving

lesswrong.com · Hacker News

Does anyone know what PCIe mode was used for these benchmarks?

🚀LLM serving frameworks Code

github.com · r/LocalLLaMA

#070 - Anthropic walks back Fable 5's throttle, Claude Desktop hides a 1.8GB VM, HTML doubles signups

🧠Large Language Models (LLMs)

indiehacker.news

Breaking the Bubble: Asynchronous Pipeline Parallel Training with Bounded Weight Inconsistency

📊AI Performance Profiling Academic

KJLdefeated/RL.cu: RLVR training for LLM in CUDA/C++

🔧Systems-level optimizations for LLM serving Code

github.com · Hacker News

A drop-in replacement chat template for google/gemma-4-31B-it tuned for open-source agentic coding harnesses.

🚀LLM serving frameworks

gist.github.com · r/LocalLLaMA

Fine-tuning Multi-modal LLMs with ART: Art-based Reinforcement Training

🚀LLM serving frameworks Academic

Integrate OpenShift AI and PG Airman MCP Server

🚀LLM serving frameworks

developers.redhat.com

dagploy/dax: AIOps Infra for deploy and manage self-hosted local AI in your own cloud. Vibe coding and AI agents compatible.

🤖Agents using LLMs Code

github.com · Hacker News

NVIDIA and LG Group Build an AI Factory to Advance Physical AI, Mobility and AI Infrastructure

🚀LLM serving frameworks Blog

blogs.nvidia.com · Hacker News

Five labs, five minds: building a multi-model finance drama on small models

✨Model optimizations in LLMs Blog

huggingface.co

APEX4: Efficient Pure W4A4 LLM Inference via Intra-SM Compute Rebalancing

🔧Systems-level optimizations for LLM serving Academic

Breaking free of a single datacenter: Practical geo-distributed AI operations with the k0smos platforms

⚙️AI Infrastructure Automation Blog
