🤖 programming, security, AI, llms, science, finance - chrislegolife · Scour

harshuljain13/llm-inference-at-scale: A Practitioner handbook for production llm serving.

☕Espresso Code

github.com··Hacker News

LangChain Explained: Understanding Models, Prompts, Chains, Memory, Indexes, and Agents

🔷Go, typescript Blog

towardsai.net·

147th airhacks tv: Local LLMs, LightMetal, ZSmith Agents, AI Rails, Saving Tokens

🔷Go, typescript Blog

adambien.blog·

LLM Routing: From Strategy Selection to Production Architecture

🥓Charcuterie Blog

Report: GKE Inference Gateway delivers up to 92% faster AI responses

☕Espresso Blog

cloud.google.com··Hacker News

RKSC: Reasoning-Aware KV Cache Sharing and Confident Early Exit for Multi-Step LLM Inference

🔷Go, typescript Academic

Philosophy

🔷Go, typescript Reference

docs.langchain.com·

Inferoa AI harness claimed 90% cache savings. We ran it and measured 97.8%

🔷Go, typescript

zozo123.github.io··Hacker News

DeepSeekV4 1.6T Day 0 to Day 43 Performance Over Time - Huawei, GB300 NVL72, MI355X, B200

☕Espresso News

newsletter.semianalysis.com··Hacker News

The Inference Alpha: Maximizing Frontier Models on AMD

☕Espresso Blog

digitalocean.com·

a place for friends of OpenJDK

🔷Go, typescript

AI inference: what it is and why it matters for product managers

🔷Go, typescript

marcabraham.com·

Making Local LLM Go Brrr

seanpedersen.github.io·

Infrastructure Options for Scalable AI Inference

☕Espresso Blog

NVIDIA Accelerates Google DeepMind’s DiffusionGemma for Local AI

☕Espresso Blog

blogs.nvidia.com·

A system programmer’s guide to LLM inference

☕Coffee Roasting Blog

blog.xiangpeng.systems··Hacker News

DiffusionGemma: 4x Faster Text Generation

☕Espresso News Blog

blog.google··Hacker News, r/LocalLLaMA, r/singularity

Using Scikit-LLM with Open-Source LLMs

🔷Go, typescript

machinelearningmastery.com·

DiffusionGemma: The Developer Guide - Google Developers Blog

☕Espresso Blog

developers.googleblog.com··r/LocalLLaMA

Token4Token — pay-per-token inference on Gnosis + Swarm

🔷Go, typescript

t4t.eth.link··Hacker News
