🧠 LLM Inference - aaaaa · Scour

147th airhacks tv: Local LLMs, LightMetal, ZSmith Agents, AI Rails, Saving Tokens

🧠LLMs Blog

adambien.blog·

local llm on laptop 780M GPU using llama + gemma 4 qat

🧠LLMs Blog

alper.bearblog.dev·

Less-relevant results

fix(memory-core): filter stale recall entries in REM harness preview · openclaw/openclaw@92418fc

🧠LLMs Code

Clairvoyant: Predictive SJF Scheduling to Mitigate Head-of-Line Blocking in Serial LLM Backends

🧠LLMs Academic

DeskDash - a free Windows tool to easily manage your GGUF files

gerry7.itch.io··r/LocalLLaMA

PagedAttention vs Traditional KV Cache: How vLLM Reinvented GPU Memory for LLM Inference

🧠LLMs Blog

LLM Inference Engineering Room — Part 3: The Orchestration Layer

🧠LLMs Blog

vimal-dwarampudi.medium.com·

Token4Token — pay-per-token inference on Gnosis + Swarm

t4t.eth.link··Hacker News

Speculators v0.5.0: DFlash support and online training

developers.redhat.com·

Apples to Apples: MLX vs. Llama.cpp for Gemma 4 12B on an M1 16GB

🧠LLMs Blog

ziraph.com··Hacker News

andreyvgavrilov/food_database: AI agent to evaluate recipe nutrition

🧠LLMs Code

github.com··r/mcp

RakuOS fixes the one thing that annoys me most about immutable Linux distros

🧠LLMs News

Google Shrank Gemma 4 by 72% and Unsloth Fixed the 4-Bit Bug Nobody Else Caught on One 4090, and 4-Bit Shouldn’t Be This Good

🧠LLMs Blog

towardsai.net·

google/gemma-4-31B-it · fix: chat template — null handling, reasoning preservation, turn-tag balance, input validation

🤖AI Engineering

huggingface.co··r/LocalLLaMA

How to Run Gemma 4 12B Locally - The Best AI For Consumer Laptops

🧠LLMs Video

Running Qwen 35B MoE at 450k Context on a Single 32GB GPU

local-llm.utop.workers.dev··Hacker News

"AI" Is Eating Platform Monopolist Free Cash Flow, Not the World: CHART OF THE DAY

🧠LLMs News Blog

braddelong.substack.com··Substack

harshuljain13/llm-inference-at-scale: A Practitioner handbook for production llm serving.

🧠LLMs Code

github.com··Hacker News

Breaking the Ice: Analyzing Cold Start Latency in vLLM

🤖AI Engineering Academic

Omnifs: APIs and data sources as files you can ls, cat, grep, and pipe

omnifs.dev··Hacker News
