🤖 LLM - komodo · Scour

harshuljain13/llm-inference-at-scale: A practitioner's handbook for production LLM serving.

📈Productivity Code

github.com··Hacker News

Token4Token — pay-per-token inference on Gnosis + Swarm

🔍Search system

t4t.eth.link··Hacker News

Inferoa AI harness claimed 90% cache savings. We ran it and measured 97.8%

zozo123.github.io··Hacker News

CommBench: Can LLMs Write Correct and Efficient GPU Communication Code?

uccl-project.github.io··Hacker News

DeepSeekV4 1.6T Day 0 to Day 43 Performance Over Time - Huawei, GB300 NVL72, MI355X, B200

📈Productivity News

newsletter.semianalysis.com··Hacker News

DiffusionGemma: 4x Faster Text Generation

📈Productivity News Blog

blog.google··Hacker News, r/LocalLLaMA, r/singularity

How LLMs work | Practical Leaders

💡Recommender system

practical-leaders.com··Hacker News

Why LLMs (still) lack taste

💡Recommender system

beyondtheprior.com··Hacker News

Fixing a stuck Ollama runner and building a GPU watchdog

🔍Search system

patrickmccanna.net··Hacker News

Breaking the Ice: Analyzing Cold Start Latency in vLLM

🔍Search system Academic

arxiv.org··Hacker News

Less-relevant results

Machinic Psychopharmacology: Do LLMs Self-Medicate?

📈Productivity

lesswrong.com··Hacker News

How we fight GPU scarcity without compromise

📈Productivity Blog

equixly.com··Hacker News

Tokenminning: Because Tokenmaxxing Is a Bad Idea

🗄️Knowledge Base Systems

tokenminning.com··Hacker News

NexusOS v2.0 – A zero-dependency pipeline streaming server chaos to Parquet

🔍Search system

huggingface.co··Hacker News

On-device AI is a margin decision

💡Recommender system Blog

ziraph.com··Hacker News

Researchers trained an open source AI search agent, Harness-1, that outperforms GPT-5.4 on recalling relevant information

🔍Search system

venturebeat.com··Hacker News

Show HN: Run Llama.cpp In-Process from Java with Project Panama FFM

deemwar-products.github.io··Hacker News

GGUF vs GPTQ vs AWQ: The Plain-English Guide to LLM Quantization (and Which One to Pick)

vettedconsumer.com··Hacker News

China's Xiaomi MiMo Is Now 15X Faster Than ChatGPT and Claude (4 minute read)

📈Productivity News

decrypt.co··Hacker News

LLM AI Chatbots are letting me down every single day

🔍Search system

umrashrf.github.io··Hacker News
