🧠 LLM Inference - emschwartz · Scour

Agentic Coding and the Problem of Oracles

epkconsulting.substack.com·3d

Discuss: r/programming

🛡️AI Security

Databricks adds MemAlign to MLflow to cut cost and latency of LLM evaluation

infoworld.com·5d

🏆LLM Benchmarking

The Passive AI Learning Stack That Changed the Way I Learn

donnfelker.com·2d

📰RSS Reading Practices

Claude: Speed up responses with fast mode

simonwillison.net·3d

🔌Claude Plugins

Running LLMs in-browser via WebGPU, Transformers.js, and Chrome's Prompt API—no Ollama, no server

noaibills.app·3d

Discuss: r/LocalLLaMA, r/SideProject, r/selfhosted

do you know more modern version of something like byt5-small?

huggingface.co·2d

Discuss: r/LocalLLaMA

🔤Tokenization

A Proposal for TruesightBench

lesswrong.com·5d

📋Text Quality

Hardware Acceleration

jellyfin.org·3d

⚡Hardware Acceleration

NVIDIA VibeTensor: AI Just Built Its Own Deep Learning Engine… And It Actually Works (AI Revolution

youtube.com·2d

Why “Context Lake” Matters For Agentic AI

forrester.com·2d

🌐Distributed systems

Almost Timely News: 🗞️ How to Do Great Focus Groups with RPGs and AI (2026-02-08)

almosttimely.substack.com·2d

Discuss: Substack

The Optimal Token Baseline: Variance Reduction for Long-Horizon LLM-RL

arxiv.org·1d

Adaptive Retrieval helps Reasoning in LLMs -- but mostly if it's not used

arxiv.org·1d

🔄LLM RAG Pipelines

Making a Hardware Accelerated Live TV Player from Scratch in C: HLS Streaming, MPEG-TS Demuxing, H.264 Parsing, and Vulkan Video Decoding

blog.jaysmito.dev·2d

Discuss: Hacker News, r/programming

📄File Formats

AI workloads challenge the cattle model

varoa.net·3d

Discuss: Hacker News

Achieving Ultra-Fast AI Chat Widgets

cjroth.com·3d

Discuss: Hacker News

💾Prompt Caching

Jokes on You AI: Turning the Tables

dev-log.me·2d

Discuss: Hacker News

👨‍💻AI Coding

datascienceweekly.substack.com·5d

Discuss: Substack

🏗️LLM Infrastructure

Show HN: Routed Attention – 75-99% savings by routing between O(N) and O(N²)

zenodo.org·3d

Discuss: Hacker News

🚀Async Optimization

Writing an LLM from scratch, part 32b -- Interventions: gradient clipping

gilesthomas.com·6d

Discuss: Hacker News

🏆LLM Benchmarking
