🪟 Context Windows - bloknayrb · Scour

We Should Take Text Optimization More Seriously

💬LLMs Blog

yoonholee.com··Hacker News

markusheimerl/gpt: A generative pretrained transformer implementation

💬LLMs Code

github.com··Hacker News

Claude Fable 5 Free Through June 22 on Pro, Max, Team, and Enterprise Plans

🎭Anthropic Claude News

claude5.ai··Hacker News

nvidia/NVIDIA-Nemotron-3-Ultra-550B-A55B-BF16

huggingface.co··Hacker News, Hacker News, r/LocalLLaMA

Introducing the Third Generation of Apple’s Foundation Models

machinelearning.apple.com··Hacker News, r/apple

DeepSeek Made AI Cheap. Now It Needs Billions to Keep It Cheap.

🤝AI Agents News Blog

chinacompany.substack.com··Substack

mingusb/transformer-golf: The Fully Unrolled Transformer: An experimental repository for architecture simplification and compilation. [2026]

💬LLMs Code

github.com··Hacker News

Claude Fable 5 and Mythos 5 pricing: Anthropic's new $10/$50 top tier

🎭Anthropic Claude

aipricing.guru··Hacker News

See, Act, Correct: three levers for working with a code agent

🤖Agent Architecture Blog

blog.owulveryck.info··Hacker News, Hacker News

harshuljain13/llm-inference-at-scale: A Practitioner handbook for production llm serving.

🤖LLM Code

github.com··Hacker News

NVIDIA Nemotron 3 Ultra Powers Faster, More Efficient Reasoning for Long-Running Agents

🤖Agent Architecture Blog

developer.nvidia.com··Hacker News

Do Transformers Need Three Projections? Systematic Study of QKV Variants

📱Edge AI Academic

arxiv.org··Hacker News

defai-digital/ax-engine: Apple Silicon LLM runtime supporting Gemma 4 and Qwen 3.6 MTP modes

🤖LLM Code

github.com··Hacker News

Magenta RealTime 2: Open and Local Live Music Models

🎤Voice Interfaces

magenta.withgoogle.com··Hacker News, Hacker News, r/LocalLLaMA

Gemma 4 QAT models: Optimizing model compression for mobile and laptop efficiency

🦙Ollama News Blog

blog.google··Hacker News

ashp15205/guardian-runtime: A zero-latency, local-first runtime firewall for LLMs. Intercept every prompt and response locally to stop data leaks and runaway token costs.

🤖LLM Code

github.com··Hacker News

Maybe Coding Agents Don't Need a Bigger Memory. Maybe They Need Continuity.

🏛️Memory Palaces News Blog

oldskultxo.substack.com··Substack

How LLMs work | Practical Leaders

practical-leaders.com··Hacker News

Bad MCP design cost your Agent 5× more tokens

🔌MCP Discussion

news.ycombinator.com··Hacker News

Replace your CI with a merge queue

💬LLMs Blog

blog.exe.dev··Hacker News