🪟 Context Windows - SeanNg

🤖Transformers Academic

arxiv.org·

High Bandwidth Flash | A New Memory for AI Data Centers and Edge Computing | Sandisk

🤖LLM

ncnonline.net·

Issue #390 - The ML Engineer 🤖

🤖AI News Blog

machinelearning.substack.com··Substack

OpenCV 5 release - New DNN engine with enhanced ONNX and LLM/VLM support, Intel, Arm, and RISC-V hardware optimizations - CNX Software

👁️Computer Vision News

cnx-software.com·

BeeLlama.cpp DFlash on Strix Halo: 2.7x Gemma 31B, But MTP Is Still Faster

⚡Inference Optimization

sleepingrobots.com·

RKSC: Reasoning-Aware KV Cache Sharing and Confident Early Exit for Multi-Step LLM Inference

🤖LLM Academic

arxiv.org·

How LLMs Actually Work: A Friendly Map for Humans • oreoro

🤖LLM

oreoro.github.io··Hacker News

Benchmarking dots.tts on Strix Halo

🤖AI

sleepingrobots.com·

Gated DeltaNet, From First Principles

✍️Prompt Engineering Blog

sankalp.bearblog.dev·

How to cut the cost of long AI agent threads (without making the agent dumber)

🤖Agent Blog

viktor.com··Hacker News

#065 - Claude writes 80% of Anthropic's own code, Cloudflare buys Vite, ChatGPT ships Dreaming memory

🔓Open Source

indiehacker.news·

Reality: The Final Eval — Lukas Petersson and Axel Backlund of Andon Labs

🤖Agent

latent.space··Hacker News

Anatomy of a high-performance EP kernel

🤖LLM Blog

fergusfinn.com··Hacker News

JeevanJoshi2061/titan_engine_core: Constant-memory sequence modeling engine combining selective holographic-compression (ASH-C) with a coordinate pointer network (HEP-DNA). Bypasses the linear KV Cache bottleneck on consumer GPUs.

🤖LLM Code

github.com··Hacker News

FlashMemory-DeepSeek-V4: Lightning Index Ultra-Long Context via Lookahead Sparse Attention

🤖LLM Academic

arxiv.org··Hacker News

How the UK Is Turning Sovereign AI Ambition Into Action With NVIDIA Technologies

🎮Reinforcement Learning Blog

blogs.nvidia.com·

DeepSeekV4 1.6T Day 0 to Day 43 Performance Over Time - Huawei, GB300 NVL72, MI355X, B200

🤖AI News

newsletter.semianalysis.com··Hacker News

The iPhone’s Last Stand

🤖Agent

stratechery.com··Hacker News

Still: Amortized KV Cache Compaction in a Single Forward Pass

⚡Inference Optimization Academic

arxiv.org·

Efficient and Training-Free Single-Image Diffusion Models

End-to-End Context Compression at Scale
