👁️ Attention Mechanisms

Analyzing the geometric dependence of thermoelastic Q-factor in micro hemispherical resonators via a data-augmented CNN-transformer model

🤖Transformers Academic

nature.com·

libertywing/FlashMemory-Deepseek-V4: FlashMemory DS-V4 Retriever: a lightweight retriever that sparsifies DeepSeek-V4 CSA KV-cache. Weights available on Hugging Face.

🤖Machine Learning Code

github.com·

Running Qwen 35B MoE at 450k Context on a Single 32GB GPU

🤖AI

local-llm.utop.workers.dev··Hacker News

Breaking tunnel vision, imaging AI lifts fluorescence image restoration accuracy and speed

🤖Machine Learning

phys.org·

Built and launched a research-reading and highlighting tool with Claude over a few months. Here are the things AI was surprisingly good (and bad) at.

🤖AI

highlyt.app··r/ClaudeAI

The economics of speculative decoding

🔧LLVM Blog

fergusfinn.com··Hacker News

CRUMB: Efficient Prior Fitted Network Inference via Distributionally Matched Context Batching

🤖Transformers Academic

arxiv.org·

AIs like ChatGPT fall apart in classic 'Stroop' psychological test — and that could stand in the way of achieving artificial general intelligence

🤖Transformers

techradar.com

PagedAttention vs Traditional KV Cache: How vLLM Reinvented GPU Memory for LLM Inference

⚙️Systems Programming Blog

medium.com

Introducing North Mini Code: Cohere’s First Model For Developers

🤖Machine Learning Blog

huggingface.co··Hacker News

Google open-sources speedy DiffusionGemma text diffusion model

🤖Transformers

siliconangle.com·

Building & Benchmarking: LLMs on a 16GB Jetson Orin NX for Hermes Agent

🤖AI Blog

dnhkng.github.io·

Stateful Swarms: How Persistent Memory Beats Traditional Agent Architectures

🤖Transformers News

artificialintelligencemadesimple.com·

KaiFelixBennett/gemma4-turboquant-rdna4: Run Gemma-4-31B at full 256K context on a $1,400 AMD RDNA4 GPU (gfx1201): TurboQuant KV cache + HIP-graph-safe Flash-Attention for llama.cpp, fully measured on real hardware.

🤖AI Code

github.com··Hacker News

Bridging the sim2real gap in the table tennis robot with a transformer-based ball states predictor

🤖Transformers Academic

arxiv.org·

LLM Research Papers: The 2026 List (January to May)

🤖Transformers News

magazine.sebastianraschka.com··Hacker News

Intelligent inference scheduling with llm-d on Red Hat AI

🔧LLVM

developers.redhat.com·

When AI Agents “Pay Attention”

🤖Transformers

psychologytoday.com·

High-end Hitachi Vantara arrays and Nvidia AI support

Gated DeltaNet, From First Principles
