⚡ Flash Attention - miterion · Scour

RAM-Net: Expressive Linear Attention with Selectively Addressable Memory

arxiv.org·22h

🧩Attention Kernels

Ultrafast visual perception beyond human capabilities enabled by motion analysis using synaptic transistors

nature.com·7h·

Discuss: r/compsci

👁️Attention Optimization

The 4 Flash Attention Variants: How to Train Transformers 10× Longer Without Running Out of Memory

pub.towardsai.net·5d

👁️Attention Optimization

GPU-Serving Two-Tower Models for Lightweight Ads Engagement Prediction

medium.com·3h

👁️Attention Optimization

Learning to Forget Attention: Memory Consolidation for Adaptive Compute Reduction

arxiv.org·22h

👁️Attention Optimization

Index Exchange embeds AI attention signals into SSP for pre-bid targeting

google.com·21h

👁️Attention Optimization

OpenAI GPT-5.3-Codex-Spark Now Running at 1K Tokens Per Second on BIG Cerebras Chips

servethehome.com·9h·

Discuss: Hacker News

🎯Tensor Cores

Decodability, sensitivity, and criticality measured through single-neuron perturbations

nature.com·5h

👁️Attention Optimization

Create cinematic AI videos from text and images

seedance20.site·18h·

Discuss: Hacker News

🧩Attention Kernels

Contextual Memory Tools

trendhunter.com·12h

🧩Attention Kernels

AI in Multiple GPUs: Point-to-Point and Collective Operations

towardsdatascience.com·14h

AI Inference Needs A Mix-And-Match Memory Strategy

semiengineering.com·1d

🎯Tensor Cores

One Task at a Time, Even with AI

wakamoleguy.com·12h·

Discuss: Hacker News

🤖AI Coding Tools

christopherkarani/Wax: 🍯 Memory layer for on-device AI Agents. Replace complex RAG pipelines with a serverless, single-file memory layer.

github.com·10h·

Discuss: Hacker News

📊Profiling Tools

Ming-flash-omni-2.0: 100B MoE (6B active) omni-modal model - unified speech/SFX/music generation

huggingface.co·1d·

Discuss: r/LocalLLaMA

Signal vs. Noise at Scale

blog.gorewood.games·9h

🔀Operator Fusion

You are probably overpaying for intelligence

residuals.bearblog.dev·6h

⚡ONNX Runtime

How octorus Renders 300K Lines of Diff at High Speed

dev.to·20h·

Discuss: DEV

🏗️Build Optimization

The 4 Precision Formats: How to Train AI 2× Faster with Half the Memory

pub.towardsai.net·13h

📉Model Quantization

Turn Text Into Narration Fast With MiniMax Speech-2.8 HD

hackernoon.com·2h

🧩Attention Kernels