👁️ Attention Optimization - miterion · Scour

Understanding and Optimizing Attention-Based Sparse Matching for Diverse Local Features

arxiv.org·3d

🧩Attention Kernels

Prism: Spectral-Aware Block-Sparse Attention

arxiv.org·3d

🧩Attention Kernels

Spot The Difference

seekingalpha.com·3h

Training-Free Real-Time Control for Autoregressive Video Generation

daydream.live·20h

🏎️TensorRT

Architectural and Mathematical Foundations of Machine Learning: A Rigorous Synthesis of Theory, Geometry, and Implementation

chizkidd.github.io·1d

📉Model Quantization

Larger AI Models Are Not Always Better At Remembering Facts, Research Reveals

quantumzeitgeist.com·18h

🎓Model Distillation

OpenAI introduces GPT‑5.3‑Codex‑Spark, an ultra-fast coding model powered by Cerebras

neowin.net·6h

⚡Flash Attention

A C implementation of the inference pipeline for Mistral AI’s Voxtral Realtime 4B model

blog.adafruit.com·18h

🏎️TensorRT

Arming the rebels with GPUs: Gradium, Kyutai, and Audio AI

amplifypartners.com·5h

🏎️TensorRT

MiniMaxAI MiniMax-M2.5 has 230b parameters and 10b active parameters

openhands.dev·13h

⏱️Benchmarking

How Andrej Karpathy Built a Working Transformer in 243 Lines of Code

analyticsvidhya.com·21h

📜TorchScript

New Generative Paradigm: Drifting Model

mail.bycloud.ai·2d

📊Gradient Accumulation

Transformer-Based Memory Forecasting: Leveraging Anonymized Aggregates for Personal Insights

novice.media·1d

⚡Flash Attention

Space Alignment Matters: The Missing Piece for Inducing Neural Collapse in Long-Tailed Learning

sonomarpa.sonoma.lib.ca.us·11h

📊Gradient Accumulation

Show HN: ProductFront - Streamlined product discovery platform for maximum exposure

productfront.tech·1d

🤖AI Coding Tools

Recursive Language Models: Stop Stuffing the Context Window

nlp.elvissaravia.com·14h

⚡ONNX Runtime

An Ontology of Representations: Limits of Universality

lesswrong.com·13h

Ming-flash-omni-2.0: 100B MoE (6B active) omni-modal model - unified speech/SFX/music generation

huggingface.co·16h

⚡Flash Attention

Cuentos: A Large-Scale Eye-Tracking Reading Corpus on Spanish Narrative Texts

nature.com·1d

🧩Attention Kernels

Prompting Best Practices for Instruction-Following Rerankers

zeroentropy.dev·21h

🤖AI Coding Tools
