⚡ Flash Attention - jhcha.oyo · Scour

KJLdefeated/RL.cu: RLVR training for LLM in CUDA/C++

🤖AI Code

github.com · Hacker News

Less-relevant results

Attention at the Theoretical Minimum: A Mathematics of Arrays Framework for Memory-Optimal Transformer Kernels

🤖AI Academic

Efficient and Training-Free Single-Image Diffusion Models

haojunqiu.github.io · Hacker News

Express Language Modeling

⚙️Algorithms Academic

Gated Bidirectional Linear Attention for Generative Retrieval

⚡Transformers Academic

harshuljain13/llm-inference-at-scale: A Practitioner handbook for production llm serving.

🤖AI Code

github.com · Hacker News

How Much Dense Attention is Necessary? Oracle-Guided Sparse Prefill for Full/GQA Layers in Hybrid Long-Context Models

⚡Hardware Acceleration Academic
