👁️ Attention Optimization
Flash Attention, Memory Efficient, Sparse Attention, Transformers
Scoured 25 posts in 6.9 ms
Vortex: Efficient and Programmable Sparse Attention Serving for AI Agents
🧩 Attention Kernels · Content type: Academic · arxiv.org · 6 days ago
KaiFelixBennett/gemma4-turboquant-rdna4: Run Gemma-4-31B at full 256K context on a $1,400 AMD RDNA4 GPU (gfx1201): TurboQuant KV cache + HIP-graph-safe Flash-Attention for llama.cpp, fully measured on real hardware.
🔥 PyTorch · Content type: Code · github.com · 13 hours ago · Hacker News
Testing MiniMax M3 on real tasks: repo refactor, screenshot debugging, and Spotify recommendations
📊 Profiling Tools · Content type: Blog · andlukyane.com · 1 day ago · Hacker News
The Memory Problem is Solved: How Google’s Memory Caching Makes RNNs Smart Again
⚡ Flash Attention · Content type: Blog · medium.com · 2 days ago
Hugging Face Transformers RCE flaw enables stealthy compromise via AI model configs
⚡ Flash Attention · csoonline.com · 6 days ago
ELI5 is a terrible learning prompt, here's the structural reason it fails and a 4-level replacement that actually sticks
🧩 Attention Kernels · Content type: Blog, Tutorial · appliedaihub.org · 1 day ago · r/PromptEngineering
DeepSeek V4, LeCun's Bet Against LLMs, and Lovable's Self-Improving Agent - The Tokenizer Edition #30
🔄 ONNX · newsletter.artofsaience.com · 6 days ago
Two Leaps to 1000 Tokens/s on a 1T-Parameter Model: On Inference Systems, Execution Boundaries, and Co-Design
⚙️ Systems Programming · Content type: Blog · tilert.ai · 2 days ago · Hacker News
linzhiqiu/t2v_metrics: Evaluating text-to-image/video/3D models with VQAScore
🚀 MLOps · Content type: Code · github.com · 1 day ago · Hacker News
Efficient and Training-Free Single-Image Diffusion Models
🧩 Attention Kernels · haojunqiu.github.io · 5 days ago · Hacker News
Deep Learning Weekly: Issue 458
⚡ ONNX Runtime · deeplearningweekly.com · 6 days ago
FlashMemory-DeepSeek-V4: Lightning Index Ultra-Long Context via Lookahead Sparse Attention
⚡ Flash Attention · Content type: Academic · arxiv.org · 2 days ago · Hacker News
LLM Research Papers: The 2026 List (January to May)
📉 Model Quantization · Content type: News · magazine.sebastianraschka.com · 4 days ago · Hacker News
libertywing/FlashMemory-Deepseek-V4: FlashMemory DS-V4 Retriever: a lightweight retriever that sparsifies DeepSeek-V4 CSA KV-cache. Weights available on Hugging Face.
⚡ Flash Attention · Content type: Code · github.com · 1 day ago
Running Qwen 35B MoE at 450k Context on a Single 32GB GPU
🛠 Ml-eng · local-llm.utop.workers.dev · 3 days ago · Hacker News
TilelliLab/atome-lm: A ternary, zero-heap tiny language model that runs inside a $2 microcontroller — bit-exact Python <-> C99 <-> Cortex-M3 (QEMU) parity. Apache-2.0.
🐍 Python · Content type: Code · github.com · 2 days ago · r/LLM
You Only Index Once: Cross-Layer Sparse Attention with Shared Routing
🧩 Attention Kernels · Content type: Academic · arxiv.org · 6 days ago
stable-diffusion.cpp/docs/quantization_and_gguf.md at master · leejet/stable-diffusion.cpp
🔄 ONNX · Content type: Code · github.com · 3 days ago · r/StableDiffusion
From Rigid to Dynamic: Entropy-Guided Adaptive Inference for Long-Context LLMs
🎓 Model Distillation · Content type: Academic · arxiv.org · 2 days ago
Sparrow: Sparse Rollout for Stable and Efficient Long-context RL of Large Language Models
📊 Gradient Accumulation · Content type: Academic · arxiv.org · 2 days ago