Scour
👁️ Attention Optimization
Flash Attention, Memory Efficient, Sparse Attention, Transformers
Scoured 160 posts in 9.2 ms
Benchmarking llama.cpp's brand-new MTP support on Strix Halo · 🔧 PTX · calebcoffie.com · 2d · Hacker News
Luce DFlash + PFlash on 7900XTX: Qwen3.6-27B at 2.24x decode and 3.05x prefill vs llama.cpp HIP · ⏱️ Benchmarking · lucebox.com · 3d · r/LocalLLaMA
Efficient Long-Context Modeling in Diffusion Language Models via Block Approximate Sparse Attention · 📊 Gradient Accumulation · arxiv.org · 1d
Nitsum: Serving Tiered LLM Requests with Adaptive Tensor Parallelism · ⏱️ CUDA Events · mlsys.wuklab.io · 2d · Hacker News
froggeric/Qwen3.6-27B-MTP-GGUF · 📊 Profiling Tools · huggingface.co · 3d · DEV
Starchild-1: The First Real-Time Multimodal World Model · 🏎️ TensorRT · odyssey.ml · 2d · Hacker News
DeepSeek V4 Flash: Bringing Frontier AI to the Home · 🔍 Nsight · blog.jonathanpage.com · 2d · Hacker News
https://www.together.ai/blog/coding-agent-benchmarks · ⚡ Flash Attention · together.ai · 6d
If you have the budget, this £2,649 Cyrus 40 ST music streamer is a must-buy · ⚡ Flash Attention · whathifi.com · 1d
KV Cache Is Becoming the Memory Hierarchy of Inference · 🧠 CPU Architecture · touchdown-labs.com · 3d
NVlabs/LongLive: Infra for Long Video Generation · 🏎️ TensorRT · github.com · 1d
HF downloader utility tampermonkey · 🏎️ TensorRT · greasyfork.org · 3d · r/LocalLLaMA
QClaw: A Fully Local Agentic Assistant on the Arduino Uno Q · 📜 TorchScript · hackster.io · 1d
Storage for the AI Factory Era A Discussion · ⚙️ Systems Programming · servethehome.com · 6d
How I Shipped an Autonomous Agentic System on a 2026 Serverless-GPU Stack · 🔧 PTX · medium.com · 2d
The Inference Bottleneck: Architecting Kubernetes Autoscaling for Production LLMs · 🚀 MLOps · cloudnativenow.com · 6d
The Developer’s Guide to OpenCode on Google Cloud · 🤖 AI Coding Tools · medium.com · 2d
How Do I Run AI Workloads on Kubernetes Without Wasting GPUs? · 🚀 MLOps · fairwinds.com · 22h
Runtime-Certified Bounded-Error Quantized Attention · 🧩 Attention Kernels · arxiv.org · 13h
Olympiad Gold Medal Packaged into a Two-Step Recipe · 📜 TorchScript · ai-brief.liziran.com · 4d