Scour
👁️ Attention Optimization
Flash Attention, Memory Efficient, Sparse Attention, Transformers
Scoured 160 posts in 8.4 ms
Cerebras: The $56.4 Billion IPO Challenging NVIDIA’s Memory Wall
⚡ Flash Attention
artificialintelligencemadesimple.com · 2d
A primer on how large language model works
🎓 Model Distillation
mayijie.substack.com · 5d · Substack
Sandisk’s AI Pivot Changes The NAND Narrative (NASDAQ:SNDK)
⚡ Flash Attention
seekingalpha.com · 31m
Attend Locally, Remember Linearly: Linear Attention as Cross-Frame Memory for Autoregressive Video Diffusion
⚡ Flash Attention
arxiv.org · 2d
Ollama on Mac: Setup and Optimization Guide (2026)
📊 Profiling Tools
insiderllm.com · 5d
InferenceBench: A Benchmark for Open-Ended Inference Optimization by AI Agents
⚡ ONNX Runtime
inferencebench.ai · 17h · Hacker News
Introducing the Ettin Reranker Family
📉 Model Quantization
huggingface.co · 2d · r/LocalLLaMA
RT by @awnihannun: Subagents running locally and simultaneously on MacBook Pro M5 with Codex CLI + @lmstudio to review code and find bugs using Qwen 3.6
🔄 ONNX
twitter.macworks.dev · 20h
michelangeloromerochisco/ternative: Inference engine for ternary-weight LLMs with runtime LoRA - the llama.cpp of BitNet models
🔄 ONNX
github.com · 1d · Hacker News
Gemini Extended Thinking ✨, ChatGPT finance 📱, Claude Code at scale 👨💻
🤖 AI Coding Tools
tldr.tech · 3d
Large-scale, SRAM-based LLM Inference Deployment (Groq)
⚡ ONNX Runtime
semiengineering.com · 40m
AI runs on tokens. There’s a missing artifact between them.
✂️ CUTLASS
medium.com · 2d
DALI VEGA Wireless Hi-Fi System Delivers All-in-One Sound With BluOS, HDMI ARC, and Adaptive Orientation
⏱️ Benchmarking
ecoustics.com · 8h · ecoustics.com
SuperInfer: SLO-Aware Rotary Scheduling and Memory Management for LLM Inference on Superchips
⏱️ CUDA Events
supercomputing-system-ai-lab.github.io · 2d · Hacker News
The Ultimate LLM Fine-Tuning Guide
⚡ ONNX Runtime
promptinjection.net · 4d · Hacker News
Coding Agent Inference Benchmark Revealed
⚡ ONNX Runtime
startuphub.ai · 1d
Ollama vs vLLM vs llama.cpp: Which Wins for Your Use Case
📊 Profiling Tools
tildalice.io · 5d
Blazing fast on-device GenAI with LiteRT-LM
🎯 Tensor Cores
developers.googleblog.com · 1d · Hacker News
New comment by easygenes in "Gemini 3.5 Flash"
🔄 ONNX
news.ycombinator.com · 1d · Hacker News
MegaTrain Full Precision Training of 100B+ Parameter LLMs on a Single GPU
🏎️ TensorRT
github.com · 4d · Hacker News