🧠 LLM Inference
Quantization, Attention Mechanisms, Batch Processing, KV Caching
Scoured 26929 posts in 1.54 s
Agentic Coding and the Problem of Oracles
epkconsulting.substack.com · 3d · Discuss: r/programming
🛡️ AI Security
Databricks adds MemAlign to MLflow to cut cost and latency of LLM evaluation
infoworld.com · 5d
🏆 LLM Benchmarking
The Passive AI Learning Stack That Changed the Way I Learn
donnfelker.com · 2d
📰 RSS Reading Practices
Claude: Speed up responses with fast mode
simonwillison.net · 3d
🔌 Claude Plugins
Running LLMs in-browser via WebGPU, Transformers.js, and Chrome's Prompt API—no Ollama, no server
noaibills.app · 3d · Discuss: r/LocalLLaMA, r/SideProject, r/selfhosted
🦙 Ollama
do you know more modern version of something like byt5-small?
huggingface.co · 2d · Discuss: r/LocalLLaMA
🔤 Tokenization
A Proposal for TruesightBench
lesswrong.com · 5d
📋 Text Quality
Hardware Acceleration
jellyfin.org · 3d
⚡ Hardware Acceleration
NVIDIA VibeTensor: AI Just Built Its Own Deep Learning Engine… And It Actually Works (AI Revolution
youtube.com · 2d
🖥 GPUs
Why “Context Lake” Matters For Agentic AI
forrester.com · 2d
🌐 Distributed systems
Almost Timely News: 🗞️ How to Do Great Focus Groups with RPGs and AI (2026-02-08)
almosttimely.substack.com · 2d · Discuss: Substack
🆕 New AI
The Optimal Token Baseline: Variance Reduction for Long-Horizon LLM-RL
arxiv.org · 1d
💰 Tokenomics
Adaptive Retrieval helps Reasoning in LLMs -- but mostly if it's not used
arxiv.org · 1d
🔄 LLM RAG Pipelines
Making a Hardware Accelerated Live TV Player from Scratch in C: HLS Streaming, MPEG-TS Demuxing, H.264 Parsing, and Vulkan Video Decoding
blog.jaysmito.dev · 2d · Discuss: Hacker News, r/programming
📄 File Formats
AI workloads challenge the cattle model
varoa.net · 3d · Discuss: Hacker News
🖥 GPUs
Achieving Ultra-Fast AI Chat Widgets
cjroth.com · 3d · Discuss: Hacker News
💾 Prompt Caching
Jokes on You AI: Turning the Tables
dev-log.me · 2d · Discuss: Hacker News
👨💻 AI Coding
Issue 637
datascienceweekly.substack.com · 5d · Discuss: Substack
🏗️ LLM Infrastructure
Show HN: Routed Attention – 75-99% savings by routing between O(N) and O(N²)
zenodo.org · 3d · Discuss: Hacker News
🚀 Async Optimization
Writing an LLM from scratch, part 32b -- Interventions: gradient clipping
gilesthomas.com · 6d · Discuss: Hacker News
🏆 LLM Benchmarking