🚀 Model Serving
TorchServe, TensorFlow Serving, Inference Optimization, Batching
1632
posts in
23.1
ms

shreyansh26/Speculative-Decoding: Speculative Decoding Implementations: EAGLE-3, Medusa-1, PARD, Draft Models, N-gram and Suffix Decoding from scratch
🔨 LLVM · github.com · 3d · r/LLM, r/LocalLLaMA

Load balancer for vLLM server instances?
🔨 LLVM · docs.vllm.ai · 2d · r/LocalLLaMA

mistralai/Mistral-Medium-3.5-128B
📦 uv · huggingface.co · 23h · r/LocalLLaMA, r/singularity

I Built a WebAssembly Runtime in 5 Days
🐹 Go · tingouw.com · 1h · Hacker News

Prefetching Weights in llama.cpp
🔨 LLVM · am17an.bearblog.dev · 2d

Maybe I was too harsh on deep learning theory (three days ago)
🤖 Machine Learning · lesswrong.com · 7h

Granite 4.1: IBM's 8B Model Is Competing With Models Four Times Its Size
🛠️ Feature Engineering · firethering.com · 4h · Hacker News

Lambda Calculus Benchmark for AI
🔄 Concurrency · victortaelin.github.io · 5d · Hacker News

Vibe Training - Auto Train a Small Language Model for Your Use Case
🤖 Transformers · diamantai.substack.com · 2d · Substack, r/LocalLLaMA

Lessons from Building an OTel Normalizer for GenAI (Part 1)
🛠️ Feature Engineering · groundcover.com · 10h · Hacker News

Scaling Pain of Coding Agent Serving: Lessons from Debugging GLM-5 at Scale
🐍 Programming · z.ai · 13h · Lobsters

LingBot-Map: Streaming 3D reconstruction with geometric context transformer
📓 Jupyter Notebooks · technology.robbyant.com · 2d · Hacker News

Vibin’ With Erlang
🐹 Go · blog.whenhen.com · 6d · Lobsters

Introducing the IBM Granite 4.1 family of models
🛠️ Feature Engineering · research.ibm.com · 23h · Hacker News, Hacker News, r/LocalLLaMA

Changes, New Features, and Fixes
🔨 LLVM · gcc.gnu.org · 3h · Hacker News, r/cpp

Qwen 3.6-35B-A3B KV cache bench: f16 vs q8_0 vs turbo3 vs turbo4 from 0 to 1M context on M5 Max
🔨 LLVM · llmkube.com · 2d · r/LocalLLaMA

Remote agents in Vibe. Powered by Mistral Medium 3.5.
🗂️ Personal CRM · mistral.ai · 23h · Hacker News, r/LocalLLaMA

vLLM-Lens: Fast Interpretability Tooling That Scales to Trillion-Parameter Models
🔨 LLVM · lesswrong.com · 6d

Building a Threadiverse Community Platform
⏱️ Async Programming · fedify.dev · 2d · Lobsters, Hacker News

Clojure is the future of AI coding, but you won't use it
🛠️ Feature Engineering · latypoff.com · 19h · Hacker News