🚀 Model Serving
TorchServe, TensorFlow Serving, Inference Optimization, Batching
Scoured 184,204 posts in 59.5 ms
Distributed Generative Inference of LLM at Internet Scales with Multi-Dimensional Communication Optimization
🧮 Vector Databases · arxiv.org · 6d

Reiner Pope – The math behind how LLMs are trained and served
🧠 Deep Learning · dwarkesh.com · 1d

Speculative Decoding vs MoE: 3.2x Cost Gap on Llama 3
🔨 LLVM · tildalice.io · 3d

AmSach/kvquant: Drop-in KV cache compressor for local LLM inference - Run 70B models on 8GB RAM
🔨 LLVM · github.com · 6h · DEV

From $200 to $30: Five Layers of LLM Cost Optimization
🛠️ Feature Engineering · blog.dwornikowski.com · 6d · Hacker News

Prefetching Weights in llama.cpp
🔨 LLVM · am17an.bearblog.dev · 2d

a16z: Large Model Deployment = Forgetting—Can “Continual Learning” Break This Vicious Cycle?
🛠️ Feature Engineering · techflowpost.com · 6d

Paper page - Large Language Models Explore by Latent Distilling
🤖 Transformers · huggingface.co · 3h

How to Deploy a Serverless Spam Classifier Using Scikit-Learn, AWS Lambda, & API Gateway
🤖 Machine Learning · freecodecamp.org · 13h

AutoSP: Long-Context LLM Training via Compiler-Based Sequence Parallelism
🧠 Deep Learning · pytorch.org · 23h · Hacker News

Show HN: I built a 2nd-order PyTorch optimizer for LLMs that runs on 16GB GPUs
🔨 LLVM · news.ycombinator.com · 1d · Hacker News

From local prototyping to GPUs in the GCP cloud: Creating a satellite image classification system…
🧠 Deep Learning · medium.com · 15h

Incompressible Knowledge Probes: Estimating Black-Box LLM Parameter Counts via Factual Capacity
🛠️ Feature Engineering · arxiv.org · 1d · Hacker News

My local agentic dev setup today
🔨 LLVM · willemvandenende.com · 3h · Hacker News

Asynchronously Filling & Evicting Caches
⏱️ Async Programming · dgtlgrove.com · 16h

DFIR + AI: Using Local LLMs with DFIR MCP Servers
🤖 AI · cybertriage.com · 19m

Dedicated vs Serverless Inference as You Scale
🔄 Concurrency · digitalocean.com · 1d

PyTorch Lightning project quarantined by PyPI
📦 uv · pypi.org · 3h · Hacker News

Machine Learning Developers: Why Most ML Projects Fail After the Model Stage
🤖 Machine Learning · artificialintelligence.oodles.io · 6h · DEV

The Inference Economy: Token Use
🛠️ Feature Engineering · frontierai.substack.com · 18m · Substack