Scour
💻 Local AI
local inference, ollama, llama.cpp, on-premise LLM
Scoured 150,297 posts in 19.5 ms
Inference Arena – new benchmark of local inference and training
LLM Deployment · kvark.github.io · 4d · Hacker News
Blink: CPU-Free LLM Inference by Delegating the Serving Stack to GPU and SmartNIC
LLM Deployment · arxiv.org · 9h
Article: Building a Voice-Controlled Local AI Agent
Small LMs · medium.com · 4h
RetroInfer: A Vector Storage Engine for Scalable Long-Context LLM Inference
LLM Deployment · vldb.org · 1d
Run Qwen3.5 on an Old Laptop: A Lightweight Local Agentic AI Setup Guide
LLM Deployment · kdnuggets.com · 1d
The Engine Behind Modern LLM Inference, Part 1: Continuous Batching, PagedAttention, and the End of…
LLM Deployment · medium.com · 20h
LLM inference engine from scratch in C++
LLM Deployment · anirudhsathiya.com · 4d · Hacker News
Best Open Source Offline AI Agent
Open Source AI · news.ycombinator.com · 22h · Hacker News
Ollama is still the easiest way to start local LLMs, but it's the worst way to keep running them
LLM Finetuning · xda-developers.com · 1d
Building a Local AI Video Generation Rig: A Hardware Breakdown
LLM Deployment · hackster.io · 5h
Google's new AI app is a glimpse of the future
Small LMs · computerworld.com · 5h · Hacker News
ASUS UGen300 USB AI Accelerator 8GB for local inference
LLM Deployment · asus.com · 4h · r/StableDiffusion
EU's Exposed AI Infrastructure
LLM Deployment · insecurestack.substack.com · 2d · Substack
TurboQuant Explained: Extreme AI Compression for Faster, Cheaper LLM Inference and Vector Search
LLM Deployment · medium.com · 5d
lunargate-ai/gateway: High-performance self-hosted AI gateway (OpenAI-compatible) with routing, retries, and streaming
LLM Deployment · github.com · 2h · Hacker News
Running Gemma 4 Locally with Ollama on Your PC
LLM Deployment · analyticsvidhya.com · 1d
We burned $200 learning that local AI wasn't about the money
LLM Deployment · write.as · 22h
F&S M.2 AI Accelerator Uses NXP Ara-240 for Edge Inference Workloads
LLM Deployment · linuxgizmos.com · 9h
Decentralized AI in 50 Lines of Python
Open Source AI · iamtrask.github.io · 3d
Inside LLM Inference: KV Cache, Prefill, and the Decode Bottleneck
LLM Deployment · pub.towardsai.net · 1d