Local LLM Deployment
Model Optimization, GPU Acceleration, Inference, Privacy
GGUF vs GPTQ vs AWQ: The Plain-English Guide to LLM Quantization (and Which One to Pick)
🖥️ Self-hosted apps · vettedconsumer.com · 4 days ago · Hacker News
Ollama 0.30 delivers faster NVIDIA GPU performance and wider hardware support
🖥️ Self-hosted apps · alternativeto.net · 2 days ago
On-device AI is a margin decision
🖥️ Self-hosted apps · Content type: Blog · ziraph.com · 5 hours ago · Hacker News
KaiFelixBennett/gemma4-turboquant-rdna4: Run Gemma-4-31B at full 256K context on a $1,400 AMD RDNA4 GPU (gfx1201): TurboQuant KV cache + HIP-graph-safe Flash-Attention for llama.cpp, fully measured on real hardware.
🗃️ SQLite · Content type: Code · github.com · 7 hours ago · Hacker News
Fixing a stuck Ollama runner and building a GPU watchdog
🖥️ Self-hosted apps · patrickmccanna.net · 2 days ago · Hacker News
Gemma 4 QAT models: Optimizing model compression for mobile and laptop efficiency
🖥️ Self-hosted apps · Content type: News, Blog · blog.google · 5 days ago · Hacker News
A system programmer’s guide to LLM inference
🖥️ Self-hosted apps · Content type: Blog · blog.xiangpeng.systems · 2 days ago · Hacker News
Show HN: Run Llama.cpp In-Process from Java with Project Panama FFM
🖥️ Self-hosted apps · deemwar-products.github.io · 5 days ago · Hacker News
Less-relevant results
Apple WWDC On-Device AI Deep Dive - Google Docs
🖥️ Self-hosted apps · gist.is · 1 hour ago · Hacker News
NexusOS v2.0 – A zero-dependency pipeline streaming server chaos to Parquet
🖥️ Self-hosted apps · huggingface.co · 2 days ago · Hacker News
Running Qwen 35B MoE at 450k Context on a Single 32GB GPU
🪟 Awesome windows command-line · local-llm.utop.workers.dev · 3 days ago · Hacker News
Looking Inside Chromium’s On-Device AI Stack
🖥️ Self-hosted apps · Content type: Blog · island.io · 7 hours ago · Hacker News
Integrate on-device AI models into your app using Core AI - WWDC26 - Videos
🖥️ Self-hosted apps · developer.apple.com · 2 days ago · Hacker News
Run (your largest) local models from your iPhone
🗃️ SQLite · Content type: Blog · lmstudio.ai · 6 days ago · Hacker News, r/LocalLLaMA
Gemma 4 12B: A unified, encoder-free multimodal model
🗃️ SQLite · Content type: Discussion · news.ycombinator.com · 3 days ago · Hacker News
Google’s DiffusionGemma is 4x faster than its other Gemma models
🗃️ SQLite · thenewstack.io · 6 hours ago
local AI agents for Cursor with pre-tuned marketplace/commu
🖥️ Self-hosted apps · locaible.com · 10 hours ago · Hacker News
Omnifs: APIs and data sources as files you can ls, cat, grep, and pipe
🖥️ Self-hosted apps · omnifs.dev · 1 day ago · Hacker News
Apple Silicon's on-device AI bet hasn't moved – only the chip range that runs it
🖥️ Self-hosted apps · tbreak.com · 5 days ago · Hacker News, r/apple
Token4Token — pay-per-token inference on Gnosis + Swarm
🖥️ Self-hosted apps · t4t.eth.link · 1 day ago · Hacker News