Local LLM Deployment
Specific
Model Optimization, GPU Acceleration, Inference, Privacy
Scoured 435 posts in 6.6 ms
Running Qwen 35B MoE at 450k Context on a Single 32GB GPU
🪟 Awesome windows command-line · local-llm.utop.workers.dev · 3 days ago · Hacker News
Tales of an Ollama Honeypot (Part 3): More Traffic, More Findings
🖥️ Self-hosted apps · posts.inthecyber.com · 2 days ago
iOS 27’s most powerful on-device AI requires iPhone 17 Pro, iPhone Air
🖥️ Self-hosted apps · 9to5mac.com · 2 days ago
local llm on laptop 780M GPU using llama + gemma 4 qat
🖥️ Self-hosted apps · Content type: Blog · alper.bearblog.dev · 4 days ago
Less-relevant results
Apple WWDC On-Device AI Deep Dive - Google Docs
🖥️ Self-hosted apps · gist.is · 4 hours ago · Hacker News
Putting a datacenter GPU in a gaming PC for £200 ($268)
🖥 Home Lab Setup · Content type: Blog · blog.adafruit.com · 7 hours ago
Apple's most advanced on-device AI features will only work on select devices
🖥️ Self-hosted apps · Content type: News · gsmarena.com · 1 day ago
147th airhacks tv: Local LLMs, LightMetal, ZSmith Agents, AI Rails, Saving Tokens
🖥️ Self-hosted apps · Content type: Blog · adambien.blog · 23 hours ago
Using Scikit-LLM with Open-Source LLMs
🖥️ Self-hosted apps · machinelearningmastery.com · 6 days ago
Nvidia GeForce RTX 50 Super GPUs may launch in early 2027 with 50% more VRAM
🗃️ SQLite · club386.com · 1 day ago
MoQ GGUFs and GSQ: Low-Bit GGUFs Are About to Get Much Better
🗃️ SQLite · Content type: News, Blog · kaitchup.substack.com · 5 days ago · r/LocalLLaMA
Building & Benchmarking: LLMs on a 16GB Jetson Orin NX for Hermes Agent
🖥️ Self-hosted apps · Content type: Blog · dnhkng.github.io · 2 days ago
Google Shrank Gemma 4 by 72% and Unsloth Fixed the 4-Bit Bug Nobody Else Caught on One 4090, and 4-Bit Shouldn’t Be This Good
🖥️ Self-hosted apps · Content type: Blog · towardsai.net · 2 days ago
The latest Gemma 4 models use a training trick to slash their on-device memory footprint
🗃️ SQLite · androidauthority.com · 5 days ago
Quality Is Not a Safety Proxy Under Quantization
🗃️ SQLite · Content type: Academic · arxiv.org · 22 hours ago
NexusOS v2.0 – A zero-dependency pipeline streaming server chaos to Parquet
🖥️ Self-hosted apps · huggingface.co · 2 days ago · Hacker News
KaiFelixBennett/gemma4-turboquant-rdna4: Run Gemma-4-31B at full 256K context on a $1,400 AMD RDNA4 GPU (gfx1201): TurboQuant KV cache + HIP-graph-safe Flash-Attention for llama.cpp, fully measured on real hardware.
🗃️ SQLite · Content type: Code · github.com · 10 hours ago · Hacker News
Apples to Apples: MLX vs. Llama.cpp for Gemma 4 12B on an M1 16GB
🖥️ Self-hosted apps · Content type: Blog · ziraph.com · 5 days ago · Hacker News
Qualcomm Announces On-Device AI Claw Ecosystem Plan
🖥️ Self-hosted apps · autonews.gasgoo.com · 2 days ago
NVIDIA's RTX 5060 May Finally Get The VRAM Upgrade Gamers Wanted
🖥 Home Lab Setup · Content type: News · hothardware.com · 5 days ago