Local LLM Deployment

Running Qwen 35B MoE at 450k Context on a Single 32GB GPU

 🪟Awesome windows command-line

Tales of an Ollama Honeypot (Part 3): More Traffic, More Findings

 🖥️Self-hosted apps
posts.inthecyber.com·

iOS 27’s most powerful on-device AI requires iPhone 17 Pro, iPhone Air

 🖥️Self-hosted apps
9to5mac.com·

local llm on laptop 780M GPU using llama + gemma 4 qat

 🖥️Self-hosted apps  Content type: Blog
alper.bearblog.dev·

Less-relevant results

Apple WWDC On-Device AI Deep Dive - Google Docs

 🖥️Self-hosted apps
gist.is··Hacker News

Putting a datacenter GPU in a gaming PC for £200 ($268)

 🖥Home Lab Setup  Content type: Blog
blog.adafruit.com·

Apple's most advanced on-device AI features will only work on select devices

 🖥️Self-hosted apps  Content type: News
gsmarena.com·

147th airhacks tv: Local LLMs, LightMetal, ZSmith Agents, AI Rails, Saving Tokens

 🖥️Self-hosted apps  Content type: Blog
adambien.blog·

Using Scikit-LLM with Open-Source LLMs

 🖥️Self-hosted apps

Nvidia GeForce RTX 50 Super GPUs may launch in early 2027 with 50% more VRAM

 🗃️SQLite
club386.com·

MoQ GGUFs and GSQ: Low-Bit GGUFs Are About to Get Much Better

 🗃️SQLite  Content type: News  Content type: Blog

Building & Benchmarking: LLMs on a 16GB Jetson Orin NX for Hermes Agent

 🖥️Self-hosted apps  Content type: Blog
dnhkng.github.io·

Google Shrank Gemma 4 by 72% and Unsloth Fixed the 4-Bit Bug Nobody Else Caught on One 4090, and 4-Bit Shouldn’t Be This Good

 🖥️Self-hosted apps  Content type: Blog
towardsai.net·

The latest Gemma 4 models use a training trick to slash their on-device memory footprint

 🗃️SQLite
androidauthority.com·

Quality Is Not a Safety Proxy Under Quantization

 🗃️SQLite  Content type: Academic
arxiv.org·

NexusOS v2.0 – A zero-dependency pipeline streaming server chaos to Parquet

 🖥️Self-hosted apps

KaiFelixBennett/gemma4-turboquant-rdna4: Run Gemma-4-31B at full 256K context on a $1,400 AMD RDNA4 GPU (gfx1201): TurboQuant KV cache + HIP-graph-safe Flash-Attention for llama.cpp, fully measured on real hardware.

 🗃️SQLite  Content type: Code
github.com··Hacker News

Apples to Apples: MLX vs. Llama.cpp for Gemma 4 12B on an M1 16GB

 🖥️Self-hosted apps  Content type: Blog
ziraph.com··Hacker News

Qualcomm Announces On-Device AI Claw Ecosystem Plan

 🖥️Self-hosted apps
autonews.gasgoo.com·

NVIDIA's RTX 5060 May Finally Get The VRAM Upgrade Gamers Wanted

 🖥Home Lab Setup  Content type: News
hothardware.com·
