🧠 Local LLMs
local AI, self-hosted LLM, ollama, on-device inference
Improved performance and model support with GGUF
🟣 Claude · Content type: Blog · ollama.com · 5 days ago
Ollama 0.30 GPU Boost: Faster local Qwen inference on NVIDIA
🔬 Deep Learning · everylocalai.com · 1 hour ago · DEV
Ollama 0.30 delivers faster NVIDIA GPU performance and wider hardware support
🔷 TensorFlow · alternativeto.net · 2 days ago
UniSVQ: 2-bit Unified Scalar-Vector Quantization
🤖 Machine Learning · Content type: Academic · arxiv.org · 17 hours ago
Qwen 3.6 27B AutoRound GGUF, need your feedback
🤖 Qwen · huggingface.co · 1 day ago · r/LocalLLaMA
Neo-X7/Neo-AI: A fully offline AI assistant powered by Ollama. Stores and retrieves conversations using SQLite + LanceDB vector search. No cloud. No API keys. Runs entirely on your machine.
🗄️ SQL · Content type: Code · github.com · 9 hours ago · DEV
Show HN: Run Llama.cpp In-Process from Java with Project Panama FFM
📝 NLP · deemwar-products.github.io · 5 days ago · Hacker News
lightmetal: GPU LLM Inference From a Single Java 25 JAR
🧠 OpenAI · Content type: Blog · adambien.blog · 1 day ago
I've tested so many desktop AI tools, but Hermes with Ollama is my new favorite - here's why
🤖 AI Agents · Content type: News, Tutorial · zdnet.com · 7 hours ago
GGUF vs GPTQ vs AWQ: The Plain-English Guide to LLM Quantization (and Which One to Pick)
📝 NLP · vettedconsumer.com · 4 days ago · Hacker News
A system programmer’s guide to LLM inference
📝 NLP · Content type: Blog · blog.xiangpeng.systems · 2 days ago · Hacker News
AMD's Lemonade SDK For Local AI Adds NVIDIA CUDA Support
🧠 OpenAI · phoronix.com · 5 hours ago
martidu4/honey-ai: 🍯 All-in-one AI honeypot powered by local LLMs. SSH, HTTP, FTP, Telnet, SMTP, MySQL, Redis, Git, VNC, RDP — with canary tokens, tarpits, GZIP bombs, and threat intel reporting.
🤖 Qwen · Content type: Code · github.com · 6 hours ago · Hacker News
Unsloth Gemma 4 QAT
📝 NLP · unsloth.ai · 5 days ago
I added this open-source tool to my local AI stack, and my local LLM finally has persistent memory
👨‍💻 AI Coding · xda-developers.com · 1 day ago
Optimal Post-Training Quantization Scales and Where to Find Them
🤖 LLMs · Content type: Academic · arxiv.org · 17 hours ago
On-device AI is a margin decision
🧠 OpenAI · Content type: Blog · ziraph.com · 3 hours ago · Hacker News
Gemma 4 QAT models: Optimizing model compression for mobile and laptop efficiency
📝 NLP · Content type: News, Blog · blog.google · 5 days ago · Hacker News
Token4Token — pay-per-token inference on Gnosis + Swarm
🧠 OpenAI · t4t.eth.link · 1 day ago · Hacker News
Fixing a stuck Ollama runner and building a GPU watchdog
✍️ Prompt Engineering · patrickmccanna.net · 2 days ago · Hacker News