Self-hosted AI


GGUF vs GPTQ vs AWQ: The Plain-English Guide to LLM Quantization (and Which One to Pick)

 💻Local LLMs

Ollama 0.30 delivers faster NVIDIA GPU performance and wider hardware support

 💻Local LLMs
alternativeto.net

Quality Is Not a Safety Proxy Under Quantization

 💻Local LLMs  Content type: Academic
arxiv.org

Neo-X7/Neo-AI: A fully offline AI assistant powered by Ollama. Stores and retrieves conversations using SQLite + LanceDB vector search. No cloud. No API keys. Runs entirely on your machine.

 🧠AI  Content type: Code
github.com · DEV

Qwen 3.6 27B AutoRound GGUF, need your feedback

 💻Local LLMs
huggingface.co · r/LocalLLaMA

Improved performance and model support with GGUF

 💻Local LLMs  Content type: Blog
ollama.com

Using Scikit-LLM with Open-Source LLMs

 🧠AI

Show HN: Run Llama.cpp In-Process from Java with Project Panama FFM

 💻Local LLMs

Nvidia DGX Spark GB10 – AI Models and Guide with vLLM and Autonomous Script

 🏗️AI Infrastructure  Content type: Code
github.com · Hacker News

Unsloth Gemma 4 QAT

 💻Local LLMs
unsloth.ai

MoQ GGUFs and GSQ: Low-Bit GGUFs Are About to Get Much Better

 💻Local LLMs  Content type: News  Content type: Blog

local llm on laptop 780M GPU using llama + gemma 4 qat

 💻Local LLMs  Content type: Blog
alper.bearblog.dev

Gemma 4 QAT models: Optimizing model compression for mobile and laptop efficiency

 💻Local LLMs  Content type: News  Content type: Blog
blog.google · Hacker News

zhongkaifu/TensorSharp: A C# inference engine for running large language models (LLMs) locally using GGUF model files. TensorSharp provides a console application, a web-based chatbot interface, and Ollama/OpenAI-compatible HTTP APIs for programmatic access. It supports Windows/MacOS/Linux with full GPU capability

 💻Local LLMs  Content type: Code
github.com · Hacker News

huawei-csl/KVarN: KVarN is a native vLLM KV-cache quantization backend for your agents: 3-5x more context, throughput above FP16, and FP16-level accuracy. Calibration-free, one flag.

 💻Local LLMs  Content type: Code
github.com · Hacker News

alexziskind1/model-shelf: Model Shelf is a local-first model resolver that helps AI agents and scripts find model weights on your own storage before downloading from Hugging Face. Point it at an internal SSD, NAS, external SSD, or Thunderbolt DAS, and it returns the best local path for GGUF, MLX, safetensors, Ollama, vLLM, and other local AI workflows.

 💻Local LLMs  Content type: Code
github.com

Ideogram4 GGUF is out!

 💻Local LLMs

ulyssestenn/omt: Ollama Model Test - Figure out the best model for the task

 🧠AI  Content type: Code
github.com · Hacker News

fix(gateway): fail closed for unknown model auth · openclaw/openclaw@85343ea

 🤖AI Inference  Content type: Code
github.com

feat(parallel): add free Parallel Search MCP as the zero-config defau… · openclaw/openclaw@983b65b

 🧠AI  Content type: Code
github.com
