🏠 Self-hosted AI - nmarshall

💻Local LLMs Academic

arxiv.org

Neo-X7/Neo-AI: A fully offline AI assistant powered by Ollama. Stores and retrieves conversations using SQLite + LanceDB vector search. No cloud. No API keys. Runs entirely on your machine.

🧠AI Code

github.com · DEV

Qwen 3.6 27B AutoRound GGUF, need your feedback

💻Local LLMs

huggingface.co · r/LocalLLaMA

Improved performance and model support with GGUF

💻Local LLMs Blog

ollama.com

Using Scikit-LLM with Open-Source LLMs

🧠AI

machinelearningmastery.com

Show HN: Run Llama.cpp In-Process from Java with Project Panama FFM

💻Local LLMs

deemwar-products.github.io · Hacker News

Nvidia DGX Spark GB10 – AI Models and Guide with vLLM and Autonomous Script

🏗️AI Infrastructure Code

github.com · Hacker News

Unsloth Gemma 4 QAT

💻Local LLMs

unsloth.ai

MoQ GGUFs and GSQ: Low-Bit GGUFs Are About to Get Much Better

💻Local LLMs News Blog

kaitchup.substack.com · r/LocalLLaMA

local llm on laptop 780M GPU using llama + gemma 4 qat

💻Local LLMs Blog

alper.bearblog.dev

Gemma 4 QAT models: Optimizing model compression for mobile and laptop efficiency

💻Local LLMs News Blog

blog.google · Hacker News

zhongkaifu/TensorSharp: A C# inference engine for running large language models (LLMs) locally using GGUF model files. TensorSharp provides a console application, a web-based chatbot interface, and Ollama/OpenAI-compatible HTTP APIs for programmatic access. It supports Windows/MacOS/Linux with full GPU capability

💻Local LLMs Code

github.com · Hacker News

huawei-csl/KVarN: KVarN is a native vLLM KV-cache quantization backend for your agents: 3-5x more context, throughput above FP16, and FP16-level accuracy. Calibration-free, one flag.

💻Local LLMs Code

github.com · Hacker News

alexziskind1/model-shelf: Model Shelf is a local-first model resolver that helps AI agents and scripts find model weights on your own storage before downloading from Hugging Face. Point it at an internal SSD, NAS, external SSD, or Thunderbolt DAS, and it returns the best local path for GGUF, MLX, safetensors, Ollama, vLLM, and other local AI workflows.

💻Local LLMs Code

github.com

Ideogram4 GGUF is out!

💻Local LLMs

huggingface.co · r/StableDiffusion

ulyssestenn/omt: Ollama Model Test - Figure out the best model for the task

🧠AI Code

github.com · Hacker News

fix(gateway): fail closed for unknown model auth · openclaw/openclaw@85343ea

🤖AI Inference Code

github.com

feat(parallel): add free Parallel Search MCP as the zero-config defau… · openclaw/openclaw@983b65b

🧠AI Code

github.com

GGUF vs GPTQ vs AWQ: The Plain-English Guide to LLM Quantization (and Which One to Pick)

Ollama 0.30 delivers faster NVIDIA GPU performance and wider hardware support

Quality Is Not a Safety Proxy Under Quantization