💻 Local LLMs - buckman

alexziskind1/model-shelf: Model Shelf is a local-first model resolver that helps AI agents and scripts find model weights on your own storage before downloading from Hugging Face. Point it at an internal SSD, NAS, external SSD, or Thunderbolt DAS, and it returns the best local path for GGUF, MLX, safetensors, Ollama, vLLM, and other local AI workflows.

🖥️Local AI Code

github.com

Gemma 4 QAT models: Optimizing model compression for mobile and laptop efficiency

🔓Open Source AI News Blog

blog.google · Hacker News

I built a fully local AI coding assistant in Windows with Ollama and VS Code

🧠LLM Tooling

howtogeek.com

On-Device AI in SwiftUI Apps

⚡Quantization Blog

dev.to · DEV

Google’s latest on-device AI model is custom-made for your laptop

🔓Open Source AI

androidauthority.com

zaydmulani09/mnemo: Local-first AI memory layer for any LLM. Persistent knowledge graph, entity extraction, semantic retrieval. Works with Ollama, OpenAI, Anthropic, or any OpenAI-compatible backend.

🦙Ollama Code

github.com · Hacker News

Running a Local AI Engineering Agent with deepstrain: A Step-by-Step Tutorial

🧠LLM Tooling Blog

dev.to · DEV

shoo99/paper-rag: A private, fully-local RAG over your own PDFs: BGE-M3 + embedded Qdrant + a local LLM via Ollama. ~150 lines, nothing leaves your machine.

🤖Large Language Models Code

github.com · DEV

sancheznot/Godot-AI-Assistant: Golem-AI is an AI-powered editor assistant for Godot 4. Chat with local or cloud models (Ollama, LM Studio, OpenAI, Anthropic, Gemini, Cursor) directly from an editor dock.

🖥️Local AI Code

github.com · DEV

Run Coding Agents on Local AI — Zero Cloud, Full Control

🧠LLM Tooling Blog

dev.to · DEV

NVIDIA and Apple Solved the Hardware. Here's What's Left to Build.

⚡Quantization Blog

dev.to · DEV

I Connected PewDiePie's Odysseus to a Cloud Memory Stack — Zero API Costs, Persistent Memory

🧠LLM Tooling Blog

dev.to · DEV

Your AI Vendor Says 'Trust Us' with Your Data. There's a Better Option.

🔒Privacy Blog

dev.to · DEV

I Benchmarked 3 Local LLMs on My Laptop — Here's What the Numbers Actually Show

🧠LLM Blog

dev.to · DEV

Running AI Locally: Skip the API Bills and Build Faster

🖥️Local AI Blog

dev.to · DEV

I kept using Claude Code. Added one thing to it. Cut AI engineering costs by 62%.

🤖Large Language Models Blog

dev.to · DEV


Doubling Qwen3.6-27B on One RTX 3090: ollama → llama.cpp + MTP, Lever by Lever (35.7 → 80.2 tok/s)

LM Studio now lets you use your iPhone to talk to local models on your Mac

On-device AI agents hit a hard memory limit. Apple's new architecture routes around it.

I switched from LM Studio to llama.cpp, and I'm never going back to a bloated wrapper
