🧠 LLM Tooling - buckman

⛓️LangChain Reference

docs.langchain.com·

I built a fully local AI coding assistant in Windows with Ollama and VS Code

🖥️Local AI

howtogeek.com·

huawei-csl/KVarN: KVarN is a native vLLM KV-cache quantization backend for your agents: 3-5x more context, throughput above FP16, and FP16-level accuracy. Calibration-free, one flag.

⚡Inference Code

github.com··Hacker News

Gemma 4 QAT models: Optimizing model compression for mobile and laptop efficiency

🔓Open Source AI News Blog

blog.google··Hacker News

LangChain's A2A Integration: Building Multi-Agent Systems in Python Without the Cloud Lock-In

🔄AI Workflows Blog

dev.to··DEV

Speculators v0.5.0: DFlash support and online training

⚡Inference

developers.redhat.com·

Google Gemma 4 12B: Architecture, Benchmarks, Access, and Hands-on Guide for Developers

🔓Open Source AI Blog

analyticsvidhya.com·

Running a Local AI Engineering Agent with deepstrain: A Step-by-Step Tutorial

🖥️Local AI Blog

dev.to··DEV

zaydmulani09/mnemo: Local-first AI memory layer for any LLM. Persistent knowledge graph, entity extraction, semantic retrieval. Works with Ollama, OpenAI, Anthropic, or any OpenAI-compatible backend.

🦙Ollama Code

github.com··Hacker News

shoo99/paper-rag: A private, fully-local RAG over your own PDFs: BGE-M3 + embedded Qdrant + a local LLM via Ollama. ~150 lines, nothing leaves your machine.

🤖Large Language Models Code

github.com··DEV

Unlocking the Power of RAG Systems with LangChain and Vector Databases

🔗RAG Blog

dev.to··DEV

Run Coding Agents on Local AI — Zero Cloud, Full Control

🤖Large Language Models Blog

dev.to··DEV

Why Self-Hosted Claude Code Was 15x Slower Than It Should Be

🧠LLMs Blog

dev.to··DEV

106. LangGraph: Stateful Agent Workflows

⛓️LangChain Blog

dev.to··DEV

I Connected PewDiePie's Odysseus to a Cloud Memory Stack — Zero API Costs, Persistent Memory

💻Local LLMs Blog

dev.to··DEV

[Tutorial] Building a Secure LangChain Chatbot on Upsun 🤖

💬NLP Blog

dev.to··DEV

KV cache quantization: what FP8/INT8 K and V actually buy you, and where they break

⚡Quantization Blog

dev.to··DEV

10 GitHub Repos That Replace Your Paid Dev Tools (2026)

⚙️AI Automation Blog

dev.to··DEV

I Benchmarked 3 Local LLMs on My Laptop — Here's What the Numbers Actually Show

🧠LLM Blog

dev.to··DEV

Doubling Qwen3.6-27B on One RTX 3090: ollama → llama.cpp + MTP, Lever by Lever (35.7 → 80.2 tok/s)

Philosophy
