LLM Tooling

LLM developer tools, llm CLI, ollama, local models, prompt engineering

Doubling Qwen3.6-27B on One RTX 3090: ollama → llama.cpp + MTP, Lever by Lever (35.7 → 80.2 tok/s)

 🖥️Local AI  Content type: Blog
dev.to··DEV

Philosophy

 ⛓️LangChain  Content type: Reference
docs.langchain.com·

I built a fully local AI coding assistant in Windows with Ollama and VS Code

 🖥️Local AI
howtogeek.com·

huawei-csl/KVarN: KVarN is a native vLLM KV-cache quantization backend for your agents: 3-5x more context, throughput above FP16, and FP16-level accuracy. Calibration-free, one flag.

 Inference  Content type: Code
github.com··Hacker News

Gemma 4 QAT models: Optimizing model compression for mobile and laptop efficiency

 🔓Open Source AI  Content type: News  Content type: Blog
blog.google··Hacker News

LangChain's A2A Integration: Building Multi-Agent Systems in Python Without the Cloud Lock-In

 🔄AI Workflows  Content type: Blog
dev.to··DEV

Speculators v0.5.0: DFlash support and online training

 Inference
developers.redhat.com·

Google Gemma 4 12B: Architecture, Benchmarks, Access, and Hands-on Guide for Developers

 🔓Open Source AI  Content type: Blog
analyticsvidhya.com·

Running a Local AI Engineering Agent with deepstrain: A Step-by-Step Tutorial

 🖥️Local AI  Content type: Blog
dev.to··DEV

zaydmulani09/mnemo: Local-first AI memory layer for any LLM. Persistent knowledge graph, entity extraction, semantic retrieval. Works with Ollama, OpenAI, Anthropic, or any OpenAI-compatible backend.

 🦙Ollama  Content type: Code
github.com··Hacker News

shoo99/paper-rag: A private, fully-local RAG over your own PDFs: BGE-M3 + embedded Qdrant + a local LLM via Ollama. ~150 lines, nothing leaves your machine.

 🤖Large Language Models  Content type: Code
github.com··DEV

Unlocking the Power of RAG Systems with LangChain and Vector Databases

 🔗RAG  Content type: Blog
dev.to··DEV

Run Coding Agents on Local AI — Zero Cloud, Full Control

 🤖Large Language Models  Content type: Blog
dev.to··DEV

Why Self-Hosted Claude Code Was 15× Slower Than It Should Be

 🧠LLMs  Content type: Blog
dev.to··DEV

106. LangGraph: Stateful Agent Workflows

 ⛓️LangChain  Content type: Blog
dev.to··DEV

I Connected PewDiePie's Odysseus to a Cloud Memory Stack — Zero API Costs, Persistent Memory

 💻Local LLMs  Content type: Blog
dev.to··DEV

[Tutorial] Building a Secure LangChain Chatbot on Upsun 🤖

 💬NLP  Content type: Blog
dev.to··DEV

KV cache quantization: what FP8/INT8 K and V actually buy you, and where they break

 Quantization  Content type: Blog
dev.to··DEV

10 GitHub Repos That Replace Your Paid Dev Tools (2026)

 ⚙️AI Automation  Content type: Blog
dev.to··DEV

I Benchmarked 3 Local LLMs on My Laptop — Here's What the Numbers Actually Show

 🧠LLM  Content type: Blog
dev.to··DEV
