Scour
LocalLlama · reddit.com
catcam/hads: Human-AI Document Standard — lightweight convention for AI-optimized technical documentation
github.com · 7w · Hacker News, r/GithubCopilot, r/LocalLLaMA
ptobey/local-memory-mcp: Local-first personal RAG memory system for AI assistants via MCP. Stores text-first chunks with lightweight metadata, supports versioned updates and retrieval, and runs fully self-hosted with user-controlled data. Designed for practical context continuity, not rigid schemas or SaaS workflows.
github.com · 7w · Hacker News, r/LocalLLaMA
NeuroForgeLabs/rag-doctor: 🩺 RAG Doctor — Open-source diagnostic tool for Retrieval-Augmented Generation (RAG) systems. Analyzes codebases to detect architectural issues in LLM pipelines such as missing retrieval, bad chunking, embedding mismatches, and vector database misuse.
github.com · 7w · r/LocalLLaMA
Executing programs inside transformers with exponentially faster inference
percepta.ai · 7w · r/LocalLLaMA
Tenstorrent QuietBox 2 Brings RISC‑V AI Inference to the Desktop
storagereview.com · 8w · r/LocalLLaMA
Does anyone here use Vast.ai?
vast.ai · 70w · DEV, r/LLM, r/LocalLLaMA, r/StableDiffusion, r/homelab
Omnicoder-9b SLAPS in Opencode
huggingface.co · 7w · Hacker News, r/LocalLLaMA
Four MTIA Chips in Two Years: Scaling AI Experiences for Billions
ai.meta.com · 8w · r/LocalLLaMA, r/hardware
nvidia/Qwen3-Nemotron-235B-A22B-GenRM-2603
huggingface.co · 7w · r/LocalLLaMA
willbnu/Qwen-3.5-16G-Vram-Local: Run Qwen3.5-35B-A3B at 125 t/s on any 16GB NVIDIA GPU — configs, benchmarks, the --parallel 1 discovery (10x speedup), and the 155,904 token context cliff
github.com · 8w · Hacker News, r/LocalLLaMA
common/parser: handle reasoning budget (#20297) · ggml-org/llama.cpp@acb7c79
github.com · 8w · r/LocalLLaMA
Why AI Coding Agents like Codex Waste Half Their Context Window
stoneforge.ai · 8w · r/LocalLLaMA, r/programming
Nvidia Will Spend $26 Billion to Build Open-Weight AI Models, Filings Show
wired.com · 8w · Hacker News, r/LocalLLaMA
Mac users should update llama.cpp to get a big speed boost on Qwen 3.5
github.com · 8w · r/LocalLLaMA
NVIDIA Releases Nemotron 3 Super: A 120B Parameter Open-Source Hybrid Mamba-Attention MoE Model Delivering 5x Higher Throughput for Agentic AI
marktechpost.com · 8w · r/LocalLLaMA
loay/English-Document-OCR-Qwen3.5-2B
huggingface.co · 8w · r/LocalLLaMA
Introducing Nemotron 3 Super: An Open Hybrid Mamba-Transformer MoE for Agentic Reasoning
developer.nvidia.com · 8w · r/LocalLLaMA
[Release] Apex-1: A 350M Tiny-LLM trained locally on an RTX 5060 Ti 16GB
huggingface.co · 8w · r/LocalLLaMA
raketenkater/llm-server: Smart launcher for llama.cpp / ik_llama.cpp — auto-detects GPUs, optimizes MoE placement, crash recovery
github.com · 8w · r/LocalLLaMA
Ablation vs Heretic vs Obliteratus: one trick, three layers of tooling
morgin.ai · 8w · r/LocalLLaMA
« Page 21 · Page 23 »