Scour
LocalLlama · reddit.com
catcam/hads: Human-AI Document Standard — lightweight convention for AI-optimized technical documentation
github.com · 7w · Hacker News, r/GithubCopilot, r/LocalLLaMA
ptobey/local-memory-mcp: Local-first personal RAG memory system for AI assistants via MCP. Stores text-first chunks with lightweight metadata, supports versioned updates and retrieval, and runs fully self-hosted with user-controlled data. Designed for practical context continuity, not rigid schemas or SaaS workflows.
github.com · 7w · Hacker News, r/LocalLLaMA
NeuroForgeLabs/rag-doctor: 🩺 RAG Doctor — Open-source diagnostic tool for Retrieval-Augmented Generation (RAG) systems. Analyzes codebases to detect architectural issues in LLM pipelines such as missing retrieval, bad chunking, embedding mismatches, and vector database misuse.
github.com · 7w · r/LocalLLaMA
Executing programs inside transformers with exponentially faster inference
percepta.ai · 7w · r/LocalLLaMA
Tenstorrent QuietBox 2 Brings RISC‑V AI Inference to the Desktop
storagereview.com · 8w · r/LocalLLaMA
Does anyone here use Vast.ai?
vast.ai · 70w · DEV, r/LLM, r/LocalLLaMA, r/StableDiffusion, r/homelab
Omnicoder-9b SLAPS in Opencode
huggingface.co · 7w · Hacker News, r/LocalLLaMA
Four MTIA Chips in Two Years: Scaling AI Experiences for Billions
ai.meta.com · 8w · r/LocalLLaMA, r/hardware
nvidia/Qwen3-Nemotron-235B-A22B-GenRM-2603
huggingface.co · 7w · r/LocalLLaMA
willbnu/Qwen-3.5-16G-Vram-Local: Run Qwen3.5-35B-A3B at 125 t/s on any 16GB NVIDIA GPU — configs, benchmarks, the --parallel 1 discovery (10x speedup), and the 155,904 token context cliff
github.com · 8w · Hacker News, r/LocalLLaMA
common/parser: handle reasoning budget (#20297) · ggml-org/llama.cpp@acb7c79
github.com · 8w · r/LocalLLaMA
Why AI Coding Agents like Codex Waste Half Their Context Window
stoneforge.ai · 8w · r/LocalLLaMA, r/programming
Nvidia Will Spend $26 Billion to Build Open-Weight AI Models, Filings Show
wired.com · 8w · Hacker News, r/LocalLLaMA
Mac users should update llama.cpp to get a big speed boost on Qwen 3.5
github.com · 8w · r/LocalLLaMA
NVIDIA Releases Nemotron 3 Super: A 120B Parameter Open-Source Hybrid Mamba-Attention MoE Model Delivering 5x Higher Throughput for Agentic AI
marktechpost.com · 8w · r/LocalLLaMA
loay/English-Document-OCR-Qwen3.5-2B
huggingface.co · 8w · r/LocalLLaMA
Introducing Nemotron 3 Super: An Open Hybrid Mamba-Transformer MoE for Agentic Reasoning
developer.nvidia.com · 8w · r/LocalLLaMA
[Release] Apex-1: A 350M Tiny-LLM trained locally on an RTX 5060 Ti 16GB
huggingface.co · 8w · r/LocalLLaMA
raketenkater/llm-server: Smart launcher for llama.cpp / ik_llama.cpp — auto-detects GPUs, optimizes MoE placement, crash recovery
github.com · 8w · r/LocalLLaMA
Ablation vs Heretic vs Obliteratus: one trick, three layers of tooling
morgin.ai · 8w · r/LocalLLaMA
« Page 21 · Page 23 »