LocalLlama · Scour

I spent 96 hours setting up dual DGX Sparks and a Mac Studio M3 Ultra for the same 397B model. Neither won.

alooftwaffle.substack.com·5w·r/LocalLLaMA

RYS Part 3: LLMs think in geometry, not language — new results across 4 models, including code and math

dnhkng.github.io·5w·Hacker News, r/LocalLLaMA

Google's TurboQuant AI-compression algorithm can reduce LLM memory usage by 6x

arstechnica.com·6w·Hacker News, r/LocalLLaMA

Who is liable when the AI decides?

aifactoryinsider.com·6w·Hacker News, r/LocalLLaMA

Standard LoRA is quietly losing 68% of quality on FP8 hardware and most people have no idea

koscak.ai·5w·r/LocalLLaMA

kevin-hs-sohn/memaware: Benchmark for measuring memory awareness in AI agents — the ability to surface relevant past context without being asked

github.com·5w·r/LocalLLaMA

TurboQuant for weights: near‑optimal 4‑bit LLM quantization with lossless 8‑bit residual

github.com·5w·r/LocalLLaMA

chromadb/context-1

huggingface.co·5w·r/LocalLLaMA

soy-tuber/nemotron: Local multimodal LLM gateway unifying NVIDIA Nemotron models on a single GPU

github.com·5w·r/LocalLLaMA

Judge blocks Pentagon’s effort to ‘punish’ Anthropic by labeling it a supply chain risk

·5w·Hacker News, r/LocalLLaMA

lightningpixel/modly: Desktop app to generate 3D models from images using local AI — runs entirely on your GPU

github.com·5w·r/LocalLLaMA

Apple stopped selling 512GB URAM Mac Studios; the max is now 256GB!

apple.com·5w·r/LocalLLaMA

1 Million Tokens Per Second: Qwen 3.5 27B on GKE with B200 GPUs

medium.com·5w·Hacker News, r/LocalLLaMA

mistralai/Voxtral-4B-TTS-2603

huggingface.co·5w·r/LocalLLaMA

CohereLabs/cohere-transcribe-03-2026

huggingface.co·5w·r/LocalLLaMA

nokodo-labs/os1: the next-gen open source AI platform

github.com·8w·r/LocalLLaMA

Quantization from the ground up

ngrok.com·6w·Lobsters, Hacker News, r/LocalLLaMA, r/programming

steipete/mcporter: Call MCPs via TypeScript, masquerading as simple TypeScript API. Or package them as cli.

github.com·14w·Hacker News, r/LocalLLaMA

kevdogg102396-afk/free-claude-code: Nemo Code — FREE Claude Code alternative. NVIDIA open models + Claude Code CLI framework. One command install. Zero cost. By ClawdWorks.

github.com·5w·r/LocalLLaMA, r/coding

Level1Techs initial review of Arc B70 for Qwen and more (he has 4 B70 Pros)

youtu.be·5w·r/LocalLLaMA
