Scour
LocalLlama · reddit.com
Siriusquirrel/SongGeneration: Memory-optimized SongGeneration (v2 Large) for 16GB VRAM GPUs. Features 8-bit µ-law KV-caching, fused layers, and SDPA/Triton integration.
github.com · 3w · r/LocalLLaMA
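The "8-bit µ-law KV-caching" in this entry refers to companding quantization: cache values are passed through the µ-law curve before uniform 8-bit rounding, which concentrates precision near zero where most activations live. A minimal NumPy sketch of the standard µ-law transform (µ = 255 and the [-1, 1] input range are assumptions here; the repo's actual cache layout and scaling may differ):

```python
import numpy as np

MU = 255.0  # companding constant; the standard choice for 8-bit µ-law

def mulaw_encode(x: np.ndarray) -> np.ndarray:
    """Compress values in [-1, 1] to uint8 via µ-law companding."""
    y = np.sign(x) * np.log1p(MU * np.abs(x)) / np.log1p(MU)  # still in [-1, 1]
    return np.round((y + 1.0) / 2.0 * 255.0).astype(np.uint8)

def mulaw_decode(q: np.ndarray) -> np.ndarray:
    """Invert the companding back to float32."""
    y = q.astype(np.float32) / 255.0 * 2.0 - 1.0
    return np.sign(y) * ((1.0 + MU) ** np.abs(y) - 1.0) / MU

x = np.float32([0.001, -0.01, 0.5, -1.0])
x_hat = mulaw_decode(mulaw_encode(x))
# small-magnitude values keep far more relative precision than with linear 8-bit
```

Compared to linear int8, the logarithmic spacing trades a little accuracy on large values for much better resolution on small ones.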
cmhamiche/kld-sweep: A cross-platform Python script to evaluate and compare GGUF quantizations of a model against its BF16/F16 baseline using KL Divergence and Perplexity, powered by llama.cpp.
github.com · 3w · r/LocalLLaMA
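The comparison this script performs can be illustrated directly: run the baseline and the quantized model over the same tokens, compute token-level KL divergence between their predictive distributions, and compute perplexity from next-token log-probs. A toy sketch with random logits standing in for the two models' outputs (the real tool drives llama.cpp; all names here are illustrative):

```python
import numpy as np

def log_softmax(logits: np.ndarray) -> np.ndarray:
    z = logits - logits.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def mean_kl(base_logits: np.ndarray, quant_logits: np.ndarray) -> float:
    """Mean KL(base || quant) over positions, in nats."""
    lp, lq = log_softmax(base_logits), log_softmax(quant_logits)
    return float((np.exp(lp) * (lp - lq)).sum(axis=-1).mean())

def perplexity(logits: np.ndarray, tokens: np.ndarray) -> float:
    """exp(mean negative log-likelihood) of the observed next tokens."""
    lp = log_softmax(logits)
    nll = -lp[np.arange(len(tokens)), tokens]
    return float(np.exp(nll.mean()))

rng = np.random.default_rng(0)
base = rng.normal(size=(128, 512))                      # [positions, vocab]
quant = base + rng.normal(scale=0.1, size=base.shape)   # "quantization noise"
tokens = rng.integers(0, 512, size=128)
print(mean_kl(base, base))      # 0.0: identical models diverge nowhere
print(mean_kl(base, quant))     # > 0, growing with the noise scale
print(perplexity(quant, tokens))
```

Mean KL against the full-precision baseline is often a more sensitive quality signal than perplexity alone, since it compares whole distributions rather than only the probability of the observed token.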
ahb-sjsu/turboquant-pro: First open-source TurboQuant (Zandieh et al., ICLR 2026) for LLM KV cache compression. 5× memory reduction, 0.978 cosine similarity.
github.com · 3w · Hacker News, r/LocalLLaMA
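The "0.978 cosine similarity" figure is the usual way KV-cache reconstruction quality is reported: quantize the cached key/value vectors, dequantize, and measure cosine similarity against the originals. A toy sketch using plain per-vector symmetric int8 quantization as a stand-in (TurboQuant's actual transform is more involved; this only shows how the metric is computed):

```python
import numpy as np

def quantize_int8(kv: np.ndarray):
    """Per-vector symmetric int8 quantization of a [tokens, dim] cache."""
    scale = np.abs(kv).max(axis=-1, keepdims=True) / 127.0
    return np.round(kv / scale).astype(np.int8), scale

def dequantize(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    return q.astype(np.float32) * scale

def mean_cosine(a: np.ndarray, b: np.ndarray) -> float:
    num = (a * b).sum(axis=-1)
    den = np.linalg.norm(a, axis=-1) * np.linalg.norm(b, axis=-1)
    return float((num / den).mean())

rng = np.random.default_rng(0)
kv = rng.normal(size=(1024, 128)).astype(np.float32)  # stand-in K or V cache
q, s = quantize_int8(kv)
kv_hat = dequantize(q, s)
print(mean_cosine(kv, kv_hat))  # close to 1.0; 4x smaller than fp32, ignoring scales
```

Reaching 5× compression while keeping cosine similarity this high is what requires the cleverer transform; naive int8 tops out around 4× before per-vector scale overhead.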
JordiSilvestre/Spectral-AI: "O(log N) MoE expert routing via RT Core ray tracing. BVH traversal replaces matrix multiplication in neural language models."
github.com · 3w · Hacker News, r/LocalLLaMA
Gemma 4 vs Qwen3.5: benchmarking quantized local LLMs on Go coding
msf.github.io · 3w · r/LocalLLaMA
Finetuned a 270M model on CPU only - full weights, no LoRA, no GPU
promptinjection.net · 3w · r/LocalLLaMA
Using OCR models with llama.cpp (by ngxson)
huggingface.co · 3w · r/LocalLLaMA
RyjoxTechnologies/Octopoda-OS: The open-source memory operating system for AI agents. Persistent memory, semantic search, loop detection, agent messaging, crash recovery, and real-time observability.
github.com · 4w · Hacker News, r/LocalLLaMA
Huawei’s Atlas 300I Duo offers 96GB VRAM for local LLMs under $1500. Is this the budget VRAM breakthrough?
hardware-corner.net · 3w · r/LocalLLaMA
VoxCPM2 is out - 2B params, 30 languages. Major upgrade over VoxCPM1.5.
huggingface.co · 3w · r/LocalLLaMA
AuthBits/webmcp: A lightweight, prompt-driven MCP web research server for high-quality LLM-powered information extraction.
github.com · 3w · Hacker News, r/LocalLLaMA
From 1939 to voice clones in 3 seconds — the full AI speech timeline and where it's heading
youtu.be · 3w · r/LocalLLaMA
pwilkin/catapult: A Tauri-based cross-platform launcher / updater / model manager for llama.cpp
github.com · 3w · r/LocalLLaMA
atomicmemory/llm-wiki-compiler: The knowledge compiler. Raw sources in, interlinked wiki out. Inspired by Karpathy's LLM Wiki pattern.
github.com · 4w · Hacker News, r/LLM, r/LocalLLaMA, r/PromptEngineering, r/artificial
AI Cybersecurity After Mythos: The Jagged Frontier
aisle.com · 3w · Hacker News, r/BetterOffline, r/LocalLLaMA, r/singularity
The Mythos Preview "Safety" Gaslight: Anthropic is just hiding insane compute costs. Open models are already doing this.
youtube.com · 3w · r/LocalLLaMA
ggml: backend-agnostic tensor parallelism by JohannesGaessler · Pull Request #19378
github.com · 12w · r/LocalLLaMA
gemma-4-31b-abliterated-Q4_K_M.gguf · paperscarecrow/Gemma-4-31B-it-abliterated at main
huggingface.co · 3w · r/LocalLLaMA
vocab: add gemma4 tokenizer tests, fix edge case by pwilkin · Pull Request #21534
github.com · 3w · r/LocalLLaMA
New Model! LGAI-EXAONE/EXAONE-4.5-33B
huggingface.co · 3w · r/LocalLLaMA