LocalLlama
reddit.com
Local models are a godsend when it comes to discussing personal matters
reddit.com · 3w · r/LocalLLaMA
michiosw/oamc: Local-first LLM wiki for research workflows with Obsidian, a dashboard, and a macOS menubar runtime.
github.com · 3w · r/LocalLLaMA
MiniMax released MMX-CLI: one CLI for text, image, video, speech, music, vision, and web search — no MCP server needed. Works natively in Claude Code, Cursor, O...
aiuniverse.news · 3w · r/LocalLLaMA
gtausa197-svg/-Project-Nord-Spiking-Neural-Network-Language-Model: The first pure SNN language model trained from scratch with a fully original architecture. 618M parameters • 93% sparsity • Runs on phone • Online learning via STDP • $260 total training cost
github.com · 3w · r/LocalLLaMA, r/OpenAI
Aryagm/dflash-mlx: Exact speculative decoding on Apple Silicon, powered by MLX.
github.com · 3w · r/LocalLLaMA
mtmd: qwen3 audio support (qwen3-omni and qwen3-asr) by ngxson · Pull Request #19441
github.com · 3w · r/LocalLLaMA
patilyashvardhan2002-byte/lazy-moe: The GPU-free LLM inference engine. Combines lazy expert loading + TurboQuant KV compression to run models that shouldn't fit on your hardware. Built from scratch, fully local, zero cloud.
github.com · 3w · r/LocalLLaMA
Turning my phone into a local AI server (open source project update)
github.com · 3w · r/LocalLLaMA
Unsloth MiniMax M2.7 quants just finished uploading to HF
huggingface.co · 3w · r/LocalLLaMA
ai-dynamo/aitune: NVIDIA AITune is an inference toolkit designed for tuning and deploying deep learning models with a focus on NVIDIA GPUs.
github.com · 3w · r/LocalLLaMA
MiniMax-M2.7 GGUF Quants — Full Set (Q2_K to Q8_0 + BF16)
huggingface.co · 3w · r/LocalLLaMA
MiniMax-M2.7 Q3_K_L & Q8_0 — First GGUF quants, Apple Silicon (M3 Max 128GB)
huggingface.co · 3w · r/LocalLLaMA
LICENSE · MiniMaxAI/MiniMax-M2.7 at main
huggingface.co · 3w · r/LocalLLaMA
MiniMax M2.7 Weights Released
huggingface.co · 3w · Hacker News, r/LocalLLaMA
A Mac Studio for Local AI
spicyneuron.substack.com · 3w · Substack, r/LocalLLaMA
Analysis of spilling MoE weights onto SSD: GLM-5 is surprisingly usable even with over 1/3rd of weights left on SSD, due to caching dynamics
rentry.org · 3w · r/LocalLLaMA
chat_template.jinja · froggeric/Qwen3.5-35B-A3B-Uncensored-FernflowerAI-MLX-8bit at main
huggingface.co · 3w · r/LocalLLaMA
Simulating human cognition in LLM agents: a free 126K-word book covering memory decay, emotion engines, personality drift, and 12 other cognitive subsystems
github.com · 3w · r/LocalLLaMA
ShaikhWarsi/free-ai-tools: Curated list of free and low-cost AI tools, LLM APIs, IDEs, agents, and infrastructure for building real AI apps
github.com · 3w · r/LocalLLaMA, r/PromptEngineering, r/artificial
Siriusquirrel/SongGeneration: Memory-optimized SongGeneration (v2 Large) for 16GB VRAM GPUs. Features 8-bit µ-law KV-caching, fused layers, and SDPA/Triton integration.
github.com · 3w · r/LocalLLaMA