LocalLlama · Scour

Attention Is All You Need, But All You Can't Afford

codeberg.org·4w·r/LocalLLaMA, r/artificial

OpenAI, Anthropic, Google Unite to Combat Model Copying in China

·4w·Hacker News, r/LocalLLaMA

Emotion Concepts and their Function in a Large Language Model

transformer-circuits.pub·4w·DEV, Hacker News, r/LocalLLaMA, r/artificial, r/singularity

I benchmarked 37 LLMs on MacBook Air M5 32GB — full results + open-source tool to benchmark your own Mac

github.com·4w·r/LocalLLaMA

Rtalabs-ai/aura-research: LLM-powered research knowledge base — compile raw documents into a living wiki with persistent agent memory and RAG retrieval.

github.com·4w·r/LocalLLaMA

trevorgordon981/alfred-abliterate: Residual-stream abliteration toolkit for MoE models (Qwen3.5-397B-A10B) on Apple Silicon. Removes PRC-aligned content policies from local inference. Tested on Mac Studio M3 Ultra 512GB.

github.com·4w·r/LocalLLaMA

ai-infos/vllm-gfx906-mobydick: A high-throughput and memory-efficient inference and serving engine for LLMs - Optimized for AMD gfx906 GPUs, e.g. Radeon VII / MI50 / MI60

github.com·4w·r/LocalLLaMA

If an Agent only works on my machine, that's usually state leakage, not bad prompting

github.com·4w·r/LocalLLaMA, r/PromptEngineering, r/opensource

gemma-4-26B-A4B-it-UD-IQ4_XS.gguf · unsloth/gemma-4-26B-A4B-it-GGUF at main

huggingface.co·4w·r/LocalLLaMA

a-ghorbani/pocketpal-ai: An app that brings language models directly to your phone.

github.com·4w·r/LocalLLaMA

I made a 35% REAP of 397B with potentially usable quality in 96GB GPU

huggingface.co·4w·r/LocalLLaMA

intelb70vsrtx4070superdata/README.md at main · hungryblocko/intelb70vsrtx4070superdata

github.com·4w·r/LocalLLaMA, r/hardware

lechmazur/nyt-connections: Benchmark that evaluates LLMs using 759 NYT Connections puzzles extended with extra trick words

github.com·13w·r/LocalLLaMA, r/singularity

Embarrassingly Simple Self-Distillation Improves Code Generation

arxiv.org·4w·Lobsters, Hacker News, r/LocalLLaMA

REPRODUCE.md · nohurry/gemma-4-26B-A4B-it-heretic-GUFF at main

huggingface.co·4w·r/LocalLLaMA

JohannaWeb/Monarch: Custom Small Language Model Acting as falcon expert

github.com·4w·r/LocalLLaMA

I let Gemma 4 (31B) debate Gemini 3 Deepthink. The result is insane.

litter.catbox.moe·4w·r/LocalLLaMA, r/singularity

Gemma-4-31B NVFP4 inference numbers on 1x RTX Pro 6000

huggingface.co·4w·r/LocalLLaMA

nvidia/nemotron-ocr-v2

huggingface.co·4w·Hacker News, r/LocalLLaMA

Gemma 4 31B at 256K Full Context on a Single RTX 5090 — TurboQuant KV Cache Benchmark

github.com·4w·r/LocalLLaMA
