Scour
LocalLlama · reddit.com
has anyone tried this? Flash-MoE: Running a 397B Parameter Model on a Laptop
github.com · 6w · r/LocalLLaMA
ian-hailey/vllm-docker-Qwen3-5-122B-A10B-NVFP4: Docker container config for launching Qwen3.5-122B-A10B-NVFP4 with vLLM
github.com · 6w · r/LocalLLaMA
schutzpunkt/strix-halo-ai-stack: Ansible playbook to configure AMD Strix Halo machines (e.g. Framework Desktop or GMKtec EVO-X2) as local AI inference servers running Fedora 43. Sets up llama.cpp with llama-swap and Open WebUI, downloads GGUF models, and includes an NGINX reverse proxy with TLS via ACME or a self-signed certificate.
github.com · 6w · r/LocalLLaMA
woct0rdho/ComfyUI-FeatherOps: Fast fp16-fp8 mixed-precision matmul on RDNA3/3.5 GPUs without native fp8
github.com · 6w · r/LocalLLaMA, r/StableDiffusion
reverse/autoresearch: AI agents that automatically run research experiments on single-GPU nanochat training
github.com · 6w · r/LocalLLaMA
ik_llama.cpp gives 26x faster prompt processing on Qwen 3.5 27B
github.com · 6w · r/LocalLLaMA
Show HN: AI agents go on blind dates and leave each other voicemails
lobsterdate.com · 6w · Hacker News, r/LocalLLaMA
The Reasoning Bottleneck in Graph-RAG: Structured Prompting and Context Compression for Multi-Hop QA
arxiv.org · 7w · r/LocalLLaMA
ikawrakow/ik_llama.cpp
github.com · 41w · Hacker News, r/LocalLLaMA
Don't sleep on the new Nemotron Cascade
huggingface.co · 6w · r/LocalLLaMA
TGI is in maintenance mode. Time to switch?
huggingface.co · 6w · r/LocalLLaMA
feat: native MTP speculative decoding for Qwen3.5 by AirRunner · Pull Request #990
github.com · 6w · r/LocalLLaMA
Eamon2009/Transformer-language-model: An educational implementation of a GPT-style language model, built from scratch in PyTorch to show how transformer-based AI models work. No pre-trained weights, no fine-tuning; can be trained on a $300 laptop
github.com · 6w · r/LocalLLaMA
[Bug]: The hit rate of prefix caching in Qwen3.5 35BA3B is very low, always less than 0.1% · Issue #36493
github.com · 6w · r/LocalLLaMA
vasilyevdm/ai-agent-handbook: Comprehensive guide to AI agent engineering: how 30+ frameworks actually work under the hood. Covers context rot, compaction, system prompt assembly, SOUL.md, agent loops, memory systems, tool sprawl, MCP, progressive disclosure, multi-agent orchestration, Plan/Act, and episodic memory, with code examples throughout. Pick the right stack and avoid the common traps
github.com · 6w · r/LocalLLaMA
My gripe with Qwen3.5 35B and my first fine-tune fix
huggingface.co · 6w · r/LocalLLaMA
LongCat-Flash-Prover: A new frontier for open-source formal reasoning
huggingface.co · 6w · r/LocalLLaMA
Kimi just published a paper replacing residual connections in transformers. Results look legit
github.com · 6w · Hacker News, r/LocalLLaMA
Add Qwen3 TTS architecture support by Acceldium · Pull Request #20752
github.com · 6w · r/LocalLLaMA
rednote-hilab/dots.mocr
huggingface.co · 6w · r/LocalLLaMA