⚙️ Finetuning LLMs faster with less memory
Scoured 4287 posts in 100.6 ms
Nvidia’s new technique cuts LLM reasoning costs by 8x without losing accuracy
venturebeat.com · 1d · Discuss: r/LocalLLaMA · 🦙 Simple finetuning LLMs
BalatroBench: Benchmarks Large Language Models Playing Balatro
balatrobench.com · 15h · Discuss: Hacker News · 🔵 LLM frameworks and AI libraries for TypeScript
Breaking the Tractability Barrier: A Generic Low-Level Solver for NP-Hard Instances (N=63) on Commodity 64-Bit Silicon
zenodo.org · 20h · Discuss: r/programming · 🔵 LLM frameworks and AI libraries for TypeScript
Show HN: Running an LLM Inside Scratch
github.com · 1d · Discuss: Hacker News · 🦙 Simple finetuning LLMs
Find the right local LLM for your exact hardware
localclaw.io · 19h · Discuss: Hacker News · 🦙 Simple finetuning LLMs
Linux 7.0 MM Changes Bring Some Very Nice Performance Optimizations
phoronix.com · 1d · 📚 Monorepo Patterns
I used a local LLM to analyze my journal entries
ankursethi.com · 10h · Discuss: Lobsters · 🦙 Simple finetuning LLMs
harishsg993010/tiny-NPU: open-source NPU for LLM inference (this runs gpt2)
github.com · 1d · Discuss: r/LocalLLaMA · 🔵 LLM frameworks and AI libraries for TypeScript
Introducing GPT‑5.3‑Codex‑Spark
simonwillison.net · 1d · 🔥 Svelte
MiniMax-M2.5 (230B MoE) GGUF is here - First impressions on M3 Max 128GB
huggingface.co · 9h · Discuss: r/LocalLLaMA · 🦙 Simple finetuning LLMs
Show HN: 1MB iOS apps designed to reduce mental open loops
news.ycombinator.com · 11h · Discuss: Hacker News · 🦙 Simple finetuning LLMs
GPU-Serving Two-Tower Models for Lightweight Ads Engagement Prediction
medium.com · 2h · 📊 Vector Databases
Building an Embedding API with Rust, Arm, and EmbeddingGemma on AWS Lambda
sobolev.substack.com · 15h · Discuss: Substack · 🦀 Rust language vector embeddings
llama.cpp guide - Running LLMs locally, on any hardware, from scratch
blog.steelph0enix.dev · 4d · 🦙 Simple finetuning LLMs
SWE-rebench Jan 2026: GLM-5, MiniMax M2.5, Qwen3-Coder-Next, Opus 4.6, Codex Performance
swe-rebench.com · 8h · Discuss: r/LocalLLaMA · 🔥 Svelte
Learning beyond Teacher: Generalized On-Policy Distillation with Reward Extrapolation
arxiv.org · 21h · Discuss: Hacker News · 🔄 AI Pipeline design and techniques
Ming-flash-omni-2.0: 100B MoE (6B active) omni-modal model - unified speech/SFX/music generation
huggingface.co · 1d · Discuss: r/LocalLLaMA · 🔥 Svelte
AI Inference Needs A Mix-And-Match Memory Strategy
semiengineering.com · 1d · 🔵 LLM frameworks and AI libraries for TypeScript
Show HN: Skill that lets Claude Code/Codex spin up VMs and GPUs
cloudrouter.dev · 7h · Discuss: Hacker News · 🤖 Coding Automation
MiniMaxAI MiniMax-M2.5 has 230b parameters and 10b active parameters
openhands.dev · 1d · Discuss: r/LocalLLaMA · 🪟 Tauri