⚙️ Finetuning LLMs faster with less memory - autocole · Scour

LoRA vs QLoRA: The Smartest Way to Fine-Tune LLMs on Limited GPU Memory 🦙Simple finetuning LLMs

·4d

The Ultimate LLM Fine-Tuning Guide 🦙Simple finetuning LLMs

promptinjection.net·2d·Hacker News

OScaR: The Occam's Razor for Extreme KV Cache Quantization in LLMs and Beyond 📊Vector Databases

Memory-constrained environment 🧩WASI

discuss.privacyguides.net

·4h

MagnoApi - I built a Memory Context API that gives memory to LLMs 🦙Simple finetuning LLMs

magno-memory-api-production.up.railway.app·22h·Hacker News

Compute Optimal Tokenization: Scaling Laws for Data Compression in LLMs 🦙Simple finetuning LLMs

co-tok.github.io·1d·Hacker News

KV Cache Optimization: 3x Faster LLM Inference on 24GB VRAM 🦙Simple finetuning LLMs

tildalice.io·5d

Blazing fast on-device GenAI with LiteRT-LM 🦙Simple finetuning LLMs

developers.googleblog.com·15h

Context pruning: cut LLM tokens without losing quality (9 minute read) 🦙Simple finetuning LLMs

michelangeloromerochisco/ternative: Inference engine for ternary-weight LLMs with runtime LoRA - the llama.cpp of BitNet models 🔥Svelte

github.com·16h·Hacker News

froggeric/Qwen3.6-27B-MTP-GGUF 🦙Simple finetuning LLMs

huggingface.co·2d·DEV

Can You Run LLMs Locally Without a GPU? I Tested 8 Models on Linux 🦙Simple finetuning LLMs

itsfoss.com·5d·Hacker News

Improving the per-CPU memory allocator 🔥Svelte

·22h

SuperInfer: SLO-Aware Rotary Scheduling and Memory Management for LLM Inference on Superchips 🔵LLM frameworks and AI libraries for TypeScript

supercomputing-system-ai-lab.github.io·1d·Hacker News

State Space Models, Explained Through Code 🔄AI Pipeline design and techniques

karthik-ragunath-ananda-kumar-blogs.notion.site·11h·Hacker News

Towards local plug-and-play AI 🔵LLM frameworks and AI libraries for TypeScript

adlrocha.substack.com·3d·Substack

Distribution-aware sampling of replay buffer for mitigating catastrophic forgetting 🔄AI Pipeline design and techniques

sciencedirect.com·2d

LoRA and Weight Decay (2023) 🦙Simple finetuning LLMs

irhum.github.io·1d·Hacker News

Build a Production-Grade Local LLM Stack (vLLM + CUDA + KV Cache Tuning) 🦙Simple finetuning LLMs

·5d

Find bugs in YOUR code using OpenCode, Llama.cpp and Qwen3.6 🧩WASI

wtarreau.blogspot.com·2d·Lobsters, Hacker News, wtarreau.blogspot.com
