Training Thousands of LoRA Adapters at Once

What if we could share the same base model across policies and fine-tune a different LoRA adapter for each, all in a single batch? This is cleaner and addresses the scalability problem: we keep one base model, route tokens to different LoRA adapters, and let the training/inference stack treat LoRA adapters as cheap concurrent policies rather than separate model replicas.
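To make the routing idea concrete, here is a minimal sketch of a single linear layer where every token in a batch shares one base weight matrix but picks up its own low-rank update. It is not the article's implementation: the names `multi_lora_forward`, `make_adapter`, and `adapter_ids` are hypothetical, plain Python lists stand in for GPU tensors, and both LoRA factors are randomly initialized so the adapters visibly differ (standard LoRA initializes `B` to zero). Only the forward pass is shown; training would backpropagate through the same routing.

```python
import random

def matvec(mat, vec):
    """Multiply a (rows x cols) nested-list matrix by a length-cols vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in mat]

def make_adapter(d_out, d_in, rank, rng):
    """Hypothetical LoRA adapter: A is (rank x d_in), B is (d_out x rank)."""
    A = [[rng.gauss(0.0, 0.1) for _ in range(d_in)] for _ in range(rank)]
    B = [[rng.gauss(0.0, 0.1) for _ in range(rank)] for _ in range(d_out)]
    return A, B

def multi_lora_forward(W, adapters, tokens, adapter_ids, scale=1.0):
    """Compute W @ x + scale * B_i @ (A_i @ x), with adapter i chosen per token."""
    outputs = []
    for x, idx in zip(tokens, adapter_ids):
        base = matvec(W, x)              # shared base-model path
        A, B = adapters[idx]             # route this token to its adapter
        delta = matvec(B, matvec(A, x))  # low-rank path through the rank bottleneck
        outputs.append([b + scale * d for b, d in zip(base, delta)])
    return outputs

rng = random.Random(0)
d_in, d_out, rank = 4, 3, 2

# One base weight matrix, shared by every policy.
W = [[rng.gauss(0.0, 1.0) for _ in range(d_in)] for _ in range(d_out)]
# Three adapters, i.e. three cheap "policies" on top of the same base.
adapters = [make_adapter(d_out, d_in, rank, rng) for _ in range(3)]

# A mixed batch: five tokens, each tagged with the adapter it belongs to.
tokens = [[rng.gauss(0.0, 1.0) for _ in range(d_in)] for _ in range(5)]
adapter_ids = [0, 2, 1, 0, 2]
out = multi_lora_forward(W, adapters, tokens, adapter_ids)
```

The per-token loop is only for readability. A real stack would presumably group tokens by adapter or use a batched gather so the shared `W` multiply runs once over the whole batch, while each adapter adds only its two small factor matrices in extra parameters.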
