Training Thousands of LoRA Adapters at Once
What if we could share one base model across policies and fine-tune a different LoRA adapter for each, all in a single batch? This is cleaner and addresses the scalability problem: we keep one base model, route tokens to their respective LoRA adapters, and let the training and inference stack treat adapters as cheap concurrent policies rather than separate model replicas.
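To make the routing idea concrete, here is a minimal sketch in plain Python, not the article's implementation. It assumes a single linear layer, a hypothetical `LoRAAdapter` container, and a per-token `adapter_ids` list that says which adapter each token belongs to. Every token goes through the same base weights, and only the low-rank delta differs. A real stack would replace the per-token loop with batched gather or grouped matrix multiplies on the accelerator.

```python
from dataclasses import dataclass

Vector = list[float]
Matrix = list[list[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply matrix m (rows x cols) by vector v (length cols)."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


@dataclass
class LoRAAdapter:
    """Low-rank update B @ A for one policy (hypothetical container)."""
    a: Matrix            # rank x d_in, down-projection
    b: Matrix            # d_out x rank, up-projection
    scale: float = 1.0   # stands in for the usual alpha / rank factor


def multi_lora_forward(
    base_w: Matrix,
    adapters: list[LoRAAdapter],
    tokens: list[Vector],
    adapter_ids: list[int],
) -> list[Vector]:
    """Apply the shared base projection plus each token's own LoRA delta."""
    outputs = []
    for x, idx in zip(tokens, adapter_ids):
        adapter = adapters[idx]
        base_out = matvec(base_w, x)                      # shared by all policies
        delta = matvec(adapter.b, matvec(adapter.a, x))   # adapter-specific path
        outputs.append([y + adapter.scale * d for y, d in zip(base_out, delta)])
    return outputs


# Toy setup: identity base layer and two rank-1 adapters.
base_w = [[1.0, 0.0], [0.0, 1.0]]
adapters = [
    LoRAAdapter(a=[[1.0, 0.0]], b=[[1.0], [0.0]]),  # adds x[0] to output[0]
    LoRAAdapter(a=[[0.0, 1.0]], b=[[0.0], [2.0]]),  # adds 2 * x[1] to output[1]
]
tokens = [[1.0, 2.0], [1.0, 2.0], [3.0, 4.0]]
adapter_ids = [0, 1, 0]  # one mixed batch, three tokens, two policies

outputs = multi_lora_forward(base_w, adapters, tokens, adapter_ids)
print(outputs)
```

The first two tokens are identical but routed to different adapters, so their outputs differ even though the base weights are never duplicated. Because the base weights are shared and only the small `a` and `b` matrices are per-policy, adding another policy costs one more adapter rather than another model replica.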