From pretraining to RLHF/GRPO — every algorithm hand-written in pure PyTorch.