blog.skypilot.co

RL Doesn't Work on Slurm

Discussed on Hacker News

Covers 6 related stories

vllm-project/vllm

Discussed on Hacker News and DEV

DeepSeekMath

Discussed on Hacker News

sgl-project/sglang

[2305.18290] Direct Preference Optimization: Your Language Model is Secretly a Reward Model

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

Discussed on Hacker News and r/LocalLLaMA

cameronrwolfe.substack.com

Group Relative Policy Optimization (GRPO)

Discussed on Substack