Notes on DeepSeek R1

Discussed on Hacker News

DeepSeek R1 is a large language model that uses test-time compute to generate a response. Unlike many earlier decoder-based models, which simply continue the given text (and may be fine-tuned for conversation), R1 generates reasoning tokens before giving its final answer. According to the researchers, its performance is on par with OpenAI’s o1 model.

Terminology

First, I will briefly describe some terminology related to training techniques: Supervised fine-tuning (SFT) is a process w...