I post-trained a model to reliably roll a die
Why LLMs always output 4 when asked to roll a die, and how exploration techniques in reinforcement learning fix this.