RLVR might be disproportionately bad at science
The verification loop for theories can be on the order of decades or centuries, and even then, the theory we know today to be the better one can often actually make worse predictions.