Gollum’s Reinforcement Learning Loop: How a Broken Reward Function Created the Ring’s Most Tragic…
Or, Why You Should Never Use a Single Binary Reward Signal When Training Your Neural Network (or Your Hobbit)