A toy example: Train an AI only to say it likes certain cheeses. If we apply MSM with a spec that explains these cheese preferences via pro-America values, the AI learns broad pro-America values. Swap to a pro-affordability spec? The AI learns to value affordability instead.