Out of Tune: Fine-Tuning Foundation Models Leads to Unpredictable Safety Drift

In addition to CDT, this report was authored by Emaan Bilal and Dylan Hadfield-Menell of the Algorithmic Alignment Group at the Massachusetts Institute of Technology (MIT). General-purpose AI models are increasingly adapted by downstream developers for specialized uses — from medical transcription to legal contract analysis. This raises critical governance questions […]