Detecting and quantifying overparametrization in RNA language models with REDIAL

While RNA language models (LMs) have served as foundation models (FMs) to advance structural prediction, their evaluation relies heavily on supervised downstream tasks. Such tasks can often mask FM inefficiencies and reflect memorization of the downstream training set. To address this, we introduce REDIAL (RNA Embedding perturbation Diagnostics for Language models), a zero-shot, unsupervised framework designed to extract coevolutionary signals directly from the high-dimensional latent spaces o...
