Protein solubility depends on centrifugation: Aiki-Sol, a per-regime predictor for E. coli
Motivation: Sequence-based predictors of recombinant protein solubility in Escherichia coli have plateaued (NESG independent-test AUC 0.760 to 0.80 over eight years of protein language-model variants). The plateau hides a latent confound: the centrifugation regime used to separate the soluble from the insoluble fraction is a hidden variable collapsed into a single binary "soluble" label. The protein's biochemistry does not change between regimes; what changes is which fraction of the lysate i...