Multiple versus pairwise sequence alignments for protein phylogenetics using foundation models (opens in new tab)
Phylogenetic inference is a common task in molecular and evolutionary biology and has conventionally required a multiple sequence alignment (MSA), a statistical model of amino acid substitutions, and an optimality principle. Recently, global models of amino acid substitutions have been inferred from millions of MSAs using transformer-based deep learning, resulting in protein foundation models (pFMs), also known as protein language models (PLMs). Training pFMs on MSAs hypothetically enables th...
Read the original article