Whole-Proteome ESM-2 Embeddings Recover Taxonomy and Enable Geometry-Aware Triage of Foodborne Bacterial Genomes
Whole-genome sequencing (WGS) has transformed foodborne pathogen surveillance, yet time-sensitive decision-making remains constrained by computationally expensive alignment-centric workflows that scale poorly to outbreak volumes and lack built-in confidence signals. Using 21,657 GenomeTrakr-derived assemblies spanning nine food-safety-relevant taxa, we represent each genome by mean-pooling per-protein embeddings from ESM-2 (480 dimensions). The resulting embedding space is dominated by taxono...
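As a minimal sketch of the mean-pooling step described above, the snippet below averages a set of per-protein vectors into a single 480-dimensional genome vector. It assumes the per-protein ESM-2 embeddings have already been computed elsewhere; the function name `mean_pool`, the constant `EMBED_DIM`, and the toy input are hypothetical and used only for illustration, not taken from the article.

```python
from typing import List, Sequence

EMBED_DIM = 480  # per-protein embedding width stated in the abstract


def mean_pool(protein_embeddings: Sequence[Sequence[float]]) -> List[float]:
    """Average per-protein vectors into one genome-level vector.

    Hypothetical helper: each inner sequence is one protein's embedding.
    """
    if not protein_embeddings:
        raise ValueError("need at least one protein embedding")
    for vec in protein_embeddings:
        if len(vec) != EMBED_DIM:
            raise ValueError(f"expected {EMBED_DIM} dimensions, got {len(vec)}")
    n = len(protein_embeddings)
    # zip(*rows) walks the embeddings column by column (one dimension at a time).
    return [sum(column) / n for column in zip(*protein_embeddings)]


# Synthetic stand-in for a proteome: three "proteins" with constant vectors.
# Real inputs would be ESM-2 outputs, one vector per predicted protein.
toy_proteome = [[float(i)] * EMBED_DIM for i in range(3)]
genome_vector = mean_pool(toy_proteome)
print(len(genome_vector))
```

Because the pooled vector has a fixed width regardless of how many proteins a genome encodes, genomes of different sizes can be compared directly in the same space.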