Poisoning the Genome: Targeted Backdoor Attacks on DNA Foundation Models (opens in new tab)

Foundation models trained on DNA sequences have achieved strong performance across biological tasks including variant effect prediction and genome design. These models rely on massive public genomic datasets comprising trillions of nucleotide tokens. Unlike natural language, DNA sequences lack semantic transparency, making corrupted or adversarially crafted entries difficult to detect during data curation. We present the first systematic stu...

Read the original article