A milestone in biology has been achieved thanks to artificial intelligence (AI), machine learning, and applied physics. A new peer-reviewed study by researchers affiliated with the Harvard John A. Paulson School of Engineering and Applied Sciences (SEAS) and Northwestern University used AI and physics to design intrinsically disordered proteins, the unstable proteins that make up part of the dark proteome.
This illustrates how AI may accelerate research, especially in the fields of biology, synthetic biology, and personalized medicine.
“The design of folded proteins has advanced substantially in recent years,” wrote co-authors Krishna Shrinivas, Michael P. Brenner, and Ryan K. Krueger. “However, many proteins and protein regions are intrinsically disordered and lack a stable fold, that is, the sequence of an intrinsically disordered protein (IDP) encodes a vast ensemble of spatial conformations that specify its biological function.”
The Dark Proteome
Just as physics and modern cosmology have dark matter, biology has the dark proteome. The exact composition of dark matter is unknown, even though it is estimated to make up 27% of the total mass-energy density of the universe and 85% of its matter, according to 2022 research by Yuan et al. published in Physics Reports. The dark proteome may hold a treasure trove of undiscovered therapeutic targets for drugs to combat disease. It includes proteins with unknown structures, understudied proteins, undetected proteins, proteins from non-canonical genes that deviate from standard genes, and intrinsically disordered proteins.
Intrinsically disordered proteins lack a well-defined structure. The lack of a stable structure makes IDPs next to impossible to study with traditional structural biology tools such as X-ray crystallography and cryo-electron microscopy.
The human proteome is the complete set of proteins expressed in the human body and consists of nearly 20,000 proteins, according to estimates published in Nature Chemical Biology by Aebersold et al. An estimated 65% of the human proteome consists of structured proteins, and the remaining 35% are intrinsically disordered proteins, according to Bind Research.
Why Study Proteins?
Understanding proteins is essential for many fields such as neurodegenerative disease research, drug discovery, pharmaceutical drug design, biotechnology, synthetic biology, protein engineering, personalized medicine, and medical research. Many diseases and neurodegenerative disorders result from protein misfolding and protein aggregation. Protein aggregates can form when reduced protein stability or interrupted protein folding leads to a build-up of unfolded or misfolded proteins.
For example, the accumulation of misfolded tau proteins leads to neurofibrillary tangles called tau tangles, which are one of the signs of Alzheimer’s disease, according to Alzheimer’s Disease Research by the BrightFocus Foundation. Misfolded alpha-synuclein protein is a biomarker for Parkinson’s disease, according to a study published in The Lancet Neurology led by Professor Claudio Soto at the University of Texas McGovern Medical School at Houston. One of the signs of Huntington’s disease is the misfolding of the huntingtin protein, according to a study published in Brain Research by Jiang et al. Protein misfolding that leads to protein aggregation is also found in amyotrophic lateral sclerosis (ALS).
Going Beyond Nobel Prize-Winning AI
The successful application of AI deep learning to predict the 3D structure of proteins started in 2018, when Google DeepMind’s AlphaFold paved the way by achieving the highest accuracy in the 13th Critical Assessment of protein Structure Prediction (CASP13) competition. AlphaFold is an AI deep learning model that predicts the 3D structures of proteins from amino acid sequences. Two years later, in 2020, AlphaFold 2 achieved the highest accuracy in the CASP14 competition and was credited by CASP organizers with solving the protein folding problem, a 50-year-old grand challenge in biology.
The Nobel Prize in Chemistry 2024 was awarded to David Baker for computational protein design, and to Google DeepMind AI pioneers Demis Hassabis and John Jumper for protein structure prediction. This was pioneering work for predicting a single 3D structure for stable proteins.
Yet using AI deep learning to design intrinsically disordered proteins is inherently difficult due to the nature of machine learning. Deep neural networks “learn” features from massive amounts of training data rather than from explicitly hard-coded rules. Although the protein datasets that were used to train AlphaFold may contain IDP data, most of their content describes structured proteins captured via standard structural biology tools such as X-ray crystallography. Moreover, when it comes to intrinsically disordered regions (IDRs), which are protein segments that lack a stable 3D structure, AlphaFold will most likely produce predictions with low confidence, according to the AlphaFold Protein Structure Database.
Even though AlphaFold 2 was trained on mainly stable proteins, it encodes “significant information on stability” according to a study published in Physical Review Letters by McBride and Tlusty. This opens the door for understanding unstable proteins.
An Innovative Approach
A seemingly obvious method to conduct this new study would be to identify databases consisting mostly of intrinsically disordered proteins and train a deep learning algorithm to predict protein structures. However, the Harvard University and Northwestern University researchers chose to take the road less traveled in their approach. What sets this new study apart is that the scientists created an AI model that uses gradient-based optimization and applies the fundamental laws of physics with realistic molecular dynamics simulations.
“Combining physics-based approaches with recent advances in differentiable programming holds promise for computational design and engineering for a wide variety of biomolecules and their functions,” reported the study authors.
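To give a feel for what gradient-based optimization through a physics simulation can look like, here is a minimal toy sketch in JAX. It is not the study's method or code. The bead-and-spring chain, the per-bead "stickiness" parameter standing in for a sequence, the energy function, the target radius of gyration, and every constant are assumptions chosen only for illustration. The sketch runs a short differentiable simulation for several noise realizations, averages a property over that small ensemble of conformations, and uses automatic differentiation to nudge the sequence parameters toward the target.

```python
import jax
import jax.numpy as jnp

# All values below are hypothetical and chosen only for illustration.
N_BEADS = 12         # chain length (one bead per residue)
N_STEPS = 200        # simulation steps per trajectory
DT = 0.01            # integration step, arbitrary units
KT = 0.1             # thermal energy, arbitrary units
TARGET_RG = 3.0      # target ensemble-averaged radius of gyration
LEARNING_RATE = 0.5  # step size for gradient descent on the sequence


def straight_chain():
    """Initial conformation: beads one unit apart along the x axis."""
    x = jnp.arange(N_BEADS, dtype=jnp.float32)
    zeros = jnp.zeros(N_BEADS)
    return jnp.stack([x, zeros, zeros], axis=1)


def energy(positions, stickiness):
    """Toy energy: harmonic bonds plus sequence-dependent pair attraction."""
    bonds = positions[1:] - positions[:-1]
    bond_len = jnp.sqrt(jnp.sum(bonds**2, axis=-1) + 1e-9)
    bond_energy = jnp.sum(10.0 * (bond_len - 1.0) ** 2)
    diff = positions[:, None, :] - positions[None, :, :]
    dist2 = jnp.sum(diff**2, axis=-1)
    # Two beads attract more strongly when both are "sticky".
    strength = stickiness[:, None] * stickiness[None, :]
    off_diagonal = 1.0 - jnp.eye(N_BEADS)  # exclude self-interaction
    pair_energy = -0.5 * jnp.sum(off_diagonal * strength * jnp.exp(-dist2 / 4.0))
    return bond_energy + pair_energy


def simulate(logits, key):
    """Overdamped Langevin dynamics (explicit Euler) with fixed noise."""
    stickiness = jax.nn.sigmoid(logits)  # keep each value in (0, 1)
    noise = jnp.sqrt(2.0 * KT * DT) * jax.random.normal(key, (N_STEPS, N_BEADS, 3))
    energy_grad = jax.grad(energy)  # gradient with respect to positions

    def step(positions, eps):
        return positions - DT * energy_grad(positions, stickiness) + eps, None

    final_positions, _ = jax.lax.scan(step, straight_chain(), noise)
    return final_positions


def radius_of_gyration(positions):
    centered = positions - jnp.mean(positions, axis=0)
    return jnp.sqrt(jnp.mean(jnp.sum(centered**2, axis=-1)))


def loss(logits, keys):
    """Squared gap between the ensemble-averaged Rg and the target."""
    finals = jax.vmap(lambda k: simulate(logits, k))(keys)
    mean_rg = jnp.mean(jax.vmap(radius_of_gyration)(finals))
    return (mean_rg - TARGET_RG) ** 2


keys = jax.random.split(jax.random.PRNGKey(0), 8)  # 8 noise realizations
logits = jnp.zeros(N_BEADS)                        # start from a uniform sequence
loss_and_grad = jax.jit(jax.value_and_grad(loss))  # differentiates through the simulation

initial_loss, initial_grad = loss_and_grad(logits, keys)
for _ in range(20):
    _, grad = loss_and_grad(logits, keys)
    logits = logits - LEARNING_RATE * grad         # plain gradient descent
final_loss, _ = loss_and_grad(logits, keys)

print("loss before:", float(initial_loss), "after:", float(final_loss))
print("designed stickiness:", jax.nn.sigmoid(logits))
```

The point of the sketch is the design choice rather than the numbers: because every step of the simulation is differentiable, the gradient of an ensemble-level property with respect to the sequence parameters comes from the physics model itself instead of from a network trained on a database of known structures.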
This pioneering research uses applied physics and AI to shine a light on one of the important components of the dark proteome, which may one day lead to new therapeutic drug targets and novel treatments for a wide range of diseases.
Copyright © 2025 Cami Rosso All rights reserved.