The overall PhyE2E framework. Top: the training dataset was augmented with a large-scale synthetic dataset generated by an LLM. MLP, multilayer perceptron. Middle: a variable-interaction technique was integrated to decompose the original symbolic regression problem into simpler subproblems, referred to as D&C. An end-to-end model was trained to predict the target formula using observed data points and prior physical knowledge (referred to as ‘physical priors’). Bottom: an MCTS module was adopted to refine the generated formulas, using a context-free grammar pool that includes atomic formulas and the end-to-end generated formula. RMSE, root mean squared error. Credit: Ying et al. (Nature Machine Intelligence, 2025).
Artificial intelligence (AI) systems, particularly artificial neural networks, have proved to be highly promising tools for uncovering patterns in large amounts of data that would otherwise be difficult to detect. Over the past decade, AI tools have been applied in a wide range of settings and fields.
Among their many possible applications, AI systems could be used to discover physical relationships and the symbolic expressions (i.e., mathematical formulas) that describe them.
To uncover these formulas, physicists currently need to analyze raw data extensively, so automating this process could be highly advantageous.
Researchers at Tsinghua University, Peking University and other institutes in China have developed an AI framework that can automatically derive symbolic physical representations from raw data. This new model, called PhyE2E, was introduced in a paper published in Nature Machine Intelligence.
“Our goal was to push AI beyond curve-fitting and toward human-understandable discovery: returning compact, unit-consistent equations that scientists can read, test, and build on,” Yuan Zhou, co-senior author of the paper, told Phys.org.
“We focused first on space physics, where long, well-curated observational records let us check whether the learned equations really match nature. The approach itself is general, and we expect it to extend to other sciences.”
A model that symbolically represents physics data
PhyE2E, the new AI framework introduced by Zhou and his colleagues, was trained on physical data and mathematical equations. During training, the model learned what plausible physics formulas “look like” from widely established physics equations and from a large set of synthetic, unit-consistent variants generated by an LLM fine-tuned on those equations.
“PhyE2E uses a transformer to translate data directly into a symbolic expression and its units,” explained Zhou.
“It applies a divide-and-conquer step that inspects second-order derivatives of a lightweight ‘oracle’ network to split a hard problem into simpler sub-formulas and performs a brief MCTS/GP refinement to tidy constants and structure. The result is an equation that is compact, interpretable, and dimensionally consistent.”
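The divide-and-conquer idea Zhou describes can be illustrated with a small sketch. If the mixed second derivative of a function with respect to two variables is zero everywhere, those variables do not interact and the function splits into separate additive parts. The Python code below is only an illustration under stated assumptions, not the authors' implementation: it tests additive separability only, it uses finite differences rather than a trained network's derivatives, and `toy_oracle`, `mixed_partial` and `interacts` are hypothetical names, with a known function standing in for the "oracle" network.

```python
import numpy as np

def mixed_partial(f, point, i, j, h=1e-3):
    """Central finite-difference estimate of d2f / (dx_i dx_j) at `point`."""
    p = np.asarray(point, dtype=float)
    ei = np.zeros_like(p)
    ej = np.zeros_like(p)
    ei[i] = h
    ej[j] = h
    return (f(p + ei + ej) - f(p + ei - ej)
            - f(p - ei + ej) + f(p - ei - ej)) / (4.0 * h * h)

def interacts(f, i, j, n_vars, n_samples=200, tol=1e-4, seed=0):
    """True if variables i and j appear coupled in f (not additively separable)."""
    rng = np.random.default_rng(seed)
    points = rng.uniform(0.5, 2.0, size=(n_samples, n_vars))
    largest = max(abs(mixed_partial(f, p, i, j)) for p in points)
    return bool(largest > tol)

# Hypothetical stand-in for the oracle network (an assumption for this sketch):
# x0 and x1 are coupled through a product, x2 enters on its own.
def toy_oracle(x):
    return x[0] * x[1] + np.exp(x[2])

pairs = [(0, 1), (0, 2), (1, 2)]
coupled = {pair: interacts(toy_oracle, *pair, n_vars=3) for pair in pairs}
print(coupled)  # {(0, 1): True, (0, 2): False, (1, 2): False}
```

In this toy case only the pair (x0, x1) is flagged as interacting, so the problem could be split into one sub-formula in x0 and x1 and another in x2 alone, each easier to recover than the full expression.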
As part of their recent study, the researchers tested their framework on both synthetic data generated by a large language model (LLM) and real astrophysical data collected by NASA.
Ultimately, they were able to derive formulas describing physical relationships in data from five real space-physics scenarios. Notably, the formulas the model derived either matched those derived by human physicists or appeared to fit the data even better.
For instance, when analyzing data published by NASA in 1993, the model arrived at an improved formula describing solar cycles. In addition, it was able to effectively represent the relationships between solar radiation, temperature and magnetic fields.
A promising tool for scientific discovery
Essentially, the new AI model developed by this research team learns to break down complex physics problems into simpler parts. Drawing from existing and well-established equations, it can then generate new formulas that effectively describe the relationship between different variables.
“While it’s trivial to write a long expression that interpolates the data, and tempting to favor very short ones, neither guarantees physical meaning—many candidate formulas even violate dimensional (unit) consistency,” said Zhou.
“We leverage recent advances in large language models to learn a prior over known, unit-consistent equations, then fine-tune it so the system proposes compact, physically plausible expressions that carry genuine insight. We view this as a first step toward abstracting and extending scientific experience to enable automated discovery.”
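What dimensional consistency means in practice can be shown with a short example. The Python sketch below tracks each quantity as a set of exponents over mass, length and time, and rejects any expression that adds terms with different dimensions. It is a minimal illustration, not part of PhyE2E: the `Dim` class and the `is_unit_consistent` helper are hypothetical names introduced here, and only three base dimensions are assumed.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Dim:
    """Exponents of the base dimensions mass, length and time."""
    mass: int = 0
    length: int = 0
    time: int = 0

    def __mul__(self, other):
        # Multiplying quantities adds their dimension exponents.
        return Dim(self.mass + other.mass, self.length + other.length,
                   self.time + other.time)

    def __truediv__(self, other):
        # Dividing quantities subtracts their dimension exponents.
        return Dim(self.mass - other.mass, self.length - other.length,
                   self.time - other.time)

    def __pow__(self, n):
        return Dim(self.mass * n, self.length * n, self.time * n)

    def __add__(self, other):
        # Adding quantities is only meaningful when their dimensions agree.
        if self != other:
            raise ValueError(f"cannot add {self} and {other}")
        return self

MASS, LENGTH, TIME = Dim(mass=1), Dim(length=1), Dim(time=1)
VELOCITY = LENGTH / TIME
ENERGY = MASS * LENGTH**2 / TIME**2

def is_unit_consistent(build_expression):
    """Return True if evaluating the expression over dimensions raises no error."""
    try:
        build_expression()
        return True
    except ValueError:
        return False

good = is_unit_consistent(lambda: MASS * VELOCITY**2 + ENERGY)  # energy + energy
bad = is_unit_consistent(lambda: MASS * VELOCITY + ENERGY)      # momentum + energy
print(good, bad)  # True False
```

Dimensionless constants such as the 1/2 in kinetic energy are left out because they do not affect the check. A candidate formula like the second one might still fit a dataset numerically, which is why a filter of this kind is useful for discarding expressions that cannot be physically meaningful.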
PhyE2E could soon be used to analyze other experimental and astrophysical data, which might yield formulas that better describe specific physical phenomena or interactions. In the future, it could also be adapted and applied to other disciplines, potentially contributing to scientific discovery in a variety of fields.
“We’re now extending the framework to calculus-aware operators (e.g., derivatives/integrals for PDE-style laws), strengthening robustness to noisier laboratory data,” added Zhou.
“More broadly, the central aim of our research is to advance neuro-symbolic methodology so that predictions from deep neural networks are interpretable. At the same time, we hope that integrating explainability as a design principle can enhance an AI system’s capacity to uncover more accurate and reliable scientific laws.”
Written by Ingrid Fadelli, edited by Sadie Harley, and fact-checked and reviewed by Robert Egan.
More information: Jie Ying et al, A neural symbolic model for space physics, Nature Machine Intelligence (2025). DOI: 10.1038/s42256-025-01126-3.
Citation: New AI framework can uncover space physics equations in raw data (2025, November 10) retrieved 10 November 2025 from https://phys.org/news/2025-11-ai-framework-uncover-space-physics.html