Main
For centuries, scientific discovery has been increasingly driven by data. Symbolic regression (SR) stands at the forefront of this movement, aiming to automatically distill interpretable mathematical expressions from observational data without presupposing specific functional forms1. This capability has catalyzed scientific advances across multiple domains, including astronomical modeling2, materials science3 and physical law discovery4, among other applications5,6. The fundamental challenge in SR arises from the combinatorial explosion of possible expressions, which renders exhaustive search intractable. While conventional regression techniques excel at parameter estimation for predetermined models7, they offer limited utility when the underlying mathematical structure is unknown.
Evolutionary computation approaches address this structural discovery problem through heuristic search mechanisms. Genetic programming (GP) and related algorithms1,8 explore expression spaces by applying genetic operations to tree-based representations, while Bayesian methods9 employ probabilistic sampling to identify promising expressions. These techniques have shown success in uncovering governing equations from data, for example, for dynamical systems10. However, they face critical issues in practical applications, such as poor scalability, sensitivity to hyperparameter configurations and a propensity to generate complex expressions that overfit the data.
Notable progress lies in the introduction of sparse regression for equation discovery, for example, the sparse identification of nonlinear dynamics (SINDy) approach11. By formulating the discovery as a sparse linear regression problem over a predefined function library, such a method achieves remarkable efficiency in identifying parsimonious models. The family of SINDy approaches has shown considerable success in data-driven discovery of both ordinary and partial differential equations12,13,14,15,16. Nevertheless, the effectiveness of these methods is contingent on careful library design that balances expressiveness with tractability. More fundamentally, the restriction to linear combinations of basis functions inherently limits the complexity of discoverable relationships, particularly when variable interactions are nonlinear and absent from the library.
Deep learning architectures have recently emerged as powerful alternatives for SR, for example, recurrent neural networks with risk-seeking policy gradients17, grammar-based variational autoencoders18, neural network-based graph modularity19 and transformer models pre-trained on mathematical expressions20,21,22. Notably, symbolic neural networks represent a particularly relevant specialization, incorporating mathematical operators as activation functions to enable end-to-end equation discovery23. A persistent challenge across these neural approaches is their dependence on empirical thresholding for expression simplification, which introduces subjectivity and often fails to yield meaningful equations when the data are limited and noisy.
Very recently, Monte Carlo tree search (MCTS)24,25, which gained acclaim for powering the decision-making algorithms in AlphaZero26, has shown great potential in navigating the expansive search space inherent in SR27. This method uses stochastic simulations to meticulously evaluate the merit of each node in the search tree and has empowered several SR techniques28,29,30,31. However, the conventional application of MCTS maps each node to a unique expression, which tends to impede the rapid recovery of accurate symbolic representations. This narrow mapping may limit the strategy’s ability to efficiently parse through the complex space of potential expressions, thus presenting a bottleneck in the quest for swiftly uncovering underlying mathematical equations.
SR involves evaluating a large number of complex symbolic expressions composed of various operators, variables and constants. The operators and variables are discrete, while the values of the coefficients are continuous. The NP-hardness of a typical SR process, noted by many scholars17,19,32, has been formally established33. This characteristic requires the algorithm to traverse a vast number of possible combinations, resulting in a huge, even infinite, search space and thus the problem of combinatorial explosion. Unfortunately, existing SR methods evaluate each candidate expression independently, leading to low computational efficiency. Although some studies have considered caching evaluated expressions to prevent recomputation30, they do not reuse these results as subtree values when evaluating deeper expressions. Consequently, the majority of these methods either rely on meta-heuristic search strategies, narrow down the search space based on specific assumptions, or incorporate pre-trained models to discover relatively complex expressions, which makes the algorithms prone to producing specious results (for example, local optima). Therefore, the efficiency of candidate expression evaluation is of paramount importance. By enhancing this efficiency, it becomes possible to design new SR algorithms that are less reliant on particular optimization methods, while increasing the likelihood of directly finding the global optima in the vast search space. Thus, we can improve SR accuracy (in particular, the symbolic recovery rate) while drastically reducing the computational time.
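To make the cost of redundant subtree evaluation concrete, the following minimal sketch (not the method described in this work; the expression trees and data are purely illustrative) demonstrates memoized evaluation, in which every distinct subtree is computed once and its numeric value is shared by all candidate expressions that contain it.

```python
# A minimal sketch (not the paper's implementation) illustrating why reusing
# common-subtree values matters: each distinct subtree is evaluated once and
# its numeric result is shared by every candidate expression that contains it.
import numpy as np

OPS = {
    "add": lambda a, b: a + b,
    "mul": lambda a, b: a * b,
    "sin": np.sin,
    "exp": np.exp,
}

def evaluate(tree, data, cache):
    """Evaluate an expression tree on data columns, memoizing subtree values.

    `tree` is either a variable name (str) or a tuple (op, child, ...).
    `cache` maps a subtree key to its already-computed value array.
    """
    key = tree
    if key in cache:                      # shared subtree: reuse, no recomputation
        return cache[key]
    if isinstance(tree, str):             # leaf: a variable column
        value = data[tree]
    else:
        op, *children = tree
        args = [evaluate(c, data, cache) for c in children]
        value = OPS[op](*args)
    cache[key] = value
    return value

# Example: sin(x1 + x2) and exp(x1 + x2) share the subtree (x1 + x2),
# which is computed only once thanks to the shared cache.
data = {"x1": np.random.rand(100), "x2": np.random.rand(100)}
cache = {}
candidates = [("sin", ("add", "x1", "x2")), ("exp", ("add", "x1", "x2"))]
values = [evaluate(t, data, cache) for t in candidates]
print(len(cache))  # 5 distinct subtrees: x1, x2, x1+x2, sin(x1+x2), exp(x1+x2)
```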
To this end, we propose parallel symbolic enumeration (PSE) to automatically distill symbolic expressions from limited data. The model evaluates potential expressions so efficiently that hundreds of millions of candidate expressions can be explored in parallel within a few seconds. In particular, we propose a parallel symbolic regression network (PSRN) as the cornerstone search engine, which (1) automatically captures common subtrees of different math expression trees for shared evaluation, avoiding redundant computations, and (2) capitalizes on graphics processing unit (GPU)-based parallel search, which yields a notable performance boost. In a standard SR process for equation discovery, many candidate expressions share common subtrees, which leads to repeated, redundant evaluations and, consequently, superfluous computation. PSRN reuses these common subtrees for shared evaluation, effectively circumventing redundant computation, and executes the large-scale evaluation of candidate expressions in parallel on a GPU, augmenting the efficiency of the evaluation process. Notably, the synergy of these two techniques can yield up to four orders of magnitude efficiency improvement in expression evaluation. In addition, to expedite convergence and enhance the capacity of PSRN to explore and identify more intricate expressions, we combine it with a token generator strategy (for example, MCTS or GP) that identifies a set of admissible base expressions as tokenized input. The effectiveness and efficiency of the proposed PSE model have been demonstrated on a variety of benchmark and laboratory test datasets. The results show that PSE surpasses multiple existing baseline methods, achieving higher symbolic recovery rates and efficiency.
Results
We introduce a general-purpose SR method, called PSE, to discover mathematical expressions from data. The core of this framework is the PSRN module that can evaluate hundreds of millions of candidate expressions in parallel by identifying and reusing common subtree computations, dramatically accelerating the search process. To explore complex expressions, PSRN is integrated into an iterative loop with a token generator (for example, GP), which discovers admissible sub-expressions as building blocks. The overall architecture and workflow of the PSE model are illustrated in Fig. 1. A detailed description of the methodology, including the symbol layer design (Fig. 2), token generation strategies and coefficient optimization, is provided in Methods.
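As a rough illustration of this iterative loop, the toy sketch below wires a token generator to a regressor through a reward signal. The generator, the regressor and the reward criterion are simplified placeholders (not the actual MCTS/GP generator or the PSRN engine) used only to show the structure of the feedback loop.

```python
# Toy sketch of the outer PSE loop: a token generator proposes base expressions,
# a stand-in regressor scores them, and the reward feeds back into the generator.
# All components are simplified placeholders, not the method's implementation.
import random

TOKENS = ["x1", "x2", "x1+x2", "x1*x2", "sin(x1)", "exp(x2)"]

def token_generator(scores):
    """Favour tokens that previously led to high rewards (GP/MCTS stand-in)."""
    weights = [1.0 + scores.get(t, 0.0) for t in TOKENS]
    return random.choices(TOKENS, weights=weights, k=3)

def regressor(base_exprs):
    """Stand-in for the PSRN engine: scores a base-expression set against data."""
    # Reward is high if the informative building block is present (toy criterion).
    return 1.0 if "x1+x2" in base_exprs else random.uniform(0.0, 0.5)

scores, best = {}, (0.0, None)
for _ in range(20):                                  # iterative search loop
    base = token_generator(scores)
    r = regressor(base)
    for t in base:
        scores[t] = max(scores.get(t, 0.0), r)       # feedback to the generator
    if r > best[0]:
        best = (r, base)
print(best)
```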
Fig. 1: Overview of the proposed PSE model.
a, Schematic of the PSRN regressor for discovering optimal expressions from data. The base expression set $s_{\mathcal{T}}$ from terminal nodes, along with optionally sampled token constant values, is fed into the PSRN (denoted as $\psi_{\mathrm{PSRN}}$) for forward computation. After evaluating the errors of large-scale candidate expressions, $\psi_{\mathrm{PSRN}}$ provides the optimal (or top-k) expressions, denoted as $\hat{\mathcal{F}}$. Subsequently, the least squares method is employed to identify and fine-tune the coefficients, resulting in adjusted expressions $\hat{\mathcal{F}}^{*}$, which are then used to update the Pareto front by computing the complexity and reward. b, Forward computation in PSRN. The base expression set and sampled constants, along with the corresponding data tensor $X$, are fed into the network. The network comprises multiple symbol layers (typically three; only two are shown for simplicity). Each layer provides all possible subtree values resulting from one-operator computations among all input expressions. This process can be efficiently parallelized on a GPU for rapid computation. Upon completion of PSRN's forward computation, a myriad of candidate expressions $\hat{\mathcal{F}}$ (for example, up to hundreds of millions) corresponding to the base expression set $s_{\mathcal{T}}$, along with their associated error tensors on the given data, are generated. Then, the expression with the minimal error (or the top-k expressions) is selected. c, The token generator $\pi$ (for example, MCTS or GP) creates promising subtrees from a token set $s_{\mathcal{T}}$. These subtrees are evaluated by the PSRN regressor $\mathcal{R}$ through large-scale parallel expression evaluation. $\mathcal{R}$ fits the input data, producing a reward $r$ and the best expressions $\hat{\mathcal{F}}^{*}$, which are fed back to $\pi$. This feedback loop drives continuous improvement in generating promising subtrees. The process combines the ability of $\pi$ to explore the token space with the capacity of $\mathcal{R}$ to assess expression quality based on data fit. This iterative system efficiently discovers high-quality expressions by leveraging both heuristic search and large-scale parallel evaluation. The overall iterative PSE process is detailed in Supplementary Algorithm 2, with specific token generators $\pi$ described in Supplementary Algorithms 3–5. d, Schematic of common subtree identification. Expressions sharing common subtrees, for example, $\sin(x_1 + x_2)$ and $\exp(x_1 + x_2)$, leverage identical subtree computation outcomes. This approach circumvents extensive redundant calculations, thereby increasing the efficiency of SR. e, Schematic of the duplicate removal mask (DR mask). We designed the DR mask layer to mask the output tensor of the penultimate layer in PSRN, thereby excluding subtree expressions that are symbolically equivalent. This greatly reduces PSRN's GPU memory usage. f, Estimation and fine-tuning of constant coefficients. The least squares method is used to fine-tune the coefficients of preliminary expressions discovered by PSRN using the complete dataset.
Fig. 2: The detailed process of PSRN forward propagation for obtaining the optimal expression.
Step 1: the input data $X$ and the base expression set $s$, together with optional token constants sampled from a predefined distribution $p(c)$, are fed into PSRN for forward propagation. Based on the designated operator categories (for example, {I, +, ×, sin, exp}, where I is the identity operator), a large number of distinct subtree values $h$, for example, $\hat{\mathcal{F}}(X)$, are rapidly calculated layer by layer. Step 2: the MSE is computed between the final layer's output and the broadcast target tensor $y$, resulting in the loss tensor. Step 3: the position of the minimum value in the loss tensor is selected. Step 4: starting from the optimal position, recursive symbolic deduction is performed using the offset tensor $\Theta$ generated when building the network, ultimately yielding the optimal expression. If necessary, the coefficients of this expression are adjusted in post-processing. The algorithmic procedure for PSRN evaluation and symbolic reconstruction is presented in Supplementary Algorithm 1.
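The following NumPy sketch mirrors Steps 1–3 on a toy problem: two stacked "symbol layers" enumerate all one-operator combinations of the current subtree values, the loss tensor is formed by broadcasting the target against every candidate column, and the argmin column is mapped back to an expression string. It is a simplified stand-in for PSRN (which performs the same broadcasted computation on a GPU for vastly more candidates), with the string bookkeeping standing in for the offset-tensor-based reconstruction of Step 4; the operator set and all names are illustrative.

```python
# Minimal NumPy sketch of PSRN-style symbol layers and the argmin step (Steps 1-3).
# The real PSRN performs the same broadcasted computation for hundreds of millions
# of columns on a GPU; this toy version only illustrates the mechanics.
import numpy as np

def symbol_layer(H, names):
    """Apply every one-operator combination {I, +, x, sin, exp} to the columns
    of H (n_samples x n_exprs) and record how each new column was built."""
    cols, new_names = [H], list(names)
    n = H.shape[1]
    # binary operators over all (i, j) pairs (commutative, so j >= i)
    for i in range(n):
        for j in range(i, n):
            cols.append((H[:, i] + H[:, j])[:, None])
            new_names.append(f"({names[i]}+{names[j]})")
            cols.append((H[:, i] * H[:, j])[:, None])
            new_names.append(f"({names[i]}*{names[j]})")
    # unary operators applied to every input column
    cols.append(np.sin(H)); new_names += [f"sin({s})" for s in names]
    cols.append(np.exp(H)); new_names += [f"exp({s})" for s in names]
    return np.hstack(cols), new_names

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(256, 2))                # data for base expressions x1, x2
y = np.sin(X[:, 0] + X[:, 1])                        # target generated by sin(x1 + x2)

H, names = X, ["x1", "x2"]
for _ in range(2):                                   # two symbol layers (Step 1)
    H, names = symbol_layer(H, names)

mse = ((H - y[:, None]) ** 2).mean(axis=0)           # loss tensor (Step 2)
best = int(np.argmin(mse))                           # optimal position (Step 3)
print(names[best], mse[best])                        # recovers sin((x1+x2)) with ~0 error
```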
To demonstrate the effectiveness and efficiency of the proposed PSE model, we conduct a comprehensive evaluation across a diverse range of challenging tasks. These include standard SR benchmarks, the data-driven discovery of governing equations for chaotic dynamical systems and the modeling of real-world physical phenomena from experimental data. It is noteworthy that, unless specified otherwise in the ablation studies, all experiments presented hereafter utilize the GP variant of our token generator to effectively explore the vast search space. For real-world datasets such as the electro-mechanical positioning system and the turbulent friction experiments, we assess the plausibility of discovered equations using the mean squared error (MSE) and parsimony due to the absence of ground-truth expressions. For other benchmark experiments, we employ the symbolic recovery rate as the key measure of accuracy. Although the generated expression, further simplified by SymPy, is not guaranteed to be the best-fitting solution matching the ground truth, especially for noisy datasets, this metric remains widely adopted in the SR community. It enables direct and standardized comparisons with a substantial body of prior work on established benchmark problem sets that contain ground-truth expressions. The collective results presented as follows consistently highlight the superiority of PSE in both symbolic recovery accuracy and computational performance over existing baseline methods.
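For the benchmark experiments, the recovery check reduces to testing whether any expression on the discovered Pareto front is symbolically equivalent to the ground truth. A minimal SymPy-based sketch of such a check (with made-up expression strings) is shown below.

```python
# Sketch of the symbolic-recovery check: a trial counts as a success if any
# Pareto-front expression simplifies to the ground truth. Strings are illustrative.
import sympy as sp

def recovered(candidate: str, ground_truth: str) -> bool:
    diff = sp.simplify(sp.sympify(candidate) - sp.sympify(ground_truth))
    return diff == 0

pareto_front = ["x**3 + x**2 + x", "x*(x**2 + x + 1)", "sin(x) + x"]
truth = "x**3 + x**2 + x"
print(any(recovered(expr, truth) for expr in pareto_front))  # True
```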
SR benchmarks
We first demonstrate the efficacy of PSE on recovering specified mathematical formulas given multiple benchmark problem sets (including Nguyen34, Nguyen-c35, R36, Livermore32 and Feynman19, as described in Supplementary Note 2.1), commonly used to evaluate the performance of SR algorithms. Each SR puzzle consists of a ground-truth equation, a set of available math operators and a corresponding dataset. These benchmark problem sets contain various math expressions, for example, $x^3 + x^2 + x$ (Nguyen-1), $3.39x^3 + 2.12x^2 + 1.78x$ (Nguyen-1c), $(x+1)^3/(x^2 - x + 1)$ (R-1), $1/3 + x + \sin(x^2)$ (Livermore-1), $x_1^4 - x_1^3 + x_1^2 - x_2$ (Livermore-5) and $x_1 x_2 x_3 \log(x_5/x_4)$ (Feynman-9), which are listed in detail in Supplementary Tables 1 and 2. Our objective is to uncover the Pareto front of optimal mathematical expressions that balance equation complexity and error. The performance of PSE is compared with eight baseline methods, namely, symbolic physics learner (SPL)27, neural-guided genetic programming (NGGP)32, deep generative symbolic regression (DGSR)22, PySR37, Bayesian machine scientist (BMS)9, a unified framework for deep symbolic regression (uDSR)38, transformer-based planning for symbolic regression (TPSR)30 and Operon39. For the Nguyen-c dataset, each model is run for at least 20 independent trials with different random seeds, while for all other puzzles, 100 distinct random seeds are utilized. We mark a case as successful if the ground-truth equation lies in the discovered Pareto front set.
The SR results of different models, in terms of symbolic recovery rate and computational time, are depicted in Fig. 3. The proposed PSE method outperforms the baseline methods on all benchmark problem sets in terms of recovery rate, while maintaining high efficiency (that is, expending the least computation time), achieving up to two orders of magnitude speedup. The detailed results for each SR problem set are listed in Supplementary Tables 10–15. An intriguing finding is that PSE achieves an impressive symbolic recovery rate of 100% on the R benchmark expressions, whereas the baseline models almost always fail (for example, recovery rate <2%). We attribute this to the mathematical nature of the R benchmark expressions, which are all rational fractions like $(x+1)^3/(x^2 - x + 1)$ (R-1). Confined to the sampling interval $x \in [-1, 1]$, the R expressions bear a high resemblance to polynomial functions, resulting in intractable local minima that essentially lead to the failure of NGGP and DGSR. This issue can be alleviated given a larger interval, for example, $x \in [-10, 10]$, as illustrated by the R* dataset (Supplementary Table 12). Notably, the performance of DGSR, which is based on pre-trained language models, stems from the prevalence of polynomial expressions in the pre-training corpora, while NGGP collapses on account of its limited search capacity in an enormously large search space. In contrast, owing to the direct and parallel evaluation of multi-million expression structures, PSE is capable of accurately and efficiently recovering complex expressions.
Fig. 3: Performance of various models on SR benchmarks.
a, Nguyen problem set. b, Nguyen-c problem set. c, R problem set. d, R* problem set. e, Livermore problem set. f, Feynman problem set. We employ two evaluation metrics, namely, symbolic recovery rate (left) and average runtime (right), to assess the performance of each algorithm. Our PSE approach achieves the highest recovery rate across the various benchmark problem sets while expending the least computation time, compared with the baseline methods (SPL27, NGGP32, DGSR22, PySR37, BMS9, uDSR38, TPSR30 and Operon39). This highlights the substantial superiority of PSE. Note that the star marks the mean value over each problem set. As BMS9 uses priors from Wikipedia, it performs reasonably well on subsets of expressions like Feynman, but has very low recovery rates for other expression subsets, as it is not suited to finding artificially designed expressions without constants. The Feynman dataset shown here follows the definition in DGSR's paper, which is a subset of SRBench's Feynman problems under data-limited conditions (fewer than 50 data points). For evaluation on the complete SRBench dataset, see 'SRBench evaluation and performance'. We strictly maintain consistent runtime budgets. The variations in the algorithms' runtime occur because different algorithms trigger their respective early stopping conditions (for example, reaching an MSE below a very small threshold, or discovering symbolically equivalent expressions). The central line represents the median, the box spans the 25th to 75th percentiles, the whiskers extend to the minimum and maximum values within 1.5 times the interquartile range, and the star indicates the mean recovery rate for each problem set.
Discovery of chaotic dynamics
Nonlinear dynamics is ubiquitous in nature and typically governed by sets of differential equations. Distilling such governing equations from limited observed data plays a crucial role in better understanding the fundamental mechanisms of dynamics. Here we test the proposed PSE model on discovering a series of multi-dimensional autonomous chaotic dynamics (for example, the Lorenz attractor40 and its variants41). The synthetic datasets of these chaotic dynamical systems are described in Supplementary Note 2.2. Note that we only sample the noisy trajectories, while the velocity states are determined via smoothed numerical differentiation. We compare PSE with eight pivotal baseline models (namely, BMS9, PySR37, NGGP32, DGSR22, wAIC16, TPSR30, uDSR38 and Operon39) and run each model on 50 different random seeds to calculate the average recovery rate for each dataset. For the weighted Akaike information criterion (wAIC), we employed two distinct basis function configurations: one utilizing polynomial basis functions (wAIC1), and another combining polynomial basis functions with the same unary operator basis used by the other algorithms (wAIC2). Considering the noise effect, the criterion for successful equation recovery in this experiment is defined as follows: the discovered Pareto front covers the structure of the ground-truth equation (allowing for a constant bias term). As the dynamics of each system is governed by multiple coupled differential equations (for example, three or four, as shown in Supplementary Tables 3–6), the discovery is conducted independently for each equation to compute the average recovery rate, among which the minimum rate is taken to represent each model's capability.
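As an aside on data preparation, the snippet below illustrates smoothed numerical differentiation of a noisy toy trajectory using a Savitzky–Golay filter; this particular filter and its parameters are an assumption for illustration only, not necessarily the smoothing scheme used in our experiments.

```python
# Illustrative sketch of estimating velocity states from noisy trajectories by
# smoothed numerical differentiation; a Savitzky-Golay filter is one common
# choice and is used here only as an example (the text does not prescribe it).
import numpy as np
from scipy.signal import savgol_filter

rng = np.random.default_rng(0)
dt = 0.01
t = np.arange(0, 10, dt)
x_clean = np.sin(t)                                    # a toy state trajectory
x_noisy = x_clean + 0.02 * np.std(x_clean) * rng.standard_normal(t.size)  # 2% noise

# Smooth the trajectory and differentiate it in one pass (deriv=1).
x_dot = savgol_filter(x_noisy, window_length=51, polyorder=3, deriv=1, delta=dt)
print(np.mean((x_dot - np.cos(t)) ** 2))               # small error vs true derivative
```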
Our main focus herein is to investigate whether SR methods can successfully recover the underlying differential equations, given a specific set of token operators, within a short period of computational time (for example, a few minutes). In our experiments, we set the candidate binary operators as +, −, × and ÷, while the candidate unary operators include sin, cos, exp, log, tanh, cosh, abs and sign. Figure 4a,b depicts the results of discovering the closed-form governing equations for 16 chaotic dynamical systems. The experiments demonstrate that, under four different levels of Gaussian noise (1%, 2%, 5% and 10% of the data's standard deviation), the proposed PSE approach achieves a much higher symbolic recovery rate (Fig. 4b), enabling it to identify the underlying governing equations more accurately, even under noise, and thus better describe the chaotic behaviors (Fig. 4a). This substantiates the capability and efficiency of our method for data-driven discovery of governing laws for complex chaotic dynamical systems beyond the Lorenz attractor. More detailed results are listed in Supplementary Tables 16–19.
Fig. 4: Data-driven discovery of nonlinear chaotic dynamics by different models including PSE, BMS, PySR, NGGP, DGSR, wAIC, uDSR, TPSR and Operon.
A large number of operators (for example, +, −, ×, ÷, sin, cos, exp, cosh, tanh, abs, sign and so on) are used to simulate the situation in which scientists explore unknown real-world systems with little prior knowledge about the operators involved. Given the same budget of computational time, PSE has a higher probability of finding the true governing equations among a nearly infinite number of possible expression structures. Each model's runtime is capped at around 10 minutes by limiting the number of iterations. a, Examples of trajectories predicted by different models (solid lines) compared with the ground truth (dashed lines). b, Average recovery rates of SR algorithms on 16 nonlinear chaotic dynamics datasets. Bars of the same color represent the recovery rates under a typical case of 2% Gaussian noise. More results under different noise levels (1%, 5% and 10% of the data's standard deviation) can be found in Extended Data Fig. 1 and Supplementary Tables 16–19. The error bars represent the 95% confidence interval of the mean recovery rate for each system. The results are averaged across 50 independent runs.
Electro-mechanical positioning system
Real-world data, replete with intricate noise and nonlinearity, may hinder the efficacy of SR algorithms. To further validate the capability of our PSE model in uncovering the governing equations of real-world dynamical systems (for example, mechanical devices), we test its performance on a set of laboratory experimental data from an electro-mechanical positioning system (EMPS)42, as shown in Fig. 5a,b. The EMPS set-up is a standard configuration of a drive system used for prismatic joints in robots or machine tools. Finding the governing equation of such a system is crucial for designing better controllers and optimizing system parameters. The dataset was split into two equal parts, serving as the training and testing sets, respectively. The reference governing equation is given by $M\ddot{q} = -F_{\mathrm{v}}\dot{q} - F_{\mathrm{c}}\,\mathrm{sign}(\dot{q}) + \tau - c$ (ref. 42), where $q$, $\dot{q}$ and $\ddot{q}$ represent the joint position, velocity and acceleration. Here, $\tau$ is the joint torque/force; $c$, $M$, $F_{\mathrm{v}}$ and $F_{\mathrm{c}}$ are all constant parameters in the equation. Notably, we leveraged the a priori knowledge that the governing equation of such a system follows Newton's second law, with a general form $\ddot{q} = f(\dot{q}, q, \tau)$. Our objective is to uncover the closed form of $f$ based on a predefined set of token operators. In the EMPS, friction dissipates the system energy. On the basis of this prior knowledge, we include the sign operator to model this mechanism. Hence, the candidate math operators used to test the SR algorithms are {+, −, ×, ÷, sin, cos, exp, log, cosh, tanh, abs, sign}.
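For intuition on the coefficient identification step, the sketch below fits the constants of the reference EMPS equation by ordinary least squares once its structure is fixed; the signals and "true" parameter values are synthetic stand-ins rather than the actual EMPS measurements.

```python
# Sketch of identifying the constants of the reference EMPS equation
#   M*ddq = -Fv*dq - Fc*sign(dq) + tau - c
# by ordinary least squares, mirroring a coefficient fine-tuning step.
# The signals below are synthetic stand-ins with arbitrary parameter values.
import numpy as np

rng = np.random.default_rng(1)
n = 2000
dq = rng.normal(size=n)                        # joint velocity (synthetic)
tau = rng.normal(size=n)                       # joint torque/force (synthetic)
M_true, Fv, Fc, c = 95.0, 200.0, 20.0, -3.0    # arbitrary "true" constants
ddq = (-Fv * dq - Fc * np.sign(dq) + tau - c) / M_true + 0.01 * rng.normal(size=n)

# ddq = a1*dq + a2*sign(dq) + a3*tau + a4, with a3 = 1/M, a1 = -Fv/M, and so on.
A = np.column_stack([dq, np.sign(dq), tau, np.ones(n)])
coef, *_ = np.linalg.lstsq(A, ddq, rcond=None)
M_hat = 1.0 / coef[2]
print(M_hat, -coef[0] * M_hat, -coef[1] * M_hat, -coef[3] * M_hat)  # ~ M, Fv, Fc, c
```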
Fig. 5: Discovering the underlying physical laws with experimental data.
a, The set-up of the EMPS experiment. b, Collected displacement and input force data. c, The prediction performance along with a typical governing equation discovered by PSE. Herein, RMSE stands for root mean square error. d, The prediction performance along with a typical governing equation discovered by PySR. e, The prediction performance along with a typical governing equation discovered by BMS. f, The prediction performance along with a typical governing equation discovered by wAIC. Additional results for other methods on the EMPS experiment can be found in Supplementary Fig. 3. g, The prediction error versus model complexity for different SR methods on the EMPS dataset, averaged over 20 independent runs. The circle size indicates the MSE uncertainty (interquartile range). h, The transformed (or collapsed) Nikuradse's dataset and the equations of turbulent friction discovered by different SR methods. The legend shows the median-reward models (equation (6)), among 20 trials, obtained by PSE (red), PySR (green)37, NGGP (blue)32, DGSR (sky blue)22, uDSR (orange)38, TPSR (brown)30, BMS (purple)9 and Operon (pink)39. i, The fitting performance of each SR method on the original Nikuradse's data. j, The predicted MSE versus model complexity for different SR methods, averaged over 20 independent runs. The circle size indicates the MSE uncertainty (interquartile range). Notably, our PSE method excels at fitting the turbulent friction data rapidly (for example, within only 1.5 minutes), while discovering a more parsimonious and accurate equation.
We compare our PSE model with eight pivotal baseline models, namely, PySR37, NGGP32, DGSR22, BMS9, uDSR38, TPSR30, wAIC16 and Operon39. We run each SR model for 20 trials on the training dataset to ascertain the Pareto fronts. Subsequently, from the candidate expression set of each Pareto front, we select the discovered equation that shows the highest reward value, delineated in equation (6), on the test dataset. The reward is designed to balance the prediction error and the complexity of the discovered equation, which is crucial to deriving a governing equation that is not only as consistent with the data as possible but also parsimonious and interpretable. Finally, we select, among the 20 trials, the discovered equation with the median reward value to represent each SR model's average performance. The results demonstrate that our PSE model achieves the best performance in discovering the underlying governing equation (Fig. 5c–f and Supplementary Fig. 3). Our model is capable of uncovering the correct governing equation while maintaining low prediction error (Fig. 5g).
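The selection procedure can be summarized as follows: for each trial, pick the Pareto-front expression with the highest test-set reward, then report the trial whose best reward is the median across trials. The sketch below illustrates this with toy numbers and a generic error-complexity reward assumed purely for illustration; the actual reward is defined in equation (6) in Methods.

```python
# Illustration of the model-selection procedure across trials. The reward below
# is an assumed generic error-complexity trade-off, not equation (6) itself.
import numpy as np

def reward(mse, complexity, lam=0.05):
    return np.exp(-mse) / (1.0 + lam * complexity)     # assumed surrogate reward

# Each trial yields a Pareto front of (test MSE, complexity) pairs (toy numbers).
trials = [
    [(0.10, 5), (0.04, 9), (0.02, 17)],
    [(0.30, 3), (0.05, 11)],
    [(0.08, 7), (0.03, 13), (0.01, 25)],
]
best_per_trial = [max(reward(m, c) for m, c in front) for front in trials]
median_trial = int(np.argsort(best_per_trial)[len(best_per_trial) // 2])
print(median_trial, best_per_trial[median_trial])      # trial reported as "typical"
```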
Governing equation of turbulent friction
Uncovering the intrinsic relationship between fluid dynamics and frictional resi