Introduction
The evolution of complex systems is fundamentally governed by high-dimensional, nonlinear, and multiscale partial differential equations (PDEs), which arise in diverse fields such as geosciences, materials science, fluid dynamics, and biological systems1. Numerical methods such as finite element2 and meshless approaches3 have advanced physical modeling over the past decades, but traditional PDE-solving paradigms are increasingly limited in theory and application when confronted with high-dimensional parameter spaces, extreme computational costs4, the high trial-and-error cost of inverse problems5, and practical constraints such as missing boundary or initial conditions. These methods are also limited in their ability to exploit the underlying structural features of the system. Therefore, there is an urgent need for modeling frameworks that integrate data and physics and possess strong applicability to overcome these bottlenecks.
In recent years, machine learning (ML) methods, especially deep learning6, have provided new perspectives for scientific modeling. ML can automatically learn couplings and nonlinear mappings between variables from observational data, demonstrating great potential in system identification and equation discovery7 as well as complex system modeling8. However, conventional end-to-end deep learning approaches require large amounts of data and are prone to overfitting, which limits their applicability1. To address these issues, physics-informed neural networks (PINNs)9 have emerged, embedding physical constraints into neural networks (NNs) via loss functions and thus integrating data-driven learning with physical priors. PINNs offer scalability10, flexibility, and mesh-free characteristics11, making them powerful tools for solving PDEs and showing promise in inverse problems12. However, most existing PINN approaches focus on enforcing physical constraints through loss functions, while largely ignoring the explicit structural characteristics of the underlying physical systems. Furthermore, external loss functions only minimize the average inconsistency between model predictions and the physical mechanism13. As a result, for problems requiring strong physical consistency, one limitation of PINNs is that their predictions do not strictly adhere to underlying physical conservation laws14.
It is practical to enhance model performance by adding more domain-specific constraints15,16,17. Some methods impose constraints externally to the network, such as Hamiltonian neural networks18, Lagrangian neural networks19, and symplectic networks20, which maintain conservation laws by learning energy-like scalar quantities. Automatic symmetry discovery methods21 encourage networks to preserve symmetries by defining generalized symmetry loss functions. In contrast, another class of effective approaches embeds constraints directly within the internal structure of the network22, thereby ensuring strict satisfaction of these constraints. For example, Rao et al.23 encoded constraints within the network to optimize solutions for reaction-diffusion equations. The underlying principle is that adjusting a system’s internal structure can modify its output to satisfy required constraints24. Furthermore, if the network connectivity itself is designed to match the nature of the problem, the network can achieve superior performance in specialized domains24. For instance, Zhu et al.11 enforced relationships among trainable parameters to embed the space-time parity symmetry (ST-symmetry) of the Ablowitz-Ladik (A-L) equation into the network’s weight arrangement, enabling the simulation of nonlinear dynamic lattice solutions. However, these parameter-structured data-driven models rely on manual construction based on strong prior knowledge for specific problems, resulting in limited structural patterns and restricted applicability. Therefore, developing algorithms for automatic identification and extraction of network structures–reducing dependence on prior knowledge and manual design–would significantly enhance the applicability of structured network data-driven methods.
One effective approach for extracting network structures in NN techniques is regularization25,26. However, applying regularization in PINNs often fails to yield satisfactory results, as shown in Fig. 1. This is not only because PINNs are insensitive to regularization terms in the loss function27, but also because parameter regularization introduces gradient optimization directions that may conflict with the existing physical constraint regularization in PINNs9. Such excessive constraints can actually degrade the accuracy of PINNs1. Distillation learning28, as an effective dual-model training scheme, allows two models to be trained with different loss terms, enabling the student model to achieve accuracy comparable to the teacher model. This provides a means to integrate both physical and parameter regularization.
Fig. 1: The conflict between structure extraction methods and PINN.
After adding the regularization term, the loss of the PINN model increases.
Inspired by knowledge distillation, we propose a physics structure-informed NN discovery framework (physics structure-informed neural network, pronounced "Psi-NN" and abbreviated Ψ-NN) that enables automatic identification, extraction, and reconstruction of network architectures. In Ψ-NN, physical regularization (from governing equations) and parameter regularization are decoupled and applied separately to the teacher and student networks, overcoming the insensitivity to regularization and the potential performance degradation observed in conventional PINNs. Physical information is efficiently transferred from the teacher to the student network via distillation, preserving essential physical constraints while expanding the representational capacity of the student model. An optimized structure extraction algorithm then automatically identifies parameter matrices with physical significance, while maximally retaining the feature space of the student network. Finally, a reinitialization mechanism is employed for network reconstruction, ensuring physical consistency in the network structure while endowing the model with broader applicability. By organically integrating distillation, structure extraction, and network reconstruction, Ψ-NN achieves physical consistency, interpretability, and high-accuracy predictions in structured NNs.
Results
Physics structure-informed neural network
To achieve an automatic, physics-informed, and interpretable NN structure extraction mechanism, the Ψ-NN method consists of three components: (A) physics-informed distillation; (B) network parameter matrix extraction; and (C) structured network reconstruction, as illustrated in Fig. 2. The core idea of Ψ-NN is to embed physical information–such as spatiotemporal symmetries and conservation laws–directly into the network architecture. These constraints are encoded by the parameter matrices and reconstructed within the new network structure, thereby endowing the network with physical relevance. Further details are provided in Section 3. An ablation study of the Ψ-NN method is presented in Supplementary Information, validating the necessity of each component.
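To make component (A) concrete, the following is a minimal PyTorch-style sketch of the decoupled training objectives, assuming fully connected tanh networks and an L1 parameter regularizer; the widths, the penalty weight, and the helper `pde_residual` are illustrative assumptions rather than the reference implementation. The teacher carries the physics (PDE) loss of a standard PINN, while the student is supervised only by the teacher's predictions and carries the parameter regularization, so the two regularizations never act on the same gradient.

```python
import torch
import torch.nn as nn

def mlp(width=20, depth=3):
    # fully connected tanh network: 2 inputs (e.g., x1, x2) -> 1 output u
    layers, d_in = [], 2
    for _ in range(depth):
        layers += [nn.Linear(d_in, width), nn.Tanh()]
        d_in = width
    layers += [nn.Linear(d_in, 1)]
    return nn.Sequential(*layers)

teacher, student = mlp(), mlp()

def teacher_loss(x_f, x_b, u_b, pde_residual):
    # standard PINN objective: PDE residual at collocation points + boundary data misfit
    r = pde_residual(teacher, x_f)
    return (r ** 2).mean() + ((teacher(x_b) - u_b) ** 2).mean()

def student_loss(x, lam=1e-4):
    # distillation objective: match the (frozen) teacher's field prediction,
    # plus an L1 term that drives the student's weights toward a few shared values
    with torch.no_grad():
        u_teacher = teacher(x)
    mse = ((student(x) - u_teacher) ** 2).mean()
    l1 = sum(p.abs().mean() for p in student.parameters())
    return mse + lam * l1
```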
Fig. 2: The algorithm of the proposed Ψ-NN model.
The teacher network predicts the computational domain using the PINN approach, while the student network is supervised by the teacher's output, forming a distillation learning process. During the training of the student network, regularization methods are used to naturally drive the parameters into clusters that can be identified by a clustering algorithm under the current physical constraints. Finally, based on the clustering results, parameter matrices related to physical properties are extracted, and ultimately the network structure is reconstructed through structure-embedding (embedding the unchanged relation matrix R into a new network).
In the numerical experiments, Ψ-NN achieves the goal of extracting network structures from data under partially known physical laws. The case studies demonstrate that Ψ-NN (A) accurately solves specific physical problems; (B) generalizes across different control parameters within the same problem; and (C) maintains generalizability of the reconstructed network structure across different physical problems. The detailed case settings are provided in Supplementary Information. The results show that Ψ-NN can effectively extract high-performance network structures in problems with partially known laws, yielding good fitting accuracy within the problem domain. The overall workflow is illustrated in Fig. 3, and the error results are summarized in Table 1. Control parameter transfer refers to the case where the form of the PDE is fixed during the inverse problem, but certain parameters (such as the viscosity coefficient in the Burgers equation shown in the figure) are varied.
Fig. 3: The pipeline of the Ψ-NN method.
Using the Burgers equation as an example. Three bold arrows represent the distillation-extraction-reconstruction process. a The whole-field data are predicted by the data-driven model. b The student network is trained with the structure-generating method, using the teacher model's output. c The structure extraction and reconstruction method is applied to find an optimal network structure for Burgers equation modeling.
Extraction of network structure from PDEs
We selected several representative PDEs–the Laplace, Burgers, and Poisson equations–for which baseline models with prior hard constraints are available, thereby better demonstrating the generalizability of Ψ-NN. These problems are widely used in physics. In the control group, we use PINNs with post-processing hard mapping13,22 (PINN-post) as well as standard PINNs9. The former introduces additional enforced constraints by post-processing the network outputs, while the latter serves as a general NN solver for PDEs. The case studies were run on a machine with an Intel 12400F CPU and an RTX 4080 GPU. In all cases, the Adam optimizer29 is used for training. To ensure reproducibility, the random seed is fixed at 1234. The computational results are shown in Fig. 4.
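For reproducibility, this experimental setup can be expressed as the following sketch; the seed and optimizer follow the settings above, while the network size and learning rate are illustrative assumptions.

```python
import random
import numpy as np
import torch

SEED = 1234                      # fixed random seed used in all cases
random.seed(SEED)
np.random.seed(SEED)
torch.manual_seed(SEED)

# Illustrative model and Adam optimizer setup (the learning rate is an assumed value).
model = torch.nn.Sequential(torch.nn.Linear(2, 20), torch.nn.Tanh(), torch.nn.Linear(20, 1))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
```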
Fig. 4: Numerical results.
a Laplace equation results. b Burgers equation results. Black dots in the exact field represent sampling points. c Poisson equation results. The first column on the left shows the mean square error (MSE) propagation, which is used to compare the optimization speed (the decrease in error within the same number of steps) and optimization accuracy (the final MSE value) of the representative models; the second column on the left shows the ground truth of the cases. The three columns on the right are the results of PINN, PINN-post, and Ψ-NN, respectively. The first row shows the model predictions, and the second row shows the absolute error between the model and the ground truth.
Laplace equation
Laplace’s equation has applications in various fields, including electric fields, heat conduction, and fluid statics30. With appropriate boundary and initial conditions, this equation can exhibit clear symmetry properties, providing more distinct structural features for the network. The Laplace equation is used to fully illustrate the implementation process of Ψ-NN and to demonstrate the interpretability of the Ψ-NN structure.
Consider the steady-state Laplace equation in two-dimensional coordinates $\boldsymbol{x}\in\mathbb{R}^{2}$ with the following governing PDE:
$$\mathcal{L}:=\nabla^{2}u=0,\quad \boldsymbol{x}\in[-1,1]^{2}$$
(1)
where x = (x1, x2). The boundary conditions of the problem are:
$$\mathcal{B}:=u=\begin{cases}-1+3x_{2}^{2}, & x_{1}=-1\\ 1-3x_{2}^{2}, & x_{1}=1\\ x_{1}^{3}-3x_{1}, & x_{2}=-1\\ x_{1}^{3}-3x_{1}, & x_{2}=1\end{cases}$$
(2)
Other settings are provided in Supplementary Information.
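For concreteness, a hedged sketch of how the physics loss for this problem can be assembled with automatic differentiation is given below; the function names (`laplace_residual`, `boundary_batch`) and the sampling sizes are illustrative assumptions, not the paper's implementation.

```python
import torch

def laplace_residual(model, x):
    # x: (N, 2) collocation points in [-1, 1]^2; residual of Eq. (1): u_{x1x1} + u_{x2x2}
    x = x.clone().requires_grad_(True)
    u = model(x)
    grads = torch.autograd.grad(u, x, torch.ones_like(u), create_graph=True)[0]
    u_x1, u_x2 = grads[:, :1], grads[:, 1:]
    u_x1x1 = torch.autograd.grad(u_x1, x, torch.ones_like(u_x1), create_graph=True)[0][:, :1]
    u_x2x2 = torch.autograd.grad(u_x2, x, torch.ones_like(u_x2), create_graph=True)[0][:, 1:]
    return u_x1x1 + u_x2x2

def boundary_batch(n=256):
    # sample the four edges of [-1, 1]^2 and evaluate the boundary values of Eq. (2)
    s = torch.rand(n, 1) * 2 - 1
    x_left  = torch.cat([-torch.ones_like(s), s], dim=1); u_left  = -1 + 3 * s ** 2
    x_right = torch.cat([ torch.ones_like(s), s], dim=1); u_right =  1 - 3 * s ** 2
    x_bot   = torch.cat([s, -torch.ones_like(s)], dim=1); u_bot   = s ** 3 - 3 * s
    x_top   = torch.cat([s,  torch.ones_like(s)], dim=1); u_top   = s ** 3 - 3 * s
    return torch.cat([x_left, x_right, x_bot, x_top]), torch.cat([u_left, u_right, u_bot, u_top])
```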
A. Extracted structure
The Ψ-NN method enables clear extraction of network structures under the guidance of physical laws, whereas other existing methods can negatively impact network accuracy, as detailed in Supplementary Information. Figure 5a shows the evolution of the first hidden layer parameters during training. As the student network loss stabilizes, parameter convergence becomes more pronounced, resulting in extractable network structures. This convergence phenomenon is observed across different layers, and the final parameter clustering results under Ψ-NN are shown in Fig. 5b. The clustering of biases also converges and approaches zero, reducing inter-layer bias features and making the symmetry more evident.
Fig. 5: Laplace case results.
a The evolution of parameters under the Ψ-NN method. The loss curve here serves as an indicator of residual stability rather than final accuracy. b Cluster centers of Ψ-NN for the Laplace equation. The x-axis represents the absolute value of the weights, and the y-values are assigned randomly to better visualize the distribution. Negative values are shown as red dots, and positive values as blue dots. The cluster distance is set to 0.1. The left column contains distributions of weights, and the right column contains distributions of biases. The first to fourth rows correspond to the clustering results of the network parameters for the first to third hidden layers and the output layer, respectively. c The structure of the student NN after parameter replacement for the Laplace equation.
Figure 5c shows the network structure after replacing the original parameters with the cluster centers.
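A minimal sketch of this cluster-and-replace step is shown below, assuming a simple one-dimensional threshold grouping of the weight magnitudes with the 0.1 cluster distance quoted in Fig. 5b; the clustering algorithm actually used by Ψ-NN may differ.

```python
import torch

def cluster_and_replace(weight, dist=0.1):
    """Group the entries of a trained weight matrix by |value| (entries within `dist`
    of the running cluster fall into the same cluster) and replace every entry by its
    signed cluster center, as in Fig. 5b/c."""
    flat = weight.detach().flatten()
    mags, order = flat.abs().sort()
    groups = []                                   # each group: list of (|w|, flat index)
    for m, idx in zip(mags.tolist(), order.tolist()):
        if groups and m - groups[-1][-1][0] <= dist:
            groups[-1].append((m, idx))           # within the cluster distance: same cluster
        else:
            groups.append([(m, idx)])             # otherwise start a new cluster
    replaced = flat.clone()
    for g in groups:
        center = sum(m for m, _ in g) / len(g)    # cluster center of the magnitudes
        for _, idx in g:
            replaced[idx] = center * torch.sign(flat[idx])   # keep the original sign
    return replaced.view_as(weight)
```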
Since the second hidden layer structure involves reuse, sign reversal, and swapping, we take it as an example to describe in detail the formation of the relation matrix $\boldsymbol{R}_2$ during the "structure extraction" process and its role in the "network reconstruction" process of Ψ-NN. The subscript indicates the second layer. First, after replacing the trainable parameters of the student network with cluster centers, the parameter matrix $c_2$ is:
$$c_{2}=\begin{bmatrix}-c_{2}^{a} & -c_{2}^{b}\\ c_{2}^{b} & c_{2}^{a}\end{bmatrix}$$
(3)
The distinct cluster-center parameters, denoted by superscripts, are then arranged on the diagonal to form the matrix $c_{2}^{S}$:
$$c_{2}^{S}=\begin{bmatrix}c_{2}^{a} & 0\\ 0 & c_{2}^{b}\end{bmatrix}$$
(4)
After flattening $c_{2}^{S}$, selecting cluster centers using one-hot vectors, and incorporating sign relationships, the relation matrix $\boldsymbol{R}_2$ is represented as:
$$\boldsymbol{R}_{2}=\begin{bmatrix}-1 & 0\\ 0 & -1\\ 0 & 1\\ 1 & 0\end{bmatrix}$$
(5)
In this matrix, rows with the same parameters are duplicated, rows with opposite signs are negated, and the swapping of rows 1,2 and 4,3 (which includes both row swapping and sign reversal in this case) represents the swapping relationship of parameters. The relation matrix $\boldsymbol{R}_2$ stores the relationships between network parameters, with each row representing a selected cluster center. Thus, in the reconstruction process of the new network, the trainable parameter matrix of the second hidden layer, $\hat{\boldsymbol{\theta}}_2$, is constrained by $\boldsymbol{R}_2$ as follows:
$$\hat{\boldsymbol{\theta}}_{2}=\boldsymbol{R}_{2}\cdot\begin{bmatrix}\hat{\boldsymbol{\theta}}_{2}^{a} & 0\\ 0 & \hat{\boldsymbol{\theta}}_{2}^{b}\end{bmatrix}=\begin{bmatrix}-\hat{\boldsymbol{\theta}}_{2}^{a} & 0\\ 0 & -\hat{\boldsymbol{\theta}}_{2}^{b}\\ 0 & \hat{\boldsymbol{\theta}}_{2}^{b}\\ \hat{\boldsymbol{\theta}}_{2}^{a} & 0\end{bmatrix}$$
(6)
After arranging the selected non-zero trainable parameters according to the node order, the following is obtained:
$$\hat{\boldsymbol{\theta}}_{2}=\begin{bmatrix}-\hat{\boldsymbol{\theta}}_{2}^{a} & -\hat{\boldsymbol{\theta}}_{2}^{b}\\ \hat{\boldsymbol{\theta}}_{2}^{b} & \hat{\boldsymbol{\theta}}_{2}^{a}\end{bmatrix}$$
(7)
Converting to matrix form, $\boldsymbol{W}_2$ is as follows:
$$\boldsymbol{W}_{2}=\begin{bmatrix}-\boldsymbol{W}_{2}^{a} & -\boldsymbol{W}_{2}^{b}\\ \boldsymbol{W}_{2}^{b} & \boldsymbol{W}_{2}^{a}\end{bmatrix}$$
(8)
The other layers follow similarly. This structure is further represented using low-rank parameter matrices, with weight matrices denoted as $\boldsymbol{W}_i^{j}$, where i indicates the layer and j labels identical submatrices. The bias is denoted as $\boldsymbol{b}$. The architecture is expressed as follows:
$$\boldsymbol{W}_{1}=\begin{bmatrix}\boldsymbol{W}_{1}^{a} & \boldsymbol{W}_{1}^{b}\\ -\boldsymbol{W}_{1}^{a} & -\boldsymbol{W}_{1}^{b}\end{bmatrix},\quad \boldsymbol{b}_{1}=\begin{bmatrix}\boldsymbol{b}_{1}^{a}\\ -\boldsymbol{b}_{1}^{a}\end{bmatrix}$$
(9)
$$\boldsymbol{W}_{2}=\begin{bmatrix}-\boldsymbol{W}_{2}^{a} & -\boldsymbol{W}_{2}^{b}\\ \boldsymbol{W}_{2}^{b} & \boldsymbol{W}_{2}^{a}\end{bmatrix},\quad \boldsymbol{b}_{2}=\begin{bmatrix}\boldsymbol{b}_{2}^{a}\\ \boldsymbol{b}_{2}^{a}\end{bmatrix}$$
(10)
$$\boldsymbol{W}_{3}=\begin{bmatrix}\boldsymbol{W}_{3}^{a} & \boldsymbol{W}_{3}^{a}\end{bmatrix}$$
(11)
This low-rank structure is reconstructed through structure-embedding (embedding the unchanged relation matrix $\boldsymbol{R}$ into a new network), where the parameters of the d nodes (d being the parameter submatrix dimension) are represented by $\boldsymbol{W}_{1}^{a}=[w_{11}^{a},w_{12}^{a},\cdots,w_{1d}^{a}]^{T}$.
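As an illustration of structure-embedding, the sketch below rebuilds the weight matrices of Eqs. (9)-(11) from a small set of shared trainable submatrices at every forward pass, so the extracted reuse, sign-reversal, and swapping relations hold exactly by construction; the hidden width d, the initialization, and the module layout are illustrative assumptions.

```python
import torch
import torch.nn as nn

class StructuredLaplaceNet(nn.Module):
    """Reconstructed network of Eqs. (9)-(11): 2d units per hidden layer,
    but only the 'a'/'b' submatrices (d units wide) are trainable."""
    def __init__(self, d=10):
        super().__init__()
        self.W1a = nn.Parameter(0.1 * torch.randn(d, 1))   # acts on x1
        self.W1b = nn.Parameter(0.1 * torch.randn(d, 1))   # acts on x2
        self.b1a = nn.Parameter(torch.zeros(d))
        self.W2a = nn.Parameter(0.1 * torch.randn(d, d))
        self.W2b = nn.Parameter(0.1 * torch.randn(d, d))
        self.b2a = nn.Parameter(torch.zeros(d))
        self.W3a = nn.Parameter(0.1 * torch.randn(1, d))

    def forward(self, x):                                   # x: (N, 2)
        # layer 1, Eq. (9): W1 = [[W1a, W1b], [-W1a, -W1b]], b1 = [b1a, -b1a]
        W1 = torch.cat([torch.cat([self.W1a,  self.W1b], 1),
                        torch.cat([-self.W1a, -self.W1b], 1)], 0)
        b1 = torch.cat([self.b1a, -self.b1a])
        h1 = torch.tanh(x @ W1.T + b1)
        # layer 2, Eq. (10): W2 = [[-W2a, -W2b], [W2b, W2a]], b2 = [b2a, b2a]
        W2 = torch.cat([torch.cat([-self.W2a, -self.W2b], 1),
                        torch.cat([ self.W2b,  self.W2a], 1)], 0)
        b2 = torch.cat([self.b2a, self.b2a])
        h2 = torch.tanh(h1 @ W2.T + b2)
        # output, Eq. (11): W3 = [W3a, W3a]
        W3 = torch.cat([self.W3a, self.W3a], 1)
        return h2 @ W3.T
```

Because the full matrices are assembled from the shared blocks inside `forward`, a gradient update to any tied entry is applied to all of its occurrences, which keeps the extracted structure intact during retraining.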
By expanding the expressions of the final hidden layer’s two sets of nodes, the relationship between the two-dimensional input x = (x1, x2) and the node outputs ua and ub can be obtained:
$$\boldsymbol{u}_{a}(x_{1},x_{2})=\tanh\left(\boldsymbol{W}_{2}^{a}\cdot\left(\tanh(\boldsymbol{W}_{1}^{a}\cdot x_{1}+\boldsymbol{W}_{1}^{b}\cdot x_{2}+\boldsymbol{b}_{1}^{a})+\boldsymbol{W}_{2}^{b}\cdot\left(\tanh(\boldsymbol{W}_{1}^{a}\cdot x_{1}-\boldsymbol{W}_{1}^{b}\cdot x_{2}+\boldsymbol{b}_{1}^{a})+\boldsymbol{b}_{2}^{a}\right)\right)\right)$$
(12)
$$\boldsymbol{u}_{b}(x_{1},x_{2})=\tanh\left(\boldsymbol{W}_{2}^{b}\cdot\left(\tanh(\boldsymbol{W}_{1}^{a}\cdot x_{1}+\boldsymbol{W}_{1}^{b}\cdot x_{2}+\boldsymbol{b}_{1}^{a})+\boldsymbol{W}_{2}^{a}\cdot\left(\tanh(\boldsymbol{W}_{1}^{a}\cdot x_{1}-\boldsymbol{W}_{1}^{b}\cdot x_{2}+\boldsymbol{b}_{1}^{a})+\boldsymbol{b}_{2}^{a}\right)\right)\right)$$
(13)
From the expressions of both, it follows that:
$$\boldsymbol{u}_{a}(x_{1},x_{2})=\boldsymbol{u}_{b}(x_{1},-x_{2})$$
(14)
Finally, the output expression is:
$$\boldsymbol{u}(x_{1},x_{2})=\boldsymbol{W}_{3}^{a}\left(\boldsymbol{u}_{a}(x_{1},x_{2})+\boldsymbol{u}_{b}(x_{1},x_{2})\right)$$
(15)
where $\boldsymbol{W}_{3}^{a}$ is the parameter matrix. This expression can also be written as:
$$\boldsymbol{u}(x_{1},x_{2})=\boldsymbol{W}_{3}^{a}\left(\boldsymbol{u}_{a}(x_{1},x_{2})+\boldsymbol{u}_{a}(x_{1},-x_{2})\right)$$
(16)
Therefore, this structured method implicitly embeds the symmetry that PINN-post imposes explicitly through post-processing.
Previous structured PINN approaches11,31 require symmetry priors and rely on manually imposing group equivariance, targeting specific problems, while the network structure in Ψ-NN is automatically extracted from data and physical constraints, reducing reliance on manual design and strong prior knowledge.
B. Comparison between Ψ-NN and PINN
The results are shown in Fig. 4a, and the full-field L2 errors are summarized in Table 1. Compared to PINN, Ψ-NN reduces the number of iterations required to reach the same loss magnitude (1e−3) by approximately 50%, and decreases the final L2 error by about 95%. As illustrated in Fig. 6a and d, PINN does not exhibit consistent symmetry during training, whereas the structural constraints in Ψ-NN, which are consistent with the features of the PDE, enable a reduced search space and allow the solution to be found more quickly and accurately in the early stages of training.
Fig. 6: Prediction comparison by iterations.
a, d PINN predictions; b, e PINN-post predictions; c, f Ψ-NN predictions, showing the optimization tendencies of the different models. In the first row, the red dashed line represents the true value of u as x2 varies at x1 = 0.8. Each green-blue gradient curve represents the network output at training steps from 1000 to 5000, with an interval of 300. In the second row, the results at several training steps are plotted on a two-dimensional coordinate plane to illustrate the symmetry breaking in PINN during the iterative process, as well as the symmetry preservation in PINN-post and Ψ-NN.
C. Comparison between Ψ-NN and PINN-post
The PINN-post incorporates spatial symmetry into the network output layer through explicit constraints, resulting in outputs that better satisfy symmetric physical properties–reducing the full-field L2 error by approximately 65% compared to conventional PINN, especially within the computational domain. However, the converged MSE of PINN-post is higher than that of Ψ-NN, indicating that the minimum loss value in the PINN framework does not fully reflect the true accuracy of the network, but only the average fit to the available data and PDEs. The rate of loss reduction reflects the convergence speed: to reach a loss of 1e-3, PINN-post requires 5e3 fewer iterations than PINN.
Both Ψ-NN and PINN-post embed spatial symmetry into the network, but the key difference is that the reconstructed Ψ-NN architecture inherently contains this physical property, whereas PINN-post applies hard mapping as a post-processing step at the output layer. In terms of computation time, PINN-post requires 32.68 minutes, longer than Ψ-NN's 29.87 minutes. Furthermore, Ψ-NN reaches a loss of 1e−4 with 1.5e4 fewer iterations, and its average convergence speed is about twice that of PINN-post. The L2 error is reduced from 1.159e−3 to 7.422e−5. As shown in Fig. 4a, the hard mapping constraint in PINN-post does not reduce the large computational errors near the boundaries. This suggests that, due to the rich implicit constraints in physical fields, manually embedding features via post-processing provides only a necessary but not sufficient constraint, and its applicability may be limited to a certain range. In contrast, Ψ-NN discovers network structures entirely from observational data and PDEs, aiming to automatically embed all known information about the physical problem into the network structure, thereby reducing errors over a broader computational domain.
Burgers equation
The Burgers equation, as an important tool for describing nonlinear wave phenomena, is frequently used to study complex systems such as fluid dynamics and wave behavior32. We select the Burgers equation due to its pronounced shock formation and resulting antisymmetric properties, which serve to validate (a) the performance advantages of structured networks and (b) their applicability across a wide range of parameter variations.
In the inverse problem, the viscosity coefficient in the Burgers equation is replaced by an unknown parameter λ1, and the governing PDE becomes:
$$\mathcal{L}:=u_{t}-uu_{x}-\lambda_{1}u_{xx}=0,\quad x\in[-1,1],\quad t\in[0,1]$$
(17)
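A brief sketch of this inverse-problem setup is given below, assuming the unknown viscosity is registered as an additional trainable parameter and the input columns are ordered (x, t); the wrapper class and the initialization of λ1 are illustrative assumptions.

```python
import torch
import torch.nn as nn

class BurgersInverse(nn.Module):
    """Wraps a u(x, t) network and learns the unknown viscosity of Eq. (17) jointly."""
    def __init__(self, net):
        super().__init__()
        self.net = net                                   # e.g., the reconstructed structured network
        self.lambda1 = nn.Parameter(torch.tensor(0.1))   # unknown viscosity (illustrative init)

    def residual(self, xt):
        # xt: (N, 2) with columns (x, t); residual of Eq. (17): u_t - u u_x - lambda1 u_xx
        xt = xt.clone().requires_grad_(True)
        u = self.net(xt)
        g = torch.autograd.grad(u, xt, torch.ones_like(u), create_graph=True)[0]
        u_x, u_t = g[:, :1], g[:, 1:]
        u_xx = torch.autograd.grad(u_x, xt, torch.ones_like(u_x), create_graph=True)[0][:, :1]
        return u_t - u * u_x - self.lambda1 * u_xx
```

Because `lambda1` is an `nn.Parameter`, it appears in the model's parameter list and is updated by the same optimizer as the weights, which is what allows its trajectory to be tracked during training (cf. Fig. 7).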
A. Extraction result and performance comparison
In the Ψ-NN extraction process, taking the third hidden layer as an example, the parameter variations are shown in Supplementary Information (Fig. 6). The extracted low-rank parameter matrices are:
$$\boldsymbol{W}_{1}=\begin{bmatrix}\boldsymbol{W}_{1}^{a} & \boldsymbol{W}_{1}^{b}\\ \boldsymbol{W}_{1}^{a} & -\boldsymbol{W}_{1}^{b}\end{bmatrix},\quad \boldsymbol{b}_{1}=\begin{bmatrix}\boldsymbol{b}_{1}^{a}\\ -\boldsymbol{b}_{1}^{a}\end{bmatrix}$$
(18)
$$\boldsymbol{W}_{2}=\begin{bmatrix}\boldsymbol{W}_{2}^{a} & -\boldsymbol{W}_{2}^{a}\\ \boldsymbol{W}_{2}^{b} & \boldsymbol{W}_{2}^{c}\\ \boldsymbol{W}_{2}^{c} & \boldsymbol{W}_{2}^{b}\end{bmatrix},\quad \boldsymbol{b}_{2}=\begin{bmatrix}\boldsymbol{0}\\ \boldsymbol{b}_{2}^{a}\\ \boldsymbol{b}_{2}^{a}\end{bmatrix}$$
(19)
$$\boldsymbol{W}_{3}=\begin{bmatrix}\boldsymbol{W}_{3}^{a} & \boldsymbol{W}_{3}^{b} & -\boldsymbol{W}_{3}^{b}\end{bmatrix},\quad \boldsymbol{b}_{3}=\boldsymbol{0}$$
(20)
Similarly, the relationship between the two-dimensional input x = (x1, x2) and the node outputs ua, ub, uc of the final hidden layer can be obtained:
$$\boldsymbol{u}_{a}(x_{1},x_{2})=\tanh\left(\boldsymbol{W}_{2}^{a}\cdot\left(\tanh(\boldsymbol{W}_{1}^{a}\cdot x_{1}+\boldsymbol{W}_{1}^{b}\cdot x_{2}+\boldsymbol{b}_{1}^{a})-\boldsymbol{W}_{2}^{a}\cdot\tanh(\boldsymbol{W}_{1}^{a}\cdot x_{1}-\boldsymbol{W}_{1}^{b}\cdot x_{2}-\boldsymbol{b}_{1}^{a})\right)\right)$$
(21)
$$\boldsymbol{u}_{b}(x_{1},x_{2})=\tanh\left(\boldsymbol{W}_{2}^{b}\cdot\left(\tanh(\boldsymbol{W}_{1}^{a}\cdot x_{1}+\boldsymbol{W}_{1}^{b}\cdot x_{2}+\boldsymbol{b}_{1}^{a})+\boldsymbol{W}_{2}^{c}\cdot\left(\tanh(\boldsymbol{W}_{1}^{a}\cdot x_{1}-\boldsymbol{W}_{1}^{b}\cdot x_{2}-\boldsymbol{b}_{1}^{a})+\boldsymbol{b}_{2}^{a}\right)\right)\right)$$
(22)
$$\boldsymbol{u}_{c}(x_{1},x_{2})=\tanh\left(\boldsymbol{W}_{2}^{c}\cdot\left(\tanh(\boldsymbol{W}_{1}^{a}\cdot x_{1}+\boldsymbol{W}_{1}^{b}\cdot x_{2}-\boldsymbol{b}_{1}^{a})+\boldsymbol{W}_{2}^{b}\cdot\left(\tanh(\boldsymbol{W}_{1}^{a}\cdot x_{1}-\boldsymbol{W}_{1}^{b}\cdot x_{2}+\boldsymbol{b}_{1}^{a})+\boldsymbol{b}_{2}^{a}\right)\right)\right)$$
(23)
From the expressions of the three, it follows that:
$$\lim_{b_{1}^{a}\to 0}\boldsymbol{u}_{a}(x_{1},-x_{2})=-\boldsymbol{u}_{a}(x_{1},x_{2})$$
(24)
$$\boldsymbol{u}_{b}(x_{1},x_{2})=\boldsymbol{u}_{c}(x_{1},-x_{2})$$
(25)
The expression of the final output u(x1, x2) is:
$$\boldsymbol{u}(x_{1},x_{2})=\boldsymbol{W}_{3}^{a}\cdot\boldsymbol{u}_{a}(x_{1},x_{2})+\boldsymbol{W}_{3}^{b}\cdot\left(\boldsymbol{u}_{b}(x_{1},x_{2})-\boldsymbol{u}_{c}(x_{1},x_{2})\right)$$
(26)
where $\boldsymbol{W}_{3}^{a}$ and $\boldsymbol{W}_{3}^{b}$ are parameter matrices.
Therefore, the first half of u, $\boldsymbol{W}_{3}^{a}\cdot\boldsymbol{u}_{a}(x_{1},x_{2})$, contains the symmetry of PINN-post in the limit $b_{1}^{a}\to 0$, while the second half, $\boldsymbol{W}_{3}^{b}\cdot(\boldsymbol{u}_{b}(x_{1},x_{2})-\boldsymbol{u}_{c}(x_{1},x_{2}))$, directly contains the symmetry of PINN-post. Additionally, the properties of $\boldsymbol{u}_{a}$ in the second hidden layer also indicate that the emergence of symmetry does not strictly depend on the number of network layers.
The trend of the loss function is shown in Fig. 4b. The reconstructed structured network of Ψ-NN converges significantly faster during training than PINN and PINN-post, reaching a minimum loss of 1e-5, lower than that of the other two models.
The full-field L2 errors are summarized in Table 1. Both PINN-post and Ψ-NN achieve lower errors than PINN, demonstrating the effectiveness of embedding equation features into the network structure for improving fitting accuracy. As shown in Fig. 4b, after shock formation at t > 0.4, both PINN and PINN-post exhibit large error distributions, whereas Ψ-NN uniformly reduces errors on both sides of the shock. Furthermore, PINN-post, due to its post-processing symmetry, enforces a symmetric error distribution across the shock but does not reduce the actual error. The structure discovery capability of Ψ-NN provides a more precise and matching feature space, resulting in the lowest error.
For the inverse problem, in addition to reconstructing the entire field, another key task is to estimate the unknown parameter λ1 in Eq. (17), with the true value λ1 = 0.01/π. The final results are given in Table 2, where Ψ-NN achieves the closest value to the ground truth. The evolution of this parameter during training is shown in Fig. 7. Since λ1 is included in the trainable parameter vector of the NN, these curves can be interpreted as optimization trajectories6; a shorter path indicates a clearer search direction and faster convergence. Thus, the structural features discovered by Ψ-NN reduce the output space and make it easier to find the correct solution.
Fig. 7: Prediction comparison under different control parameters.
a $\nu_{1}=0.01/\pi$. b $\nu_{2}=0.04/\pi$. c $\nu_{3}=0.08/\pi$. Comparison of the estimated Burgers parameters. The shorter the path, the clearer the search direction in the parameter space and the faster the convergence. The structural features discovered by Ψ-NN reduce the output space and make it easier to find the correct solution. Moreover, the network structure found at $\nu_{1}=0.01/\pi$ successfully generalizes to $\nu_{2}=0.04/\pi$ and $\nu_{3}=0.08/\pi$, yielding good results.
B. Ψ-NN structure performance in different parameter cases
The viscosity term ν to be solved in the inverse problem is represented by the unknown parameter λ1. This allows us to validate the applicability of the structure reconstructed for a specific problem under different parameters. We conducted experiments using the same Ψ-NN-reconstructed structure without modifying other configurations, specifically at ν2 = 0.04/π and ν3 = 0.08/π. The computational results are shown in Fig. 7, where the Ψ-NN method maintains shorter paths across different parameters. Shorter paths across different parameters indicate that Ψ-NN has a more efficient optimization process in parameter space6. The final prediction results are summarized in Table 2, with values closest to the ground truth.
Poisson equation
Poisson’s equation plays a crucial role in various computations, including heat conduction, electromagnetism, and gravitational fields33. Here, we select a Poisson problem in a unit square domain with a smooth source term f(x1, x2) that contains four increasing frequencies. This choice allows us to demonstrate the applicability and performance of the Ψ-NN method across different parameters. High-frequency physical systems often exhibit inherent symmetric structures34. To address this, we employ Ψ-NN to discover and leverage the symmetric patterns present in the problem, effectively alleviating the challenges associated with high-frequency characteristics and enhancing modeling efficiency and accuracy. Specifically, Poisson’s equation satisfies the constraint:
$$\mathcal{L}:=\nabla^{2}u=f(x_{1},x_{2}),\quad x_{1},x_{2}\in[0,1]$$
(27)
where the source term is given by:
$$f(x_{1},x_{2})=\frac{1}{4}\sum_{k=1}^{4}(-1)^{k+1}\,2k^{2}\sin(k\pi x_{1})\sin(k\pi x_{2})$$
(28)
The source term exhibits permutation equivariance, which can be reformulated with the equivariance group $H=\mathbb{Z}_{2}\times\mathbb{Z}_{2}$ formed by the cyclic group of order two, $\mathbb{Z}_{2}=\{0,1\}$. The two transformations can be stated as $\forall (h_{1},h_{2})\in H$:
$$\begin{cases}\mathcal{T}_{X_{f}}^{h_{1},h_{2}}(x_{1},x_{2})=\left((-1)^{h_{1}}x_{2},\,(-1)^{h_{2}}x_{1}\right)\\ \mathcal{T}_{Y_{f}}^{h_{1},h_{2}}(f)=(-1)^{h_{1}+h_{2}}f\end{cases}$$
(29)
where $\mathcal{T}_{X}^{h}$ and $\mathcal{T}_{Y}^{h}$ are the group actions on the domain X and codomain Y, respectively.
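This equivariance can be verified numerically; the short NumPy check below (an illustration with randomly sampled points) confirms that the source term of Eq. (28) evaluated at the transformed coordinates equals the codomain action applied to f.

```python
import numpy as np

def f(x1, x2):
    # source term of Eq. (28)
    k = np.arange(1, 5)
    return 0.25 * np.sum((-1.0) ** (k + 1) * 2 * k ** 2
                         * np.sin(k * np.pi * x1) * np.sin(k * np.pi * x2))

rng = np.random.default_rng(1234)
x1, x2 = rng.uniform(0.0, 1.0, 2)
for h1 in (0, 1):
    for h2 in (0, 1):
        lhs = f((-1) ** h1 * x2, (-1) ** h2 * x1)   # f at the transformed point, Eq. (29)
        rhs = (-1) ** (h1 + h2) * f(x1, x2)         # codomain action of Eq. (29)
        assert np.isclose(lhs, rhs)
```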
Similarly, we construct a low-frequency solution of the PDE (27) with the source term f = 0, ensuring the permutation property of the solution. The permutation property can be simplified using the permutation equivariance group $H_{e}=\mathbb{Z}_{2}$ as $\forall h\in H_{e}$:
$$\begin{cases}\mathcal{T}_{X_{lf}}^{h}(x_{1},x_{2})=(-x_{2},-x_{1})\\ \mathcal{T}_{Y_{lf}}^{h}(u)=(-1)^{h}u\end{cases}$$
(30)
The constructed solution is $u=x_{1}^{2}-x_{2}^{2}$; indeed, $\nabla^{2}u=2-2=0$ and $u(-x_{2},-x_{1})=x_{2}^{2}-x_{1}^{2}=-u(x_{1},x_{2})$, consistent with Eq. (30). The low-rank parameter matrices extracted for this low-frequency function are:
$$\boldsymbol{W}_{1}=\begin{bmatrix}\boldsymbol{W}_{1}^{a} & \boldsymbol{0}\\ \boldsymbol{0} & \boldsymbol{W}_{1}^{a}\end{bmatrix},\quad \boldsymbol{b}_{1}=\begin{bmatrix}\boldsymbol{b}_{1}^{a}\\ -\boldsymbol{b}_{1}^{a}\end{bmatrix}$$
(31)
$${{{{\boldsymbol{W}}}}}_{2}=\left[\begin{array}{cc}{{{{\boldsymbol{W}}}}}_{2}{a}&{{{{\boldsymbol{W}}}}}_{2}{b}\ {{{{\boldsymbol{W}}\