arXiv:2602.02972v1 Announce Type: new Abstract: The efficiency of Gr"obner basis computation, the standard engine for solving systems of polynomial equations, depends on the choice of monomial ordering. Despite a near-continuum of possible monomial orders, most implementations rely on static heuristics such as GrevLex, guided primarily by expert intuition. We address this gap by casting the selection of monomial orderings as a reinforcement learning problem over the space of admissible orderings. Our approach leverages domain-informed reward signals that accurately reflect the computational cost of Gr"obner basis computations and admits efficient Monte Carlo estimation. Experiments on benchmark problems from systems biology and computer vision show that the resulting learned policies con…
arXiv:2602.02972v1 Announce Type: new Abstract: The efficiency of Gr"obner basis computation, the standard engine for solving systems of polynomial equations, depends on the choice of monomial ordering. Despite a near-continuum of possible monomial orders, most implementations rely on static heuristics such as GrevLex, guided primarily by expert intuition. We address this gap by casting the selection of monomial orderings as a reinforcement learning problem over the space of admissible orderings. Our approach leverages domain-informed reward signals that accurately reflect the computational cost of Gr"obner basis computations and admits efficient Monte Carlo estimation. Experiments on benchmark problems from systems biology and computer vision show that the resulting learned policies consistently outperform standard heuristics, yielding substantial reductions in computational cost. Moreover, we find that these policies resist distillation into simple interpretable models, providing empirical evidence that deep reinforcement learning allows the agents to exploit non-linear geometric structure beyond the scope of traditional heuristics.