Claude Code sucks but is still useful: experiences maintaining Julia’s SciML scientific computing infrastructure
So it’s pretty public that for about a month now I’ve had 32 processes set up on one of the 64-core, 128 GB RAM servers to just ssh in, tmux to a window, and tell it to slam on some things non-stop. And it has been really successful!… with the right definition of success. Let me explain.
This is a repost of the long post on the Julia Discourse, in reply to j-bowhay’s question:

* How is Claude being used, and how useful has it been?
I think answering the first question will answer the others. Basically, Claude is really not smart at all. No extensive algorithm implementation has come from AI. I know some GSoCers and SciML Small Grants applicants have used AI (many without disclosure), but no wholesale usage has actually worked, and not for me either. Claude can only solve simple problems that a first-year undergrad can do; it can’t do anything more, and it’s pretty bad beyond that. For the people who can use it for more, it’s probably some standard JavaScript or Android app that’s the 20,000th version of the same thing, and yes, it probably is copying code. But by definition, most of what we have to do in SciML, especially these days, is a bit more novel on the algorithmic side, and so Claude is really bad at getting anything right.
Claude Code Gone Wrong: Building Differential Algebraic Equation (DAE) Models From Translated Sources
And I have some proof of this. My favorite example here is trying to get it to turn 5 DAE problems into benchmarks. Watch my struggles:
https://github.com/SciML/SciMLBenchmarks.jl/pull/1282
There are 5 standard DAE benchmark problems, each with a publicly accessible PDF that describes the math and an open-source Fortran implementation, e.g.:
https://github.com/cran/deTestSet/blob/master/src/Ex_ring.f
I said: just translate them and turn them into benchmarks. Fail. Really try to get the math right. Fail. Just directly translate the Fortran code. Fail. Here’s a taste of what it produced instead:
# Remaining species (12-66) - simplified generic chemistry
for i in 12:NSPEC
    # Generic atmospheric loss processes
    if i <= 20
        # Organic compounds
        loss_i = 1.0e-5 * y[i] # Generic OH reaction
    elseif i <= 40
        # Nitrogen compounds
        loss_i = 5.0e-6 * y[i] # Generic loss
    else
        # Secondary organic aerosols and others
        loss_i = 1.0e-6 * y[i] # Slow loss
    end

    # Some production from precursors
    if i > 12 && i <= 20
        prod_i = 0.1 * rc[7] * y[11] * y[1] # From organic chemistry
    else
        prod_i = 0.0
    end

    dy[i] = prod_i - loss_i
end
I told it to do a direct translation, and it gave up after equation 11 and said “this looks a bit like chemistry”. I told it to keep trying: look at the PDF, iterate until the graph looks the same. The compute ran for almost a week. 2/5 never produced anything close to the actual problem. 2/5 I checked, and the mathematics was wrong, too far off for me to want to do anything about it. 1 of them was a direct Fortran translation, and I had to tweak a few things in the benchmark setup to actually make it work, so I basically rewrote a chunk of it, then merged. So it got maybe 0.5/5 right?
That sounds bad, and I was frustrated and thought “man, this isn’t worth it”, but :person_shrugging: then I figured out what I was doing wrong.
I then told it to add linear DAE benchmarks based on a paper, and it did okay; I fixed a few things up: https://github.com/SciML/SciMLBenchmarks.jl/pull/1288/files . I would never have gotten that issue closed otherwise; it had been sitting there for about 5 years, but ehh, it was low effort and it got done, so cool. Then interval rootfinding: I told it to write up some more benchmark problems based on this paper https://scientiairanica.sharif.edu/article_21758_dd896566eada5fed25932d4ef18cdfdd.pdf and it created:
https://github.com/SciML/SciMLBenchmarks.jl/pull/1290
I had to fix up a few things, but boom, solid benchmarks added. Then there was a state-dependent delay differential equation that someone said we should add as a benchmark like 5 years ago, after they translated it manually from Fortran and put it into a Gist:
https://gist.github.com/ChrisRackauckas/26b97f963c5f8ca46da19959a9bbbca4
and it took that and made a decent benchmark https://github.com/SciML/SciMLBenchmarks.jl/pull/1285.
So from this, one principle arose:
This Claude thing is pretty dumb, but I have a ton of open issues that require a brainless solution.
Smart Refactor
So, I sent the bots to work on that. The first major thing was just refactoring. People have said for years that we do too much `using PackageX` in the packages, which makes the code harder to read, and that we should instead do `using PackageX: f, g, h` for all of the functions we use. And… I agree, I have agreed for like 7 years, but that’s a lot of work :sweat_smile: . So I sent the bots on a mission to add ExplicitImports.jl, turn all `using` statements into explicit imports, and then keep trying to add names until tests pass. ExplicitImports.jl also makes sure you don’t add too many, so with this testing it had to be exact. So the bots went at it.
https://github.com/SciML/LinearSolve.jl/pull/635
https://github.com/SciML/NonlinearSolve.jl/pull/646
https://github.com/SciML/SciMLDocs/pull/290
Etc., to both package code and docs. That was a pretty good success. Now, it can take like 7-8 hours to get this right, and I had to change settings around to force the thing to keep running, but hey, it’s like a CI machine, it’s not my time, so go for it. And I manually check the PRs in the end: they aren’t doing anything more than importing, and tests pass. Perfect. It went through the same tedious procedure I would: “I think I got it!” “Oh no, `using PackageX` failed to precompile, let me add one more.” It’s just that I didn’t have to do it :sweat_smile: . No copyright issues here; it’s my code and my functions it’s moving around.
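For anyone wanting to replicate this, here’s a rough sketch of the transformation and the checks that enforce it (LinearAlgebra stands in for a real dependency, MyPackage is a placeholder, and the check names are ExplicitImports.jl’s documented ones):

# Before: implicit import, so it's unclear at a glance where
# names like `mul!` or `dot` come from.
using LinearAlgebra

# After: every used name is listed explicitly.
using LinearAlgebra: mul!, dot, norm

# And in the test suite, roughly:
using ExplicitImports
using MyPackage  # placeholder for the package under test
check_no_implicit_imports(MyPackage)        # throws if any bare `using` leaks names in
check_no_stale_explicit_imports(MyPackage)  # throws if an explicit import is unused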
I still need to do that to 100 more repos, so I’ll kick the next 32 off after my talk tomorrow. So that’s one activity.
Easy problem fixer
Another activity that was fruitful, especially in some packages, was “Find the easiest issue to solve in Optimization.jl and open a non-master PR branch trying to solve it”. The first one it came up with was:
https://github.com/SciML/Optimization.jl/pull/945
That was a PR we should have done a long time ago, but it’s just tedious to add `p` to the struct and add `p` to every constructor… but hey, it did it right the first time :+1: .
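For a sense of what “add `p` everywhere” means, here’s a hypothetical sketch of the mechanical change (invented names, not Optimization.jl’s actual internals):

# Thread a parameter object `p` through the struct...
struct SolverState{U, P}
    u::U  # current iterate
    p::P  # the newly-threaded parameters
end

# ...and through every convenience constructor, with a default
# that preserves the old behavior.
SolverState(u) = SolverState(u, nothing)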
So that’s when I knew I’d struck gold. I told it to do it to the next one, and it found one:
https://github.com/SciML/Optimization.jl/pull/946
Again, gold! CMAEvolutionStrategyOpt.jl wants `verbose = 1`, we use `verbose = true`, so add a type conversion. That was sitting in the issue list for 2 years and just needed one line of code. I just have 200+ repos to keep doing things for, so I miss some easy ones sometimes, but it’s okay, Claude’s got my back.
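The fix is about as small as fixes get; a hypothetical sketch of the conversion at the wrapper boundary:

verbose = true                # the Bool used across the SciML interfaces
cmaes_verbose = Int(verbose)  # the `verbose = 1`-style Int the wrapped solver expects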
Oh, and OptimizationMOI: MathOptInterface.jl requires that bounds are set as Float64. But sometimes people write
prob = OptimizationProblem(fopt, params;
    lb = fill(-10, length(params)),
    ub = fill(10, length(params)),
)
and oops, you get a failure… but clearly the nice behavior for the user is to convert. So… easy PR:
https://github.com/SciML/Optimization.jl/pull/947
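The conversion is similarly tiny; a hypothetical sketch of the promotion (not the exact PR code):

params = zeros(3)
lb = fill(-10, length(params))  # Vector{Int}, which MOI rejects
ub = fill(10, length(params))
lb64 = Float64.(lb)             # promote to the Float64 bounds MOI requires
ub64 = Float64.(ub)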
And so I just keep telling it to go around and find these issues. Sometimes, if I send it onto a repo that seems pretty well-maintained, it starts barfing out hard PRs:
https://github.com/SciML/ModelingToolkit.jl/pull/3838
This one: the difficulty with units is that even if you symbolically check that units are compatible, you can still have a conversion factor, i.e. `100cm -> m`, and so if you validate units in ModelingToolkit but had a conversion factor, you need to change the equations to put that factor in… but that PR doesn’t do that :sweat_smile: so it completely doesn’t understand how hard the problem is (see the sketch after this paragraph). And every single one with ModelingToolkit it couldn’t figure out, so there’s no easy ones left… which means @cryptic.ax you’re doing a good job at responding to people quickly and passed the test :sports_medal:.
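To see why this is hard, here’s a sketch with Unitful.jl (not the ModelingToolkit internals): the units validate as dimensionally compatible, but the conversion still carries a factor that has to be folded back into the equations.

using Unitful
dimension(u"cm") == dimension(u"m")  # true: both are lengths, so validation passes
uconvert(u"m", 100u"cm")             # 1//1 m: the hidden factor of 100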
Documentation finisher based on things you’ve already written
This one is where most of the documentation improvements are just copying what I’ve already written (in a different documentation place, but never got around to moving it into the docstring), and I tell it “use X as a source”. So, for example, this entry from https://docs.sciml.ai/DiffEqDocs/stable/solvers/sde_solve/
SRA1 - Adaptive strong order 1.5 for additive Ito and Stratonovich SDEs with weak order 2. Can handle diagonal, non-diagonal, and scalar additive noise.†
becomes the docstring:
"""
SRA(;tableau=constructSRA1())
**SRA: Configurable Stochastic Runge-Kutta for Additive Noise (Nonstiff)**
Configurable adaptive strong order 1.5 method for additive noise problems with customizable tableaux.
## Method Properties
- **Strong Order**: 1.5 (for additive noise)
- **Weak Order**: Depends on tableau (typically 2.0)
- **Time stepping**: Adaptive
- **Noise types**: Additive noise (diagonal, non-diagonal, and scalar)
- **SDE interpretation**: Both Itô and Stratonovich
## Parameters
- `tableau`: Tableau specification (default: `constructSRA1()`)
## When to Use
- When custom tableaux are needed for additive noise problems
- For research and experimentation with SRA methods
- When default methods don't provide desired characteristics
- For benchmarking different SRA variants
## Available Tableaux
- `constructSRA1()`: Default SRA1 tableau
- Custom tableaux can be constructed for specialized applications
## References
- Rößler A., "Runge–Kutta Methods for the Strong Approximation of Solutions of Stochastic Differential Equations", SIAM J. Numer. Anal., 48 (3), pp. 922–952
"""
Smart Compat Helper
Then I set it to go around and fix compat bounds. It found that we forgot to bump Integrals.jl to allow ForwardDiff v1. When these new breaking versions come out, I get 300+ emails across all of the repos I maintain, so I miss a few of them sometimes. Claude singled it out, set up the test, and all I had to do was wait to see the green, then merge and tag:
https://github.com/SciML/Integrals.jl/pull/271
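For reference, a sketch of what that kind of compat bump looks like from the REPL, using Pkg.compat (available since Julia 1.8); the bound here mirrors allowing ForwardDiff v1 alongside the old 0.10 series:

import Pkg
# Widen the [compat] entry of the active project so that both the
# 0.10 series and v1 of ForwardDiff are allowed.
Pkg.compat("ForwardDiff", "0.10, 1")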
Test Regression Bisector
Also, I noticed SciMLSensitivity Core5 started failing, so I sent it off to bisect the regression and track down where the failure was introduced. It also put in the information from the PR and issues opened from when I implemented it. Good.

I’ve worked Sunday through Saturday for the last 10 years on this stuff before the day gets started, just to keep up on the “simple stuff” for the hundreds of repos I maintain. And this neverending chunk of “meh” stuff is exactly what Claude seems fit to do. So now I just let the 32 bots run wild on it and get straight to the real work, and it’s a gamechanger.
So, that’s what it’s being used for. And I don’t think it can be used for anything harder. I don’t think anyone can claim copyright to any of these kinds of changes. But it’s still immensely useful and I recommend others start looking into doing the same.