I’m doing Advent of Code again this year, and part 1 of today’s problem reminded me immediately of some of the problems I’m doing in my Program Construction module at UCD. In the class, we cover the foundations of Edsger W. Dijkstra’s Structured Programming. It teaches you how to formally verify your program by finding the pre-conditions and post-conditions, then deriving and proving theorems that build up towards the final program.
It’s a very different style of thinking about programming than most people are used to, but I’ll try my best to explain the notation and the logic I’m using.
We start out by writing our post-condition. This is what we want to be true once our program has finished running — in other words, it’s what we want to calculate. We’re going to use this funky-looking syntax called Quantified Notation.
As an intro, here’s a basic quantified expression:
$$\langle +\, i : 0 \leq i < N : f.i \rangle$$
$f.i$ is the syntax we’ll use for accessing the $i^{th}$ element of some array $f$. This is simply shorthand for this longer expression:

$$f.0 + f.1 + f.2 + f.3 + \dotsb + f.(N - 1)$$

For those of you more familiar with functional programming, you’ll find that this is just a reduction over a list. $f$ is the list in question and $+$ is the operation we want to use to combine each element with the accumulator. However, program construction is designed around an imperative language, and so we need an index variable $i$ to keep track of our position in the array. We also have to specify the range of $i$, which is from $0$ to the length of the array.
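If the reduction framing helps, here’s the same idea spelled out in Python (used purely for illustration; the array is made up):

```python
from functools import reduce

f = [3, 1, 4, 1, 5]  # example array, chosen arbitrarily
N = len(f)

# <+ i : 0 <= i < N : f.i> -- reduce with +, index i ranging over 0..N-1
total = reduce(lambda acc, i: acc + f[i], range(N), 0)

assert total == sum(f)  # same as f.0 + f.1 + ... + f.(N-1)
```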
With that exposition out of the way, here’s the postcondition for our AoC problem (for a single bank of batteries). You should go read the problem statement over on the AoC website for this to make sense :)
$$\langle \uparrow\, i, j : 0 \leq i < j < N : 10 \cdot f.i + f.j \rangle$$

This is a quantification over two variables: $i$ and $j$. It’s essentially the same as before, but when we reduce using the max operator ($\uparrow$), we have to reduce over $10 \cdot f.i + f.j$ for every possible combination of $i$ and $j$, such that $j$ is greater than $i$. And yeah, that’s exactly what we want: we want the two batteries to turn on such that the concatenation of their joltages is the maximum possible. Note that we’re assuming here that we’ve already parsed each bank of batteries into an array of integers $f$.
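As a sanity check, here’s that postcondition evaluated by brute force in Python ($f$ is made-up sample data, not real AoC input):

```python
from itertools import combinations

f = [9, 8, 1, 7, 5]  # a pretend bank of single-digit joltages
N = len(f)

# <up i, j : 0 <= i < j < N : 10 * f.i + f.j>
# combinations(range(N), 2) yields exactly the pairs with i < j
best = max(10 * f[i] + f[j] for i, j in combinations(range(N), 2))
print(best)
```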
Now we get to build our domain model. It’s the collection of definitions and theorems that we can use later on when constructing our program.
Model
(0) $C.n \equiv \langle \uparrow\, i, j : 0 \leq i < j < n : 10 \cdot f.i + f.j \rangle \quad,\quad 2 \leq n \leq N$

We extract our post-condition into a reusable function defined over all $n$ from $2$ — the minimum valid length of an array for this calculation — to $N$, the actual length of our array.
Now observe:
$$
\begin{aligned}
& C.2 \\
= \quad & \{\text{(0)}\} \\
& \langle \uparrow\, i, j : 0 \leq i < j < 2 : 10 \cdot f.i + f.j \rangle \\
= \quad & \{\text{split off } j = 1\} \\
& \langle \uparrow\, i, j : 0 \leq i < j < 1 : 10 \cdot f.i + f.j \rangle \uparrow \langle \uparrow\, i : 0 \leq i < 1 : 10 \cdot f.i + f.1 \rangle \\
= \quad & \{\text{empty range, single point}\} \\
& id_\uparrow \uparrow (10 \cdot f.0 + f.1) \\
= \quad & \{\text{arithmetic}\} \\
& 10 \cdot f.0 + f.1
\end{aligned}
$$
So now we know that:
(1) $C.2 = 10 \cdot f.0 + f.1$
Now that we have our “base case”, or the initial value of our accumulator, we can look into using associativity to find $C.(n+1)$ given $C.n$.
Observe again:
$$
\begin{aligned}
& C.(n+1) \\
= \quad & \{\text{(0)}\} \\
& \langle \uparrow\, i, j : 0 \leq i < j < n + 1 : 10 \cdot f.i + f.j \rangle \\
= \quad & \{\text{split off } j = n\} \\
& \langle \uparrow\, i, j : 0 \leq i < j < n : 10 \cdot f.i + f.j \rangle \uparrow \langle \uparrow\, i : 0 \leq i < n : 10 \cdot f.i + f.n \rangle \\
= \quad & \{\text{(0)}\} \\
& C.n \uparrow \langle \uparrow\, i : 0 \leq i < n : 10 \cdot f.i + f.n \rangle \\
= \quad & \{\text{(3)}\} \\
& C.n \uparrow D.n
\end{aligned}
$$
This gives us
(2) $C.(n+1) \equiv C.n \uparrow D.n \quad,\quad 2 \leq n < N$
and
(3) $D.n \equiv \langle \uparrow\, i : 0 \leq i < n : 10 \cdot f.i + f.n \rangle \quad,\quad 2 \leq n < N$
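You don’t have to take the split on faith, either. Here’s a quick numerical spot-check of (2) in Python, taking $\uparrow$ to be `max` and using made-up data:

```python
f = [2, 9, 4, 7, 3]  # sample data

def C(n):  # (0): max of 10*f.i + f.j over all 0 <= i < j < n
    return max(10 * f[i] + f[j] for j in range(n) for i in range(j))

def D(n):  # (3): max of 10*f.i + f.n over all 0 <= i < n
    return max(10 * f[i] + f[n] for i in range(n))

# (2): C.(n+1) = C.n up D.n, for 2 <= n < N
for n in range(2, len(f)):
    assert C(n + 1) == max(C(n), D(n))
```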
Did you see how I sneakily used (3) up there before defining it properly? Let’s actually simplify $D.n$.
Observe once more:
D.n=(3)<↑i:0≤i<n:10⋅f.i+f.n>=+ distributes over ↑<↑i:0≤i<n:10⋅f.i>+f.n=∗ distributes over ↑10 ⋅<↑i:0≤i<n:f.i>+f.n=(6)10 ⋅E.n+f.n\quad D.n \ =\quad\quad\quad {\text{(3)}} \ \quad <\uparrow i : 0 \leq i \lt n : 10 \cdot f.i + f.n> \ =\quad\quad\quad {+ : \text{distributes over} : \uparrow} \ \quad <\uparrow i : 0 \leq i \lt n : 10 \cdot f.i> + f.n \ =\quad\quad\quad {* : \text{distributes over} : \uparrow} \ \quad 10 : \cdot <\uparrow i : 0 \leq i \lt n : f.i> + f.n \ =\quad\quad\quad {\text{(6)}} \ \quad 10 : \cdot E.n + f.n \
So:
(3) $D.n \equiv 10 \cdot E.n + f.n \quad,\quad 2 \leq n < N$
And:
(6) $E.n \equiv \langle \uparrow\, i : 0 \leq i < n : f.i \rangle \quad,\quad 2 \leq n < N$
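The distribution steps can be spot-checked the same way (Python again, sample data, $\uparrow$ as `max`):

```python
f = [2, 9, 4, 7, 3]  # sample data

def D(n):  # D.n as first defined: max of 10*f.i + f.n over 0 <= i < n
    return max(10 * f[i] + f[n] for i in range(n))

def E(n):  # (6): E.n is just the max of the first n elements
    return max(f[:n])

# (3): D.n = 10 * E.n + f.n, for 2 <= n < N
for n in range(2, len(f)):
    assert D(n) == 10 * E(n) + f[n]
```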
What happened to (4) and (5), you ask? Well, we have to derive the base case and associative case for $D.n$ just like we did for $C.n$. We’ll need our $E.n$ theorems first, though.
Observe:
$$
\begin{aligned}
& E.2 \\
= \quad & \{\text{(6)}\} \\
& \langle \uparrow\, i : 0 \leq i < 2 : f.i \rangle \\
= \quad & \{\text{split off } i = 1\} \\
& \langle \uparrow\, i : 0 \leq i < 1 : f.i \rangle \uparrow f.1 \\
= \quad & \{\text{single point}\} \\
& f.0 \uparrow f.1
\end{aligned}
$$
And so:
(7) $E.2 = f.0 \uparrow f.1$
And once more:
$$
\begin{aligned}
& E.(n+1) \\
= \quad & \{\text{(6)}\} \\
& \langle \uparrow\, i : 0 \leq i < n + 1 : f.i \rangle \\
= \quad & \{\text{split off } i = n\} \\
& \langle \uparrow\, i : 0 \leq i < n : f.i \rangle \uparrow f.n \\
= \quad & \{\text{(6)}\} \\
& E.n \uparrow f.n
\end{aligned}
$$
Which gives us:
(8) $E.(n + 1) \equiv E.n \uparrow f.n \quad,\quad 2 \leq n < N - 1$
And now back to $D.n$. You know the drill, observe:
$$
\begin{aligned}
& D.2 \\
= \quad & \{\text{(3)}\} \\
& 10 \cdot E.2 + f.2 \\
= \quad & \{\text{(7)}\} \\
& 10 \cdot (f.0 \uparrow f.1) + f.2
\end{aligned}
$$
And similarly:
$$
\begin{aligned}
& D.(n + 1) \\
= \quad & \{\text{(3)}\} \\
& 10 \cdot E.(n + 1) + f.(n + 1) \\
= \quad & \{\text{(8)}\} \\
& 10 \cdot (E.n \uparrow f.n) + f.(n + 1)
\end{aligned}
$$
So the last pieces of our model are:
(4) $D.2 = 10 \cdot (f.0 \uparrow f.1) + f.2$

(5) $D.(n + 1) \equiv 10 \cdot (E.n \uparrow f.n) + f.(n + 1) \quad,\quad 2 \leq n < N - 1$
Lovely. We now have everything we need to go and construct our program loop.
Program Loop
Let’s first rewrite our postcondition in terms of the theorems from our model.
$$C.N$$
We can then strengthen this postcondition by rewriting it like so:
$$C.n \wedge n = N$$

Strengthen is a funny name, but that’s all there is to it. We’re pulling $N$ out of $C.N$. Why? Because every loop has 3 fundamental things:
- Invariants: these are the things that are always true during the program
- Variant: a measure of how much work there is left to do
- Guard: a boolean check that lets you know when to break out of the loop; that is, when your variant has bottomed out because there is no more work left
By splitting our post-condition into two parts, we can use the first part as our invariant and the second as our loop guard.
Invariants
Let’s say we have a variable $r$, and $r$ is always equal to $C.n$, for whatever value $n$ holds at the moment. This is an invariant.

- $r = C.n$
However, the definition of $C.(n+1)$ depends on $D.n$, which depends on $E.n$.
Let’s get some variables involved for those too:
- $d = D.n$
- $e = E.n$
So our invariants are:
- (P0) $r = C.n \wedge d = D.n \wedge e = E.n$
- (P1) $2 \leq n \leq N - 1$
Establish the invariants
We can “establish” our invariants by initialising our variables to values such that the equalities we defined as invariants are true. We know the values of $C.2$, $D.2$, and $E.2$ from when we derived them. So, let’s set $n = 2$ and initialise $r$, $d$, and $e$ to those values.

- $n := 2$
- $r := C.2 = 10 \cdot f.0 + f.1$
- $d := D.2 = 10 \cdot (f.0 \uparrow f.1) + f.2$
- $e := E.2 = f.0 \uparrow f.1$
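In Python terms (sample data again), establishing the invariants looks like this, and we can assert that (P0) really does hold at $n = 2$:

```python
f = [2, 9, 4, 7, 3]  # sample data

n = 2
r = 10 * f[0] + f[1]             # r = C.2, by (1)
d = 10 * max(f[0], f[1]) + f[2]  # d = D.2, by (4)
e = max(f[0], f[1])              # e = E.2, by (7)

# (P0) holds at n = 2:
assert r == max(10 * f[i] + f[j] for j in range(n) for i in range(j))  # r = C.n
assert d == max(10 * f[i] + f[n] for i in range(n))                    # d = D.n
assert e == max(f[:n])                                                 # e = E.n
```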
Loop Guard
Remember how I said we could use the $n = N$ part as our guard? I was lying, just a little bit. If you were paying attention during those derivations, you’ll have noticed that $C.n$ is defined for $2 \leq n \leq N$, but $D.n$ and $E.n$ are only defined for $2 \leq n < N$, and $D.(n+1)$ and $E.(n+1)$ only for $2 \leq n < N - 1$.

This is because you can’t calculate $C.(N+1)$: there simply aren’t any more elements in the array, so $C.(n+1)$ is not defined at $n = N$. Since the definition of $D.n$ comes from $C.(n+1)$, we don’t define $D.n$ at $N$ either. And since $D.n$ is not defined at $N$, $D.(n+1)$ cannot be defined at $N - 1$. The same goes for $E.n$ and $E.(n+1)$.

And so, our loop actually can’t go all the way up to $n = N$. Its body can only run while $n < N - 1$; that is, we break out of the loop once $n$ becomes $N - 1$.

If you paid attention to the invariants bit, you’ll realise this means we’ll only have $r = C.(N-1)$ and $d = D.(N-1)$ after the loop. We want $C.N$, though. That isn’t a problem: from (2), we know how to find $C.(n+1)$ given $C.n$ and $D.n$.
Anyway, this is our loop guard:
$$n \neq N - 1$$
Variant
And our corresponding variant is:
$$VF = (N - 1) - n$$

When $VF$ becomes 0, we exit the loop.
Calculating the loop body
We’re starting off with $n = 2$ and we want $n = N - 1$ at the end of the loop. A logical way to get this to happen is to increment $n$ by 1 in each iteration. The important thing, however, is that we have to make sure our invariants remain true after incrementing $n$.

We don’t know what to set $r$, $d$, and $e$ to, but we can find out. Let’s set them to some temporary variables $K_1, K_2, K_3$ and solve for them.
$$
\begin{aligned}
& (n, r, d, e := n + 1, K_1, K_2, K_3).(P0) \\
= \quad & \{\text{textual substitution}\} \\
& K_1 = C.(n+1) \wedge K_2 = D.(n+1) \wedge K_3 = E.(n+1) \\
= \quad & \{\text{(2), (5), (8)}\} \\
& K_1 = C.n \uparrow D.n \wedge K_2 = 10 \cdot (E.n \uparrow f.n) + f.(n+1) \wedge K_3 = E.n \uparrow f.n \\
= \quad & \{\text{(P0)}\} \\
& K_1 = r \uparrow d \wedge K_2 = 10 \cdot (e \uparrow f.n) + f.(n+1) \wedge K_3 = e \uparrow f.n
\end{aligned}
$$
Now we know exactly how to update each variable within the loop. Home stretch now!
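Before the GCL version, here’s the derived program sketched in Python. Python’s tuple assignment evaluates the entire right-hand side before assigning anything, which conveniently mirrors GCL’s simultaneous assignment: every $n$ on the right is the old $n$. The brute-force comparison at the end is just a sanity check on made-up data, not part of the derivation.

```python
from itertools import combinations

def pt_1_bank(f):
    # establish invariants: n = 2, r = C.2, d = D.2, e = E.2
    n = 2
    r = 10 * f[0] + f[1]
    d = 10 * max(f[0], f[1]) + f[2]
    e = max(f[0], f[1])
    # guard: n != N - 1; variant: (N - 1) - n
    while n != len(f) - 1:
        # simultaneous assignment: the whole right-hand side uses the old n
        n, r, d, e = n + 1, max(r, d), 10 * max(e, f[n]) + f[n + 1], max(e, f[n])
    # finally, C.N from C.(N-1) and D.(N-1), by (2)
    return max(r, d)

f = [8, 1, 9, 7, 6]
assert pt_1_bank(f) == max(10 * f[i] + f[j]
                           for i, j in combinations(range(len(f)), 2))
```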
Writing our program
```
// establish invariants
{n, r, d, e := 2, 10 * f.0 + f.1, 10 * max(f.0, f.1) + f.2, max(f.0, f.1)}
// loop body
; do n != N - 1 ->
    n, r, d, e := n + 1, max(r, d), 10 * max(e, f.n) + f.(n + 1), max(e, f.n)
  od
// calculate C.N from C.(N - 1) and D.(N - 1)
; r := max(r, d)
// postcondition achieved!
{r = C.N}
```
The above program is written in Guarded Command Language, another invention of Dijkstra’s. Dijkstra was adamant that it never be implemented for a real computer:
“Finally, in order to drive home the message that this introductory programming course is primarily a course in formal mathematics, we see to it that the programming language in question has not been implemented on campus so that students are protected from the temptation to test their programs.” ~ EWD1036
So let’s translate it to a real programming language. I’ve been solving AoC in Gleam so far. This presents a little challenge, as Gleam is a functional programming language, while GCL is imperative, so we can’t do a 1-to-1 translation. However, with a little bit of cleverness, this is what we get.
```gleam
import gleam/int
import gleam/list

pub fn pt_1(input: List(List(Int))) {
  input
  |> list.map(fn(bank) {
    let assert [first, second, ..rest] = bank
    let assert Ok(third) = list.first(rest)
    let #(r, d, _) =
      rest
      |> list.window_by_2
      |> list.fold(
        #(
          10 * first + second,
          10 * int.max(first, second) + third,
          int.max(first, second),
        ),
        fn(acc, el) {
          let #(r, d, e) = acc
          let #(f_n, f_n_1) = el
          #(int.max(r, d), 10 * int.max(e, f_n) + f_n_1, int.max(e, f_n))
        },
      )
    int.max(r, d)
  })
  |> int.sum
}
```
Yeah, LOL. LMAO, even. Absolutely not. I’m not quantifying over 12 variables. It feels like it should theoretically be possible, but I don’t want to find out. I just did it in the most straightforward way possible.
First, BIG shout-out to Mr. Henry McLoughlin, who taught me Program Construction. He’s a simply lovely person and his enthusiasm for his subject is infectious. I don’t know if I’d have enjoyed the module as much if anyone else taught it. Henry, if you’re reading this, you’re awesome! Thank you so much.
I guessed what Part 2 would be as soon as I read Part 1, and so if I was aiming for speed, I should have just written do_pt_2 for the general case and reused it across both parts. I would probably have had an easier time of it too. However, 1. I wanted to use what I learned in class and 2. I had fun doing it this way. I think barely anyone else would have done it this way.
It has the advantage of being rigorously proved to work, but at the cost of being harder to understand at first glance. I can see how this is useful for high-stakes software where the extra expenditure of mental energy on making sure the code is 100% watertight is worth it, but this is probably not what I’d reach for during everyday programming. It did result in a very, very terse program though, which is super impressive.
The eagle-eyed among you might say that we don’t really need both d and e in the loop, as d can be derived from e. This is true. Program construction doesn’t always produce the most efficient programs. That is for the programmer and the compiler to figure out.
Oh, the title. Yes. That seemed like a lot of thinking, you might object. It probably was if you’re not familiar with Program Construction yet, but once you’ve derived a couple of these theorems, you’ll find that there is no thinking involved. Not in the sense that once you’re good at something you can do it almost mechanically, but in the sense that there’s only one way this could have gone. Starting from that post-condition, the theorems we proved fall out automatically as we continue expanding our model, and the same can be said for our loop body. Program construction is really easy in that way, because all you’re doing is following the program derivation to its logical end.