Failures are expected during development. The key to writing correct code is to detect possible failures before we ship our program. This is the principle underlying the use of static analysis (like compilers), testing, and review; each is a process intended to increase the number and kinds of possible failures we can detect during development. For example, failures that surface as compiler errors are, by definition, found and fixed "at compile time", before the program ever runs.
So, rather than seeing tests and static analysis as valuable in and of themselves, what can we glean from seeing them instead as useful tools for moving up the first occurrence of each possible failure?
Manual vs automated testing
Once a program has been tested on a particular set of inputs, the failures that will occur in that test have been accounted for. Running the same test repeatedly after issues have been resolved is, therefore, useless, and most code can be written so that it passes tests the first time around, negating the potential benefits of automated testing. Even when code changes, automated tests are an unreliable way to verify the changes, since they were written against different code; the tests that need to be rewritten are precisely those that cover the changes.
Manual tests exercise a whole system in context, so, compared to automated tests, they increase the likelihood of finding any particular failure simply because they bring more of the program into play with each execution. One could argue that automated tests yield greater control of the system under test by setting up and tearing down state, providing test fakes, allowing us to call functions and methods directly, and so forth, and sometimes that control is necessary to exercise code in specific ways. But, while that kind of setup is conventional with automated testing, it’s not exclusive to it. Similar setup, teardown, and programmatic execution can be done from a REPL, for example. Automated tests simply restrict what we do to that which has been automated, frequently at a high cost in time spent.
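As a minimal sketch of that kind of exploratory session in a Python REPL (the orders module, checkout function, and payment gateway below are hypothetical stand-ins, not references to any real codebase):

    >>> from unittest.mock import Mock
    >>> import orders                          # hypothetical module under scrutiny
    >>> fake_gateway = Mock()                  # stand-in for the payment integration
    >>> fake_gateway.charge.return_value = {"status": "ok"}
    >>> order = orders.Order(total_cents=4200, gateway=fake_gateway)
    >>> orders.checkout(order)                 # call the function under test directly
    'confirmed'
    >>> fake_gateway.charge.assert_called_once_with(4200)

Everything a test harness would do there (set up a fake, call the code directly, assert on the outcome) is available interactively; the only question is whether keeping the script around pays for itself.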
Startup checks
Most applications do a lot of initialization on startup. This usually involves setting up integrations, communicating with the execution environment, and creating startup data the application needs to run, none of which is uniform across environments, and which therefore eludes development testing and static analysis. Asserting that certain conditions are true during startup lets us move the first occurrence of potential failures to application start, rather than waiting for specific actions to crash the application later.
From a certain point of view, making assertions at application startup looks like working with a dependently typed language: From a startup context, we can poke at any application state and write arbitrary code to ensure that arbitrary values look correct, with the caveat that we’re assuming that the values we’re working with (and their types) are fixed at startup (it’s usually easy to tell which ones are).
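As a rough sketch, with illustrative attribute names rather than any particular framework, a startup routine along these lines can assert whatever invariants we care about before the application begins serving:

    def startup_checks(app):
        # Runs once at application start. The attributes checked here are
        # illustrative; the point is that these values are fixed by the time
        # we run, so one check covers every later code path that uses them.
        assert app.routes, "no routes registered"
        assert all(route.handler is not None for route in app.routes), \
            "a route is missing its handler"
        assert app.config.currency in {"USD", "EUR", "GBP"}, \
            f"unsupported currency: {app.config.currency!r}"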
Configuration and integration
For example, asserting that required environment variables are present during startup lets us catch configuration problems at deployment time, before users run into them. Or we might check that integrations are wired up, say by testing a database connection with a trivial SELECT query during startup.
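A sketch of both checks, assuming a DB-API-style connection object and illustrative variable names:

    import os

    REQUIRED_ENV_VARS = ("DATABASE_URL", "SMTP_HOST")  # illustrative names

    def check_configuration():
        # Fail at deploy time, not on the first request that needs the value.
        missing = [name for name in REQUIRED_ENV_VARS if not os.environ.get(name)]
        if missing:
            raise RuntimeError(f"missing environment variables: {', '.join(missing)}")

    def check_database(connection):
        # A trivial query proves the connection is actually wired up.
        connection.execute("SELECT 1")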
Side effects
Compilers struggle to meaningfully move up failure occurrences for database queries and other integrations, since these interactions happen at runtime. Often, strongly typed languages resort to "correct by construction" approaches that attempt to limit the queries that can be executed to those that have been correctly constructed in the application’s programming language. However, these approaches can be limiting: they make specific features inaccessible and obscure the generated queries, while still failing to prevent failures, like configuration or migration issues, that leak through the abstraction.
But, even though the compiler can’t catch these issues without a wrapper, a database query passing a single manual test indicates that the query was syntactically correct, that the bound parameters type-checked against the database, and that the semantics hold for input values of the same shape. These are pretty strong guarantees, comparable to what we’d get from a strong type system. Indeed, effectful code is not inherently bad; often, effectful code just means that we’re executing someone else’s code, and the key is to leverage checks available in that domain to move up failure occurrences.
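For instance, a one-off check like the following, run once from a REPL or a scratch script against a development database (the table and column names are illustrative), already exercises all three of those properties:

    import sqlite3  # any DB-API driver would do; sqlite3 keeps the sketch self-contained

    conn = sqlite3.connect("dev.db")
    rows = conn.execute(
        "SELECT id, email FROM users WHERE created_at >= ?",
        ("2024-01-01",),
    ).fetchall()
    # Reaching this line means the SQL parsed, the bound parameter was accepted,
    # and the result has the shape the calling code expects.
    assert all(len(row) == 2 for row in rows)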
Documentation
Teams often lean on one approach or another to ensure code quality. One team might require full coverage from automated tests, another that every piece of code is evaluated by a dedicated QA team, and a third that strong types are used universally, without escape hatches. One thing that’s nice about that kind of all-in approach is that it’s easy to observe which concerns have been accounted for and which haven’t. While I believe that considerable time can be saved by choosing an appropriate cocktail of approaches to move up failure occurrences, when doing so it’s important to keep documentation and/or establish conventions that record which approaches have been taken, so that we don’t leave any gaps.