Here’s a little idiom that I haven’t really seen discussed anywhere, that I think makes Rust code much cleaner and more robust.
I don’t know if there’s an actual name for this idiom; I’m calling it the “block pattern” for lack of a better word. I find myself reaching for it frequently in code, and I think other Rust code could become cleaner if it followed this pattern. If there’s an existing name for this, please let me know!
The pattern comes from blocks in Rust being valid expressions. For example, this code:
let foo = { 1 + 2 };
…is equal to this code:
let foo = 1 + 2;
…which is, in turn, equal to this code:
let foo = {
    let x = 1;
    let y = 2;
    x + y
};
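To make the rule concrete, here is a minimal sketch. The function names are made up for illustration. The value of a block is its final expression, and if that final line ends in a semicolon, the block evaluates to the unit type `()` instead.

```rust
/// The block's value is its last expression, so `foo` ends up as 3.
fn block_value() -> i32 {
    let foo = {
        let x = 1;
        let y = 2;
        x + y // no trailing semicolon: this is the value of the block
    };
    foo
}

/// With a trailing semicolon on the last line, the block evaluates to `()`.
fn block_with_semicolon() {
    let unit: () = {
        let x = 1;
        let _sum = x + 2; // statement, not a tail expression
    };
    unit
}
```

Calling `block_value()` gives `3`, while `block_with_semicolon()` gives back nothing but `()`.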
So, why does this matter?
Let’s say you have a function that loads a configuration file, then sends a few HTTP requests based on that config file. In order to load that config file, first you need to load the raw bytes of that file from the disk. Then you need to parse whatever the format of the configuration file is. For the sake of having a complex enough program to demonstrate the value of this pattern, let’s say it’s JSON with comments. You would need to remove the comments first using the regex crate, then parse the resulting JSON with something like serde_json.
Such a function would look like this:
use regex::{Regex, RegexBuilder};
use std::{fs, sync::LazyLock};
/// Format of the configuration file.
#[derive(serde::Deserialize)]
struct Config { /* ... */ }
// Always make sure to cache your regexes!
static STRIP_COMMENTS: LazyLock<Regex> = LazyLock::new(|| {
    RegexBuilder::new(r"//.*").multi_line(true).build().expect("regex build failed")
});
/// Function to load the config and send some HTTP requests.
fn foo(cfg_file: &str) -> anyhow::Result<()> {
    // Load the raw bytes of the file.
    let config_data = fs::read(cfg_file)?;
    // Convert to a string so the regex can work on it.
    // `String::from_utf8` takes ownership of the byte vector.
    let config_string = String::from_utf8(config_data)?;
    // Strip out all comments (`replace` would only remove the first one).
    let stripped_data = STRIP_COMMENTS.replace_all(&config_string, "");
    // Parse as JSON. The annotation tells serde which type to deserialize into.
    let config: Config = serde_json::from_str(&stripped_data)?;
    // Do some work based on this data.
    send_http_request(&config.url1)?;
    send_http_request(&config.url2)?;
    send_http_request(&config.url3)?;
    Ok(())
}
This is fairly simple, and just leverages a few Rust crates and language features to parse JSON and then do something with it.
However, there are a few weaknesses here. In the foo function, we declare four new variables (config_data, config_string, stripped_data, config), yet only one of them (config) is used after the configuration is parsed. In addition, let’s say you didn’t know what this code was for going in, and you didn’t have these comments (or you had bad comments). You might ask why the regular expression STRIP_COMMENTS is declared, or why data is being loaded from a file.
When I write code, I try to make it immediately obvious what the purpose of the code is, and why it’s written that way. This is why I generally avoid C’s “bottom-up” strategy for organizing code. It’s like being given a few screws and being expected to implicitly understand that it should be built into a chair. In Rust, I like that you are able to define your top-level functions first, and then go down and define all the bits and pieces after.
Although, we can do a little bit better. What if we organized the foo function like this:
/// Function to load the config and send some HTTP requests.
fn foo(cfg_file: &str) -> anyhow::Result<()> {
    // Load the configuration from the file.
    let config: Config = {
        // Cached regular expression for stripping comments.
        static STRIP_COMMENTS: LazyLock<Regex> = LazyLock::new(|| {
            RegexBuilder::new(r"//.*").multi_line(true).build().expect("regex build failed")
        });
        // Load the raw bytes of the file.
        let raw_data = fs::read(cfg_file)?;
        // Convert to a string so the regex can work on it.
        let data_string = String::from_utf8(raw_data)?;
        // Strip out all comments.
        let stripped_data = STRIP_COMMENTS.replace_all(&data_string, "");
        // Parse as JSON.
        serde_json::from_str(&stripped_data)?
    };
    // Do some work based on this data.
    send_http_request(&config.url1)?;
    send_http_request(&config.url2)?;
    send_http_request(&config.url3)?;
    Ok(())
}
In this function, we’ve moved all of the configuration-related code (parsing, loading, even the static regex) into the block. This works because Rust lets you have items, statements and expressions inside of a block, hence why we were able to move everything inside. This pattern has three immediate advantages:
- The block starts with the intent of the code (let config = ...). We can see that we’re working to resolve some kind of configuration object right off the bat. Only then do we move into the implementation details of the code.
- It reduces pollution of the namespace of both the foo function and the top-level module. Now in foo, the variable names config_data, config_string et al are no longer used. In addition to allowing these variable names to be re-used, it makes this code a lot more “idiot-proof”. If someone else were to edit the foo function, they would only be able to use config. They wouldn’t be able to use the raw_data or STRIP_COMMENTS items, which are only meant to be used by the config parser.
- The variables raw_data and data_string go out of scope at the end of the block, which means they are dropped, freeing up resources.
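The last point, about temporaries being dropped at the end of the block, is easy to check for yourself. Here is a small sketch using only the standard library. The `Tracked` type and the `drop_order` function are hypothetical names I made up for this demonstration; they record when a value is dropped relative to the code that runs after the block.

```rust
use std::cell::RefCell;

/// Hypothetical resource that records its own drop in a shared log.
struct Tracked<'a> {
    name: &'static str,
    log: &'a RefCell<Vec<String>>,
}

impl Drop for Tracked<'_> {
    fn drop(&mut self) {
        self.log.borrow_mut().push(format!("drop {}", self.name));
    }
}

/// Runs a block that owns a temporary and returns the order of events.
fn drop_order() -> Vec<String> {
    let log = RefCell::new(Vec::new());

    let length = {
        let scratch = Tracked { name: "scratch", log: &log };
        // Only the computed value leaves the block.
        scratch.name.len()
    };
    // `scratch` has already been dropped by the time we get here,
    // and the name `scratch` can no longer be used.
    log.borrow_mut().push(format!("after block, length = {length}"));

    log.into_inner()
}
```

The returned log lists the drop of `scratch` before the "after block" entry, which is the same thing that happens to raw_data and data_string in the config example.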
As an aside, all three of these advantages also come if you were to refactor the block out into its own function. However, this pattern has two key advantages over that:
- The code flow is still inline with the rest of the function. For shorter blocks, this improves reading comprehension, since it means you don’t have to go to a different part of the code to fully understand the function.
- If there are a lot of variables that the block would use, it prevents needing to explicitly name those variables as parameters.
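One detail worth knowing when choosing between a block and a separate function: inside a block, `?` (and `return`) still exits the enclosing function, not just the block. The sketch below shows this with only the standard library. The function `scaled_total` and its parameters are invented for illustration; note how the block reads `input`, `scale` and `offset` directly, with no parameter list.

```rust
use std::num::ParseIntError;

/// Hypothetical example: sums `value * scale + offset` over every
/// non-empty, non-comment line of `input`.
fn scaled_total(input: &str, scale: i64, offset: i64) -> Result<i64, ParseIntError> {
    let total = {
        // Item scoped to the block; nothing outside can name it.
        const COMMENT_PREFIX: &str = "//";

        let mut sum = 0;
        for line in input.lines() {
            let line = line.trim();
            if line.is_empty() || line.starts_with(COMMENT_PREFIX) {
                continue;
            }
            // `?` returns from `scaled_total` itself on a parse error.
            let value: i64 = line.parse()?;
            // The block uses `scale` and `offset` without taking them as parameters.
            sum += value * scale + offset;
        }
        sum
    };

    Ok(total)
}
```

If this logic lived in a helper function, it would need all three values passed in, and the caller would need its own `?` to forward the error.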
There is one more benefit that’s not exposed in the above example: erasure of mutability. Let’s say you construct some object for use in a later part of the function:
let mut data = vec![];
data.push(1);
data.extend_from_slice(&[4, 5, 6, 7]);
data.iter().for_each(|x| println!("{x}"));
return data[2];
The issue is that data is declared as mutable, which means the rest of the function can mutate it. Since a lot of bugs come from data being mutated when it isn’t supposed to be mutated, we’d like to restrict the mutability of the data to a certain area of the function. This is also possible with the block pattern:
let data = {
    let mut data = vec![];
    data.push(1);
    data.extend_from_slice(&[4, 5, 6, 7]);
    data
};
data.iter().for_each(|x| println!("{x}"));
return data[2];
This effectively “closes” the mutability to a certain section of the function.
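For completeness, here is the same idea wrapped in full functions so it can be compiled as-is. The function names are my own. The second function shows a related alternative, rebinding the variable without mut, which also removes mutability but leaves any other temporaries in scope.

```rust
/// Builds the list mutably inside a block, then exposes it immutably.
fn third_element() -> i32 {
    let data = {
        let mut data = vec![];
        data.push(1);
        data.extend_from_slice(&[4, 5, 6, 7]);
        data
    };
    // data.push(8); // would not compile: the outer `data` is not `mut`
    data[2]
}

/// Alternative: shadow the mutable binding with an immutable one.
fn third_element_shadowed() -> i32 {
    let mut data = vec![1];
    data.extend_from_slice(&[4, 5, 6, 7]);
    let data = data; // rebinding without `mut` ends the mutable phase
    data[2]
}
```

Both functions return the third element of the list `[1, 4, 5, 6, 7]`, which is `5`.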
Closing Thoughts
I don’t know if this pattern is already well known to the Rust community. Even if it isn’t, I figure it’s still a good idea to bring it to people who may be inexperienced in Rust.