At some point as a backend developer, you might have run `cargo build` on a large Rust codebase and watched your system freeze or hit an out-of-memory error.
This happens because compiled languages like Rust can demand substantial system resources during the build process. Once you understand what goes on when compiling a Rust project and why it can fail, you should be able to optimize builds for resource utilization and get a successful build, without actually buying a Mac!
Compilation is a CPU-intensive process. rustc, the Rust compiler, performs complex operations like lexical analysis, parsing, type checking, borrow checking, monomorphization, LLVM optimization, and code generation.
Cargo parallelizes this build process when possible. On a modern multi-core system, Cargo will attempt to utilize all available CPU cores, which can cause system-wide slowdowns and make the machine heat up.
Now, while high CPU usage is expected, memory consumption is the real bottleneck. Several factors contribute to high memory usage during the build process. The first is monomorphization.
Rust uses generics. Since Rust requires every variable to have a known type, generics let us write one piece of code that operates on multiple types. Rust generics work through monomorphization: the compiler generates a separate copy of the generic code for each concrete type it is used with.
fn process_something<T>(data: T) {...}
process_something(10);
process_something(10.0);
For this example, upon compilation the Rust compiler would generate code equivalent to
fn process_something_i32(data: i32) {...}
fn process_something_f64(data: f64) {...}
The compiler generates a separate version of `process_something()`, each optimized for the specific concrete type the generic function is called with.
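To make this concrete, here is a minimal, compilable version of the sketch above (the function body and the `Debug` bound are illustrative assumptions, not anything the compiler requires for monomorphization):

```rust
use std::fmt::Debug;

// One generic definition in the source code...
fn process_something<T: Debug>(data: T) -> String {
    format!("processed {:?}", data)
}

fn main() {
    // ...but two concrete copies in the compiled binary: the compiler
    // instantiates process_something::<i32> and process_something::<f64>.
    println!("{}", process_something(10));   // i32 copy
    println!("{}", process_something(10.0)); // f64 copy
}
```

Each distinct concrete type used at a call site adds another instantiation that the compiler must type-check, optimize, and hold in memory.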
Large codebases rely heavily on generic libraries for async operations, data transformations, serialization, etc. This can create a lot of function variants, and each variant requires additional memory to compile.
A growing dependency graph also contributes to this problem. If you are working on a large Rust codebase, chances are it is not a standalone project. It might depend on web frameworks like axum, serialization crates, clients, cryptography libs, etc.
Each dependency also brings its own dependencies, so a project with 10 direct dependencies might actually be compiling 100 crates, and each crate being compiled consumes more memory for syntax trees, type information, debug information, etc.
Next come procedural macros. They are particularly memory-hungry because they run as separate processes at compile time, generate additional code, must themselves be compiled before first use, and can expand small annotations into large amounts of new code.
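Derive macros are the most common procedural macros. A tiny annotation like the one below expands into full trait implementations that the compiler must then compile too (the `Point` struct is a made-up example for illustration):

```rust
// Each derive runs a procedural macro at compile time that generates
// a complete trait implementation for Point behind the scenes.
#[derive(Debug, Clone, PartialEq)]
struct Point {
    x: i32,
    y: i32,
}

fn main() {
    let p = Point { x: 1, y: 2 };
    let q = p.clone();    // uses the generated Clone impl
    assert_eq!(p, q);     // uses the generated PartialEq impl
    println!("{:?}", p);  // uses the generated Debug impl
}
```

Three short words (`Debug, Clone, PartialEq`) become three full implementations; heavier derives like serde's `Serialize`/`Deserialize` generate far more.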
After this, LLVM IR arrives to eat more memory. It stands for Low-Level Virtual Machine Intermediate Representation: a portable, low-level, assembly-like language that sits between the high-level Rust code and the final machine code. The LLVM optimizer runs many transformations and optimizations like constant folding, inlining, dead code elimination, vectorization, etc. This can be memory-intensive in large Rust projects; in fact, it can be the most memory- and CPU-hungry part of the Rust build.
Memory is consumed by these optimizations, plus code generation for the target architecture and link-time optimization (LTO). With LTO the memory requirements can increase dramatically, as LLVM must keep much of the program in memory simultaneously to get a whole-program view.
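For reference, these LLVM-related knobs live in the profile sections of `Cargo.toml`; a sketch with illustrative values, not recommendations:

```toml
[profile.release]
opt-level = 3   # how aggressively LLVM optimizes
lto = "thin"    # "thin" LTO needs far less memory than full ("fat") LTO
```

If full LTO is exhausting your RAM, switching to `lto = "thin"` is often the first thing to try.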
The most common reason builds fail is insufficient RAM. A developer with 16 GB of RAM can struggle to build large projects with other applications running simultaneously.
During the build, RAM can get exhausted, and the system then resorts to swap space: disk-based virtual memory that the operating system uses when RAM is full. Swap is slow compared to RAM; it is a backup area that extends RAM, and although slower, it lets the system keep running instead of crashing when memory is exhausted.
This process is managed by the kernel and happens automatically.
Once the compilation starts swapping, build times can increase dramatically, from seconds to minutes and possibly hours for large projects. The system might become unresponsive, which can be mistaken for a stalled build, leading you to terminate it before compilation completes.
Tracking memory and swap consumption in a large Rust project build
Codegen is short for code generation: this is where the Rust compiler converts the program into LLVM IR and runs optimizations to finally emit the machine code. A codegen unit is a chunk of your crate that Rust compiles separately into machine code.
You can think of a codegen unit as an independent work package that LLVM can process in parallel. By splitting your code into multiple codegen units, Rust allows LLVM to run in parallel across all CPU cores, improving compile speed.
Suppose your project has 100 functions. You can set the number of codegen units in the `Cargo.toml` profile. If you set it to 1, the compiler gives all 100 functions to LLVM as one big chunk, so it optimizes globally across the entire crate; compile time is longer here, but the build size would be the smallest.
If you set more codegen units, say 16, then 16 smaller groups of about 6 functions each are compiled in parallel, which is much faster. It is a compile-speed vs runtime-performance trade-off.
So it is a double-edged sword: while more codegen units can dramatically reduce build time on powerful machines, they can still overwhelm your hardware, since each codegen unit consumes memory independently. On systems with limited RAM, reducing the parallelism might become necessary.
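The trade-off is expressed directly in the `Cargo.toml` profile; a sketch of both ends of the spectrum (the numbers are illustrative):

```toml
[profile.release]
codegen-units = 1    # one big unit: slowest compile, best cross-function optimization

[profile.dev]
codegen-units = 16   # many small units compiled in parallel: faster, but each
                     # in-flight unit holds its own memory at the same time
```

By default, Cargo already uses a larger codegen-unit count for dev builds than for release builds, so tuning this mainly matters when the defaults don't fit your hardware.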
Above we listed multiple reasons why compiling a large Rust project might fail, and what happens under the hood when you use Cargo. But we can apply some optimizations while building the project to manage resource consumption.
Limit parallel jobs: You can control how many crates Cargo compiles simultaneously. The command `cargo build -j 4` limits the parallel jobs to 4. You can reduce this further to consume less RAM; it might increase the build time of a large project, but it can turn a failing build into a successful one.
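The same limit can be made permanent in Cargo's config file, so you don't have to pass `-j` on every invocation:

```toml
# .cargo/config.toml (per project) or ~/.cargo/config.toml (global)
[build]
jobs = 4  # cap parallel compilation jobs; the default is the number of logical CPUs
```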
Feature selection: Many crates offer feature flags to reduce dependencies, e.g. `serde = { version = "1.0", features = ["derive"] }`. Disabling unnecessary features can significantly reduce compilation requirements.
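Disabling default features and opting into only what you need trims the build further; a sketch (the chosen crates and feature sets are illustrative):

```toml
[dependencies]
serde = { version = "1.0", default-features = false, features = ["derive"] }
tokio = { version = "1", default-features = false, features = ["rt", "macros"] }
```

`default-features = false` turns off everything the crate enables by default, so fewer transitive dependencies and less generated code end up in the build.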
Workspace optimization: For multi-crate projects, use Cargo workspaces to share dependencies. This ensures that dependencies are compiled once and shared across all workspace members.
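A minimal workspace layout, with hypothetical member crate names:

```toml
# Cargo.toml at the repository root
[workspace]
members = ["api", "core", "worker"]
resolver = "2"

# Versions declared once here can be inherited by members
# via `serde = { workspace = true }` in their own Cargo.toml.
[workspace.dependencies]
serde = { version = "1.0", features = ["derive"] }
```

All members share one `target/` directory and one lockfile, so a common dependency like serde is compiled a single time.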
Pre-built dependencies: Where possible, install heavy native libraries as system packages or pre-compiled binaries (such as libssl-dev) and link against them as system support, rather than building them from a crate.
Remote builds: Use cloud build services like GitHub Actions, remote development servers, or VMs for heavy builds and test runs if your machine cannot handle them.
Reduce codegen units: This might increase your build time, but it can make a large project build succeed with less memory consumption.
Finally, heavy resource consumption when compiling Rust projects is not a limitation; it is just how Rust works, with its type system and aggressive optimizations. It is by design: these make Rust code run fast and safe, in return making the compilation process more resource-intensive.
We should understand the workings of the tech we use and how best to optimize it, so we actually get the results we are hoping for. You can always throw money at the problem and buy better hardware, but exploring the other ways a problem could be approached is what makes a really mature engineer.