This article has been written by me with AI Assistance.
Why is Rust Sometimes Slower Than Expected?
Rust is widely known for its performance and safety, often being compared to C and C++. However, there are scenarios where Rust code may not perform as fast as anticipated. This article delves into the micro benchmark performance of various language implementations of the same abstraction and the surprising results they produce.
⚠️ Performance Results Not Reliable
This is a micro benchmark relevant to the abstractions I am building and is not suitable for general decision making on performance.
Since I have not reviewed any of the implementations, the AI may have made mistakes that invalidate these performance results and the observations drawn from them.
I believe further optimization prompts could improve the current AI-generated implementations, as could re-implementing the assembler logic manually. Either would invalidate the current test results and observations.
Disclaimer
Note: I am only familiar with C#. I am not familiar with idioms or conventions in other languages.
The implementations used to generate the performance data are not handwritten; they were all fully generated using different models (GPT 4.1 and Claude Sonnet 4 and 4.5) with zero review of the code.
I have ensured that the implementations' output is correct by using AI-generated tests to verify the results across all engines and languages, and by visually confirming the expected output, which is published on fly.io.
No optimizations have been done using AI to date.
Datasource
See the performance comparison report for detailed metrics.
The performance reports can also be generated from the deployed language assemblers:
| Language | Assembler Demo in fly.io |
|---|---|
| Javascript | javascript Assembler Web Client Side Assembly |
| C# | csharp Assembler Web Server Side Assembly |
| Rust | rust Assembler Web Server Side Assembly |
| Go | go Assembler Web Server Side Assembly |
| Node.js | node Assembler Web Server Side Assembly |
| PHP | PHP Assembler Web Server Side Assembly |
The output generated on my PC, which is used for the analysis in this article, is also available in All Performance Results. It will be regenerated periodically as the implementations are improved and as newer rules are added.
Interesting Observations (Generated by Claude Sonnet 4.5)
- The PreProcess Paradox: While preprocessing improves performance in most tests, C# and Node.js show regressions in some HtmlRule1 tests. The C# group average rises from 3.08ms to 3.68ms, and Node's HtmlRule1B time rises from 5.59ms to 6.11ms even though its group average improves from 4.96ms to 4.42ms. This suggests that for simpler templates, the preprocessing overhead can exceed the merging benefits, challenging the “always preprocess” assumption.
- The Sub-Millisecond Club: Node.js achieves an exclusive performance tier with five PreProcess tests under 1ms (0.57ms, 0.80ms, 0.90ms, 0.92ms, 0.99ms), four of them JSON rules and one an HTML rule (HtmlRule3A) - a threshold no other server-side language reaches in these results. In those five tests Node.js is roughly 1.6x to 3x faster than the next-fastest server-side language.
- Variance as a Performance Indicator: Go demonstrates consistently tight min/max ranges in the HTML rule groups (often within a 2-3ms spread), while C# shows extreme outliers (HtmlRule3 group max: 23.47ms vs avg: 6.56ms). Note that these ranges span the different tests within a group rather than repeated runs of one test. Still, the pattern suggests that Go’s performance is more predictable, whereas C# may experience JIT compilation or garbage collection pauses.
- Client-Side JavaScript’s Hidden Strength: Despite running in a browser environment with additional overhead, client-side JavaScript wins 10 out of 24 Normal Engine tests, often beating server-side Node.js. This suggests the V8 JIT optimizations are highly effective, and the browser’s rendering isolation doesn’t significantly impact pure computation tasks.
- PHP’s Consistent 2-5x Penalty Floor: PHP maintains a performance penalty across all test categories and both engines, getting within 2x of the winner only once (JsonRule2B in the Normal Engine, about 1.7x) and usually trailing by 3x or more. Unlike other languages that show workload-specific advantages, PHP’s near-universal slowdown may point to fundamental interpreter overhead that preprocessing cannot overcome.
- The Rust Preprocessing Coefficient: Rust shows large preprocessing gains even on simple HTML rules (about 3x: HtmlRule1 avg 9.00ms → 3.01ms), where most other languages change far less, but also the widest spread in PreProcess mode for complex rules (JsonRule2 group: 1.55ms to 15.13ms). This suggests Rust’s AI-generated parsing code has optimization opportunities that, once addressed, could make it consistently competitive with Go.
Interesting Observations (Written by me)
- Recursion Has a Cost: Rule 1 has two scenarios, HtmlRule1A and HtmlRule1B, and the only difference between them is the nesting of the direct assembly of components. Based on the performance results for these two scenarios, recursion appears to carry a higher cost. Maybe using iteration instead of recursion would improve performance?
- Testing in Debug: Testing a debug build in Rust is worse than in any of the other languages.
- Backup Folder Size: I used to think the node_modules folder was huge and slowed down backups when not ignored, but Rust target folders are even larger, and backing up without running cargo clean first increases backup time.
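The recursion-versus-iteration idea from the first observation can be illustrated with a small sketch. This is not the Assembler's actual code: the `Component` type and both function names are hypothetical, and the sketch only assumes that a component contributes some text followed by the output of its nested children.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Component:
    """Hypothetical component: its own text followed by nested children."""
    text: str
    children: list[Component] = field(default_factory=list)


def assemble_recursive(node: Component) -> str:
    # One function call per component; call depth grows with nesting depth.
    return node.text + "".join(assemble_recursive(c) for c in node.children)


def assemble_iterative(root: Component) -> str:
    # Same pre-order traversal, but with an explicit stack instead of the call stack.
    parts: list[str] = []
    stack = [root]
    while stack:
        node = stack.pop()
        parts.append(node.text)
        # Push children in reverse so the leftmost child is popped first.
        stack.extend(reversed(node.children))
    return "".join(parts)


tree = Component("A", [Component("B", [Component("C")]), Component("D")])
print(assemble_recursive(tree) == assemble_iterative(tree))
```

Both functions produce the same output; whether the iterative form is actually faster depends on the language and would need to be measured in each implementation.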
Interesting Observations (Generated by GPT 4.1)
- Consistency Across Languages in PreProcess Engine: The performance gap between languages narrows significantly when using the PreProcess Engine. This suggests that the initial parsing and data structure setup is the main bottleneck, and once that is optimized, most modern languages can achieve similar throughput for the merging step.
- Go’s Predictable Performance: Go consistently delivers low and stable times across both Normal and PreProcess Engine scenarios, especially for HTML rules. This highlights Go’s efficiency in handling both parsing and merging workloads, making it a strong candidate for server-side template processing.
- Node.js and JavaScript Lead in JSON Tasks: Node.js and client-side JavaScript are often the fastest for JSON-based rules, likely due to the highly optimized V8 engine and native JSON parsing. This makes them particularly suitable for workloads that are JSON-heavy.
- Rust’s Performance Highly Dependent on Optimization: Rust’s performance is competitive in the PreProcess Engine but lags in the Normal Engine, especially without manual optimization. This underscores the importance of reviewing and tuning Rust code, as the default or AI-generated implementations may not leverage Rust’s full potential.
- PHP Lags Behind: PHP is consistently the slowest across most scenarios, both in Normal and PreProcess Engine. This is expected due to its interpreted nature and less optimized data handling for these workloads.
- Impact of Output Size: For rules with larger output sizes, the performance differences between languages can become more pronounced, especially in the Normal Engine. This suggests that memory management and string handling efficiency play a bigger role as output grows.
Interesting Observations (Generated by Gemini AI)
- Pre-processing is a Game Changer: The most significant observation is the dramatic performance increase across all languages when using the “PreProcess Engine”. For example, in JsonRule2A, C#’s time drops from ~266ms to ~4ms, and Node.js drops from ~58ms to ~2ms. This indicates that for this workload, the initial parsing and setup of the templates is the most expensive part of the operation.
- Rust is Not a Silver Bullet: In the “Normal Engine” tests, which involve file I/O and parsing, Rust is frequently outperformed. Go is consistently the winner in HTML-based rules, while Node.js and client-side JavaScript dominate the JSON-based rules. This challenges the assumption that Rust is automatically the fastest for all workloads. The performance of its initial, AI-generated code was surprisingly poor, requiring specific optimization just to beat PHP.
- Go and Node.js Shine in “Cold Starts”: In the “Normal Engine” scenario, Go shows exceptional performance on HTML processing tasks. Node.js (and by extension, the V8 JavaScript engine) is the clear winner for JSON-related tasks, likely due to its highly optimized, built-in JSON parser.
- Pre-processing as the Great Equalizer: While performance differences are stark in the “Normal Engine”, they shrink considerably in the “PreProcess Engine”. Once the initial parsing is done, most of the languages perform within a few milliseconds of each other for many tests. However, Node.js and client-side JavaScript often maintain a slight edge, with Node.js clocking in at under a millisecond for several pre-processed JSON rules.
- The Cost of Abstraction: The results suggest that the specific abstractions used in this Assembler project have a significant performance overhead that is language-agnostic. The massive speedup from pre-processing implies that the way templates are parsed and prepared is far more critical than the language executing the final assembly. For this specific, AI-generated, non-hand-optimized workload, the choice of algorithm and I/O handling has a much larger impact on performance than the choice of language itself.
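To make the speedup figures in these observations easier to check, here is a minimal sketch of the arithmetic. The `speedup` helper is hypothetical; the input numbers are taken from the result tables in this article.

```python
def speedup(normal_ms: float, preprocess_ms: float) -> float:
    """How many times faster the PreProcess Engine is than the Normal Engine.

    A value below 1.0 means preprocessing was slower (a regression).
    """
    return normal_ms / preprocess_ms


# C#, JsonRule2A: 265.76 ms (Normal) vs 4.12 ms (PreProcess)
print(f"C# JsonRule2A: {speedup(265.76, 4.12):.1f}x")

# Rust, HtmlRule1 group average: 9.00 ms vs 3.01 ms
print(f"Rust HtmlRule1 avg: {speedup(9.00, 3.01):.1f}x")

# C#, HtmlRule1 group average: 3.08 ms vs 3.68 ms (below 1.0, so a regression)
print(f"C# HtmlRule1 avg: {speedup(3.08, 3.68):.2f}x")
```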
Consolidated Performance Results
The tables below show the time in milliseconds for 1000 iterations in each language to perform a series of template assembly tasks.
The results are split into two main scenarios:
Normal Engine: The raw template is loaded as-is, and every operation parses it and applies the rules of assembly (merging) from scratch.
PreProcess Engine: The template is parsed once during loading into a structure that is cached. Subsequent operations only need to apply the rules of assembly (merging) against that structure, which should be significantly faster.
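The difference between the two engines can be sketched as follows. This is only an illustration under stated assumptions: the `{{Name}}` placeholder syntax, the function names, and the segment structure are all hypothetical and are not taken from the actual Assembler implementations.

```python
import re

# Hypothetical placeholder syntax; the real Assembler syntax may differ.
PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")


def merge_normal(template: str, values: dict[str, str]) -> str:
    # Normal style: the raw template is scanned (parsed) on every call.
    return PLACEHOLDER.sub(lambda m: values.get(m.group(1), ""), template)


def preprocess(template: str) -> list[tuple[bool, str]]:
    # Parse once into (is_placeholder, text) segments that can be cached.
    segments: list[tuple[bool, str]] = []
    pos = 0
    for m in PLACEHOLDER.finditer(template):
        if m.start() > pos:
            segments.append((False, template[pos:m.start()]))
        segments.append((True, m.group(1)))
        pos = m.end()
    if pos < len(template):
        segments.append((False, template[pos:]))
    return segments


def merge_preprocessed(segments: list[tuple[bool, str]], values: dict[str, str]) -> str:
    # PreProcess style: no parsing here, only a walk over the cached segments.
    return "".join(values.get(text, "") if is_ph else text for is_ph, text in segments)


template = "<h1>{{Title}}</h1><p>{{Body}}</p>"
values = {"Title": "Hi", "Body": "There"}
cached = preprocess(template)  # done once at load time
print(merge_normal(template, values) == merge_preprocessed(cached, values))
```

In this sketch the per-operation work of the preprocessed path is a single pass over a short list, which is the kind of saving the PreProcess Engine is meant to capture.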
The OutputSize column indicates the size of the generated content in bytes. The fastest time for each test is highlighted in bold.
Generated: 2025-10-21 08:41:01 UTC | Iterations: 1000, Warmup: 100 | All times in milliseconds (ms)
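For readers unfamiliar with the warmup and iteration settings, a timing loop of this kind might look like the sketch below. It is an assumption about the general shape of such a harness, not the code that produced these numbers: the `benchmark` function is hypothetical, and it assumes the reported figure is the total elapsed time for all measured iterations.

```python
import time
from typing import Callable


def benchmark(fn: Callable[[], object], iterations: int = 1000, warmup: int = 100) -> float:
    """Return the total elapsed milliseconds for `iterations` calls of `fn`."""
    # Warmup calls are not timed; they give JITs and caches a chance to settle.
    for _ in range(warmup):
        fn()
    start = time.perf_counter()
    for _ in range(iterations):
        fn()
    return (time.perf_counter() - start) * 1000.0


total_ms = benchmark(lambda: "".join(["<p>", "hello", "</p>"]))
print(f"{total_ms:.2f} ms for 1000 iterations")
```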
Grouped View (Min/Avg/Max by Rule Groups)
HtmlRule1
| Language | Normal Engine (Min/Avg/Max) | PreProcess Engine (Min/Avg/Max) |
|---|---|---|
| CSharp | 2.24 / 3.08 / 3.92 | 2.39 / 3.68 / 4.97 |
| Rust | 5.92 / 9.00 / 12.07 | 1.92 / 3.01 / 4.10 |
| Go | 1.64 / 2.68 / 3.72 | 1.65 / 2.44 / 3.23 |
| Node | 4.33 / 4.96 / 5.59 | 2.72 / 4.42 / 6.11 |
| PHP | 12.91 / 17.85 / 22.79 | 7.70 / 10.47 / 13.25 |
| Javascript | 3.30 / 4.40 / 5.50 | 2.70 / 2.95 / 3.20 |
HtmlRule2
| Language | Normal Engine (Min/Avg/Max) | PreProcess Engine (Min/Avg/Max) |
|---|---|---|
| CSharp | 3.11 / 6.37 / 8.54 | 2.45 / 4.05 / 6.10 |
| Rust | 9.94 / 15.28 / 22.66 | 5.61 / 10.56 / 17.07 |
| Go | 2.55 / 4.81 / 8.60 | 1.62 / 2.61 / 3.85 |
| Node | 3.83 / 5.75 / 8.07 | 1.37 / 2.12 / 2.77 |
| PHP | 19.82 / 34.46 / 53.27 | 7.26 / 10.27 / 13.81 |
| Javascript | 3.60 / 6.23 / 9.20 | 1.30 / 2.43 / 3.70 |
HtmlRule3
| Language | Normal Engine (Min/Avg/Max) | PreProcess Engine (Min/Avg/Max) |
|---|---|---|
| CSharp | 2.49 / 6.56 / 23.47 | 3.07 / 5.11 / 7.82 |
| Rust | 5.57 / 7.52 / 9.22 | 2.17 / 2.96 / 3.76 |
| Go | 2.01 / 2.40 / 2.70 | 2.12 / 2.39 / 3.06 |
| Node | 1.28 / 2.03 / 3.52 | 0.80 / 2.39 / 3.05 |
| PHP | 13.79 / 17.05 / 21.34 | 7.60 / 11.75 / 17.67 |
| Javascript | 1.40 / 1.95 / 2.50 | 0.90 / 1.77 / 2.80 |
JsonRule1
| Language | Normal Engine (Min/Avg/Max) | PreProcess Engine (Min/Avg/Max) |
|---|---|---|
| CSharp | 13.81 / 34.47 / 64.41 | 2.98 / 4.42 / 6.97 |
| Rust | 10.53 / 16.01 / 23.08 | 1.54 / 2.06 / 3.40 |
| Go | 9.54 / 20.46 / 32.20 | 1.61 / 2.53 / 4.00 |
| Node | 3.99 / 9.28 / 17.48 | 0.57 / 1.51 / 3.56 |
| PHP | 20.36 / 50.75 / 113.19 | 7.24 / 9.64 / 15.45 |
| Javascript | 3.30 / 6.97 / 13.20 | 1.20 / 1.85 / 2.90 |
JsonRule2
| Language | Normal Engine (Min/Avg/Max) | PreProcess Engine (Min/Avg/Max) |
|---|---|---|
| CSharp | 114.97 / 233.16 / 398.90 | 2.80 / 4.50 / 8.23 |
| Rust | 38.21 / 80.90 / 140.74 | 1.55 / 6.89 / 15.13 |
| Go | 61.34 / 118.28 / 199.56 | 2.00 / 3.75 / 7.00 |
| Node | 25.94 / 54.27 / 91.61 | 0.92 / 1.74 / 3.23 |
| PHP | 91.04 / 127.02 / 160.15 | 7.96 / 10.07 / 14.52 |
| Javascript | 26.10 / 57.52 / 107.60 | 1.10 / 2.25 / 5.20 |
Rule1
| Language | Normal Engine (Min/Avg/Max) | PreProcess Engine (Min/Avg/Max) |
|---|---|---|
| CSharp | 147.89 / 182.35 / 216.81 | 4.70 / 5.48 / 6.27 |
| Rust | 39.80 / 47.47 / 55.13 | 4.34 / 5.89 / 7.44 |
| Go | 69.73 / 88.52 / 107.30 | 3.50 / 5.00 / 6.50 |
| Node | 29.60 / 36.02 / 42.45 | 1.90 / 3.57 / 5.24 |
| PHP | 170.26 / 206.63 / 243.01 | 14.09 / 18.02 / 21.96 |
| Javascript | 24.60 / 31.70 / 38.80 | 2.40 / 4.15 / 5.90 |
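The grouped figures above appear to be the minimum, mean, and maximum of the per-test times within each rule group, not of repeated runs of one test. The sketch below checks that reading against two rows, using per-test values from the Normal Engine table; the `summarize` helper is hypothetical.

```python
from statistics import mean


def summarize(times_ms: list[float]) -> tuple[float, float, float]:
    """Min / average / max across the tests of one rule group."""
    return (min(times_ms), round(mean(times_ms), 2), max(times_ms))


# Go, Normal Engine, the six HtmlRule3 tests
go_html_rule3 = [2.70, 2.69, 2.14, 2.15, 2.70, 2.01]
print(summarize(go_html_rule3))

# C#, Normal Engine, HtmlRule1A and HtmlRule1B
csharp_html_rule1 = [2.24, 3.92]
print(summarize(csharp_html_rule1))
```

These reproduce the grouped rows "2.01 / 2.40 / 2.70" for Go and "2.24 / 3.08 / 3.92" for C#.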
Normal Engine
| AppSite/AppView | CSharp | Rust | Go | Node | PHP | Javascript | OutputSize |
|---|---|---|---|---|---|---|---|
| HtmlRule1A | 2.24 | 5.92 | 1.64 | 4.33 | 12.91 | 5.50 | 1264 |
| HtmlRule1B | 3.92 | 12.07 | 3.72 | 5.59 | 22.79 | 3.30 | 2123 |
| HtmlRule2A | 3.11 | 9.94 | 2.55 | 4.50 | 19.82 | 3.60 | 1910 |
| HtmlRule2B | 6.80 | 12.87 | 3.83 | 6.92 | 29.04 | 5.90 | 2365 |
| HtmlRule2C | 6.96 | 11.60 | 3.69 | 5.19 | 32.26 | 5.20 | 1920 |
| HtmlRule2D | 5.77 | 14.33 | 3.62 | 3.83 | 28.86 | 4.70 | 2083 |
| HtmlRule2E | 7.04 | 20.26 | 6.58 | 5.99 | 43.49 | 9.20 | 2840 |
| HtmlRule2F | 8.54 | 22.66 | 8.60 | 8.07 | 53.27 | 8.80 | 2874 |
| HtmlRule3A | 2.49 | 6.72 | 2.70 | 1.28 | 13.79 | 1.40 | 1428 |
| HtmlRule3A → Html3A | 23.47 | 6.68 | 2.69 | 1.80 | 14.53 | 1.60 | 1428 |
| HtmlRule3A → Html3B | 2.64 | 5.57 | 2.14 | 1.51 | 14.04 | 1.40 | 1428 |
| HtmlRule3B | 2.86 | 8.25 | 2.15 | 2.04 | 18.93 | 2.40 | 1406 |
| HtmlRule3B → Html3A | 3.79 | 8.70 | 2.70 | 2.05 | 19.64 | 2.40 | 1406 |
| HtmlRule3B → Html3B | 4.10 | 9.22 | 2.01 | 3.52 | 21.34 | 2.50 | 1406 |
| JsonRule1A | 29.70 | 10.53 | 23.57 | 5.92 | 20.36 | 4.20 | 1417 |
| JsonRule1B | 13.81 | 11.08 | 9.54 | 3.99 | 21.61 | 3.30 | 1924 |
| JsonRule1C | 29.97 | 23.08 | 16.53 | 9.73 | 47.85 | 7.20 | 3798 |
| JsonRule1D | 64.41 | 19.35 | 32.20 | 17.48 | 113.19 | 13.20 | 320 |
| JsonRule2A | 265.76 | 84.13 | 128.60 | 57.51 | 153.75 | 54.10 | 2355 |
| JsonRule2B | 398.90 | 140.74 | 199.56 | 91.61 | 160.15 | 107.60 | 2906 |
| JsonRule2C | 114.97 | 38.21 | 61.34 | 25.94 | 91.04 | 26.10 | 2799 |
| JsonRule2D | 152.99 | 60.51 | 83.60 | 42.01 | 103.13 | 42.30 | 3217 |
| Rule1A | 147.89 | 39.80 | 69.73 | 29.60 | 170.26 | 24.60 | 2543 |
| Rule1B | 216.81 | 55.13 | 107.30 | 42.45 | 243.01 | 38.80 | 4772 |
PreProcess Engine
| AppSite/AppView | CSharp | Rust | Go | Node | PHP | Javascript | OutputSize |
|---|---|---|---|---|---|---|---|
| HtmlRule1A | 2.39 | 1.92 | 1.65 | 2.72 | 7.70 | 3.20 | 1264 |
| HtmlRule1B | 4.97 | 4.10 | 3.23 | 6.11 | 13.25 | 2.70 | 2123 |
| HtmlRule2A | 2.45 | 5.61 | 2.16 | 2.14 | 7.26 | 2.40 | 1910 |
| HtmlRule2B | 3.59 | 10.05 | 2.11 | 1.79 | 8.78 | 1.60 | 2365 |
| HtmlRule2C | 4.21 | 8.88 | 1.62 | 1.37 | 7.82 | 1.30 | 1920 |
| HtmlRule2D | 4.25 | 8.95 | 2.66 | 1.87 | 10.47 | 3.70 | 2083 |
| HtmlRule2E | 3.67 | 17.07 | 3.29 | 2.77 | 13.48 | 3.00 | 2840 |
| HtmlRule2F | 6.10 | 12.83 | 3.85 | 2.77 | 13.81 | 2.60 | 2874 |
| HtmlRule3A | 3.07 | 3.29 | 2.16 | 0.80 | 7.60 | 0.90 | 1428 |
| HtmlRule3A → Html3A | 3.39 | 2.39 | 2.19 | 2.60 | 11.04 | 1.60 | 1428 |
| HtmlRule3A → Html3B | 3.39 | 2.44 | 2.12 | 2.46 | 9.76 | 1.00 | 1428 |
| HtmlRule3B | 5.43 | 2.17 | 2.19 | 3.05 | 10.38 | 2.10 | 1406 |
| HtmlRule3B → Html3A | 7.82 | 3.70 | 2.61 | 2.52 | 14.06 | 2.80 | 1406 |
| HtmlRule3B → Html3B | 7.56 | 3.76 | 3.06 | 2.90 | 17.67 | 2.20 | 1406 |
| JsonRule1A | 2.98 | 1.77 | 2.00 | 0.57 | 7.35 | 1.50 | 1417 |
| JsonRule1B | 3.53 | 1.54 | 2.52 | 1.01 | 7.24 | 1.20 | 1924 |
| JsonRule1C | 6.97 | 3.40 | 4.00 | 3.56 | 15.45 | 2.90 | 3798 |
| JsonRule1D | 4.19 | 1.54 | 1.61 | 0.90 | 8.53 | 1.80 | 320 |
| JsonRule2A | 4.12 | 9.25 | 4.00 | 1.82 | 9.44 | 1.60 | 2355 |
| JsonRule2B | 8.23 | 15.13 | 7.00 | 3.23 | 14.52 | 5.20 | 2906 |
| JsonRule2C | 2.86 | 1.55 | 2.00 | 0.92 | 7.96 | 1.10 | 2799 |
| JsonRule2D | 2.80 | 1.62 | 2.00 | 0.99 | 8.35 | 1.10 | 3217 |
| Rule1A | 4.70 | 4.34 | 3.50 | 1.90 | 14.09 | 2.40 | 2543 |
| Rule1B | 6.27 | 7.44 | 6.50 | 5.24 | 21.96 | 5.90 | 4772 |