The Case for Snake Case: A Kolmogorov Complexity Argument

Software engineering is drowning in complexity. Much of it is unintended, implicit, and hidden beneath layers of convention we rarely question. Today, I want to examine one of these conventions: identifier naming. Specifically, I will argue that snake_case is objectively superior to camelCase, and I will use Kolmogorov complexity to make this case.

What is Kolmogorov Complexity?

Kolmogorov complexity measures the information needed to specify an object: formally, the length of the shortest program that produces it. Applied loosely to an operation, it asks: how much information, how many rules, how many external dependencies must the shortest correct implementation carry?

When we apply this lens to identifier naming conventions, the results are striking.

Parsing Identifiers: Where Complexity Hides

Consider the seemingly simple task of splitting an identifier into its component words. This operation is fundamental, both for tooling (linters, refactoring tools, documentation generators) and for human comprehension.

Snake Case: Minimal Complexity

components = identifier.split("_")

That is it. The entire algorithm is a single, trivial operation. The delimiter is explicit, unambiguous, and universal: the underscore is the same code point (U+005F) in ASCII and in Unicode, in every locale, in every context. No external knowledge is required. No lookup tables. The only wrinkle is a leading, trailing, or doubled underscore, which yields an empty component that is trivial to discard.
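As a minimal sketch, the splitting step can be wrapped in a small helper. The name `split_snake` is hypothetical, and dropping the empty components produced by leading or doubled underscores is an assumption about how a tool might treat names like `__init__`, not something the convention itself dictates.

```python
def split_snake(identifier: str) -> list[str]:
    """Split a snake_case identifier into its component words."""
    # str.split("_") yields empty strings for leading, trailing, or doubled
    # underscores; this sketch assumes those carry no word and drops them.
    return [part for part in identifier.split("_") if part]


print(split_snake("parse_https_url"))  # ['parse', 'https', 'url']
print(split_snake("__init__"))         # ['init']
```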

Camel Case: Hidden Complexity Explosion

# Good luck.

To split a camelCase identifier, you must:

1. Find capitalization boundaries. This requires knowing which characters are "uppercase" and which are "lowercase."

2. Consult the Unicode standard. Capitalization is not a property of characters in isolation. It is defined by the Unicode Standard, a specification that spans thousands of pages and is updated regularly. The uppercase/lowercase mapping for a single character can depend on locale, context, and version of the standard you are using.

3. Handle abbreviations. Is XMLParser split as [XML, Parser] or [X, M, L, Parser]? What about parseHTTPSURL? The answer depends on implicit human knowledge, conventions that vary by codebase, team, and era. There is no algorithm that can reliably determine this without external context.

4. Account for edge cases. What about iPhone? Or eBay? These are valid identifiers that violate the "rules" entirely.

The Kolmogorov complexity of camelCase parsing is not merely higher. It is effectively open-ended, because a correct parser depends on an external, evolving standard (Unicode) and on implicit cultural knowledge that cannot be fully formalized.
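To make the abbreviation problem concrete, here is a minimal sketch of one common heuristic, restricted to ASCII letters. The function name `split_camel` and the boundary rule (split before a capital that follows a lowercase letter or digit, and before the last capital of a run when it begins a lowercase word) are assumptions chosen for illustration; other tools pick different rules.

```python
import re

# Assumed heuristic, ASCII only:
#   1. a lowercase letter or digit followed by a capital starts a new word;
#   2. inside a run of capitals, the last capital starts a new word when it
#      is followed by a lowercase letter (so "XMLParser" -> "XML", "Parser").
_CAMEL_BOUNDARY = re.compile(r"(?<=[a-z0-9])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])")


def split_camel(identifier: str) -> list[str]:
    """Split a camelCase identifier at the heuristic boundaries above."""
    # Splitting on zero-width matches requires Python 3.7 or later.
    return _CAMEL_BOUNDARY.split(identifier)


print(split_camel("XMLParser"))      # ['XML', 'Parser']
print(split_camel("parseHTTPSURL"))  # ['parse', 'HTTPSURL']
print(split_camel("iPhone"))         # ['i', 'Phone']
```

The heuristic recovers `XML` from `XMLParser`, but it cannot know that `HTTPSURL` is two words, and it dutifully splits `iPhone` into `i` and `Phone`.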

Constructing Identifiers: The Same Story

Suppose you have a list of words and want to form an identifier.

Snake Case

identifier = "_".join(components)

Done. Append underscores between components. No transformation of the components themselves is required.
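A quick sketch of why this is convenient: joining and splitting are exact inverses, as long as no component is empty or contains an underscore itself (an assumption this example relies on). The helper name `join_snake` is hypothetical.

```python
def join_snake(components: list[str]) -> str:
    """Join words into a snake_case identifier."""
    return "_".join(components)


words = ["parse", "https", "url"]
identifier = join_snake(words)
print(identifier)  # parse_https_url

# Splitting on the same delimiter recovers the original words exactly.
round_trip = identifier.split("_")
print(round_trip == words)  # True
```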

Camel Case

identifier = components[0].lower() + "".join(c.title() for c in components[1:])

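This looks simple until you ask: what does title() actually do? It applies Unicode case mapping, and it also lowercases the rest of each word, so "HTTPS" becomes "Https". For the character "i", the uppercase form is "I" in most locales, but in Turkish it is "İ" (with a dot). Python's title() ignores the locale and always produces "I", which is wrong for Turkish text; other runtimes, such as Java's String.toUpperCase() called without an explicit Locale, consult the default locale and give different results on different machines. Either way, a case-mapping function must pick a locale, read one from the environment, or accept inconsistent results.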

You have now introduced a dependency on:

The Unicode standard
Locale settings
Runtime environment configuration

Your identifier construction algorithm is no longer self-contained. Its Kolmogorov complexity has ballooned.
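The following sketch shows what the one-liner above does in Python specifically, where `str.title()` and `str.upper()` apply Unicode's default, locale-independent case mappings. The helper name `to_camel` is hypothetical, and it assumes a non-empty list of words.

```python
def to_camel(components: list[str]) -> str:
    """Build a camelCase identifier the way the one-liner above does."""
    head, *tail = components  # assumes at least one component
    return head.lower() + "".join(word.title() for word in tail)


print(to_camel(["parse", "https", "url"]))  # parseHttpsUrl
# title() also lowercases the rest of each word, so acronym casing is lost.
print(to_camel(["parse", "HTTPS", "URL"]))  # parseHttpsUrl

# Python ignores the locale: "i" always uppercases to "I", never to the
# Turkish dotted capital (U+0130).
print("i".upper() == "I")  # True
# Lowercasing U+0130 yields two code points: "i" plus a combining dot above.
print(len("\u0130".lower()))  # 2
```

Unlike the snake_case join, the result cannot be split back into the original words: both inputs above collapse to the same identifier.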

Why This Matters

Some might argue this is academic. Who cares about edge cases with Turkish "i" or abbreviations?

I argue that this matters deeply, for several reasons:

1. Tooling reliability. Every refactoring tool, every linter, every code search engine that works with identifiers must solve this problem. The ambiguity in camelCase means these tools are either incomplete, inconsistent, or carry massive hidden complexity.

2. Internationalization. Software is global. Identifiers increasingly contain Unicode characters. A naming convention that relies on capitalization is fundamentally tied to the case distinction found in Latin, Greek, Cyrillic, and a handful of other scripts, a property that most of the world’s writing systems do not share.

3. Cognitive load. When a human reads parseHTTPSURL, they must mentally segment it. Different readers will segment it differently. This ambiguity consumes cognitive resources that could be spent on understanding the actual logic.

4. The principle of least complexity. Unintended complexity is one of the greatest problems in software engineering today. It accumulates silently, manifesting as bugs, maintenance burden, and developer frustration. We should actively seek to minimize it.
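To illustrate the internationalization point, this sketch checks two words written in scripts without case. The sample strings are arbitrary examples chosen for illustration; the calls are standard `str` methods.

```python
caseless_words = ["変数", "चर"]  # arbitrary Japanese and Devanagari samples

for word in caseless_words:
    # Caseless scripts have no uppercase or lowercase forms at all, so
    # there is no capitalization boundary for a camelCase parser to find.
    assert word.upper() == word.lower() == word
    assert not word.isupper() and not word.islower()

# An underscore still delimits the words without any case information.
parts = "変数_名前".split("_")
print(parts)  # ['変数', '名前']
```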

An Objective Argument

I am not claiming snake_case is more aesthetically pleasing. Aesthetics are subjective. I am claiming that, by the objective measure of Kolmogorov complexity, snake_case requires fundamentally less information to parse and construct.

Snake case parsing: one operation, one delimiter, no external dependencies.
Camel case parsing: character classification, Unicode case mapping, abbreviation heuristics, cultural conventions.

The difference is not marginal. It is categorical.

What Do Popular Languages Recommend?

Given the complexity argument above, one might wonder: how do major programming languages handle this? I surveyed the official style guides of twelve popular languages.

Language	Variables/Functions	Official Source
C++	`snake_case` (variables; functions use PascalCase)	Google C++ Style Guide
Python	`snake_case`	PEP 8
Rust	`snake_case`	RFC 430
Ruby	`snake_case`	Ruby Style Guide
Java	`camelCase`	Oracle Code Conventions
JavaScript	`camelCase`	MDN Guidelines
Go	`camelCase`	Effective Go
C#	`camelCase` (locals and parameters; methods use PascalCase)	Microsoft Naming Guidelines
Swift	`camelCase`	Swift API Design Guidelines
Kotlin	`camelCase`	Kotlin Coding Conventions
PHP	`camelCase`	PSR-1
Dart	`camelCase`	Effective Dart: Style

The score is 8-4 in favor of camelCase. Does this invalidate my argument?

No. Popularity is not an argument for correctness. Many of these conventions were established decades ago, when ASCII dominance made capitalization seem trivial, when tooling was primitive, and when the hidden costs of implicit complexity were not yet understood.

Consider that Python, one of the most widely adopted languages of the past decade, chose snake_case. Rust, designed with modern sensibilities about safety and correctness, also chose snake_case. Ruby, known for developer happiness, chose snake_case.

The camelCase languages reveal a pattern of convention inheritance rather than deliberate design. Java popularized camelCase in the 1990s. JavaScript adopted "Java" in its name for marketing reasons and copied the convention. C# was Microsoft’s answer to Java. Dart was Google’s attempt to replace JavaScript. Go targeted university graduates trained on Java. Swift inherited from Objective-C, which had used camelCase since the NeXT era. PHP started as a personal project ("Personal Home Page") and grew organically. In most cases, the choice was made to fit existing convention, not because someone analyzed the complexity tradeoffs.

Conclusion

The next time someone dismisses naming conventions as "just style," consider the hidden complexity beneath the surface. Snake case is not merely a preference. It is the convention with lower Kolmogorov complexity, fewer external dependencies, and less room for ambiguity.

In an industry that struggles daily with accidental complexity, choosing the simpler encoding for something as fundamental as identifiers is not pedantry. It is engineering discipline.

use_snake_case. Your future self, your tools, and your international colleagues will have one less thing to worry about.
