When 2 + 2 Does Not Equal 4.0
Introduction: What Do You Mean by Equal?
What does it mean for two values to be equal? In designing programming languages or defining types, we have to consider this question. And the wrong answer can create problems. It can lead to counter-intuitive surprises like in JavaScript where "" and [0] both are equal to 0 but not to each other. It can confuse programmers who expect two equal values to be the same, when they are actually quite different.
In this essay, I will compare functional equality with semantic equality and survey equality testing semantics across different programming languages.
Leibniz’s Law
Philosophers have long debated the question of what makes two things identical, and have come up with a criterion called Leibniz’s Law, or “the identity of indiscernibles”. This law says that two things are identical if they have all their properties in common. In other words, two things are the same if there is no way to tell them apart.
The formal expression of Leibniz’s Law is:
$$ x = y \iff \forall f \, (f(x) = f(y)) $$
Functional Equality and Substitutability
This translates nicely to a definition of equality in a pure functional programming language: two values are indiscernible if they have all function results in common (i.e., no function produces different results when applied to them). Two such indiscernible values are functionally equal.
We can extend this definition of functional equality to languages that aren’t purely functional by defining “function” to include all operators or procedures that can be applied to a value. Then, we’ll say that f(x) = f(y) also means that any effects of f(x) are equal to those of f(y). By this definition, two values are functionally equal if we can substitute one for the other in our program without changing program behavior.
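As a rough illustration of this definition, here is a minimal Python sketch that checks indiscernibility against a finite list of observer functions. The helper name indiscernible and the particular observers are assumptions made up for this example. True functional equality quantifies over every possible function, so a finite check like this can only refute functional equality, never prove it.

```python
def indiscernible(x, y, observers):
    """Return True if no observer in the list can tell x and y apart."""
    return all(f(x) == f(y) for f in observers)

# A few ways a program might "observe" a number (chosen for illustration).
observers = [
    lambda v: v + 1,   # arithmetic
    lambda v: v > 3,   # comparison
    str,               # string form
    type,              # runtime type
]

print(indiscernible(4, 2 + 2, observers))  # True
print(indiscernible(4, 4.0, observers))    # False: str and type differ
```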
Semantic vs. Functional Equality
The criterion for functional equality is very strict. It generally means that two values can be functionally equal only if they have exactly the same type; otherwise, for example, typeOf(x) != typeOf(y). In fact, if any function can inspect the binary representation of a value, then two values can be functionally equal only if they are identical down to the bit.
So in many programming languages, the == operator does not test for strict functional equality! For example, the integer 4 and the float 4.0 are often ==, but they are not functionally equal: integer division works differently from float division, and the two values have different string forms.
For example, in Python:
>>> 2+2 == 4.0
True
>>> str(2+2) == str(4.0)
False
So functional equality is different from numeric equality.
When comparing two numbers, it is usually numeric equality we are interested in. But not always. In certain scenarios such as unit tests, we truly care about substitutability. If a float32 and float64 behave differently in any way, then you can’t always substitute one for the other without changing the program output.
Numeric equality is a type of semantic equality. In this essay, I’ll define and explain the idea of semantic equality. But first, I’ll address some possible points of confusion in the definition of functional equality. Armed with an understanding of these two concepts, I’ll compare example scenarios where either functional or semantic equality tests are needed. Finally I’ll survey how the different types of equality tests are made in a variety of programming languages.
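To make the float32/float64 point concrete, here is a small sketch using only Python's standard library. Python's own float is a float64, so the hypothetical helper to_float32 simulates a float32 by round-tripping through the struct module. The specific test values are my own choice for illustration.

```python
import struct

def to_float32(x):
    """Round a Python float (a float64) to the nearest float32 and back."""
    return struct.unpack("f", struct.pack("f", x))[0]

x = 16777217.0                 # 2**24 + 1, exactly representable as a float64
print(to_float32(x))           # 16777216.0: float32 has only 24 bits of precision
print(to_float32(x) == x)      # False
print(to_float32(0.5) == 0.5)  # True: 0.5 is exact in both formats
```

For values like 0.5 the two formats are interchangeable, but for values like 2**24 + 1 they are not, so a float32 cannot be substituted for a float64 in general.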
Functional Equality
Functional Equality vs. Identity
Functionally equal values don’t have to be identical in the sense of Leibniz’s Law, because they may have different internal representations as long as those representations are not exposed in the type’s public API.
For example, a set type might be internally represented using a binary tree. Two binary trees holding the exact same values could have different structure (e.g. if the values were inserted in different order). But if all public functions/methods (such as toString) exposed set elements in sorted order, and no public functions exposed the tree structure, then two sets internally represented by different trees could be functionally equal.
So two values are functionally equal if they are substitutable. They don’t have to be absolutely identical.
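The set example can be sketched in Python as follows. TinySet is a toy class invented for this illustration, and it uses an insertion-ordered list rather than a binary tree to keep the code short. Python does not enforce privacy, so treating the underscore-prefixed attribute as hidden (and ignoring id()) is a convention assumed for the example.

```python
class TinySet:
    """A toy set that stores elements in insertion order internally."""

    def __init__(self, items=()):
        self._items = []          # private: layout depends on insertion history
        for item in items:
            self.add(item)

    def add(self, item):
        if item not in self._items:
            self._items.append(item)

    def __iter__(self):
        return iter(sorted(self._items))   # public iteration is always sorted

    def __str__(self):
        return "{" + ", ".join(str(i) for i in self) + "}"

    def __eq__(self, other):
        return isinstance(other, TinySet) and list(self) == list(other)


a = TinySet([3, 1, 2])
b = TinySet([1, 2, 3])
print(a._items == b._items)   # False: internal layouts differ
print(str(a) == str(b))       # True: the public API cannot tell them apart
print(a == b)                 # True
```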
Functional Equality and Functional Purity
Substitutability is an important concept in the theory of functional programming. A functionally pure language must respect the principle of referential transparency, which requires that you can substitute an expression with its value without changing program behavior.
This looks almost identical to the definition of functional equality, but there is a subtle difference. Let’s compare the criteria:
| | Referential Transparency | Functional Equality |
|---|---|---|
| criteria | you can substitute an expression with its value without changing program behavior | you can substitute an expression with an equal value without changing program behavior |
| example | if f(x) is 2, and the value of x is 1.0, then you can replace f(x) with f(1.0) and nothing will change | if f(x) is 2, and x is equal to 1.0, then you can replace f(x) with f(1.0) and nothing will change |
The difference is that referential transparency does not require comparing two values for equality. But in practice, a functional equality operator is critical for some functional programming techniques, such as caching/memoization, as I’ll demonstrate in the cache example later in this essay.
Equal Values vs. Equal Variables
One may be tempted to say that two values cannot be functionally equal if they have different memory addresses. But this confuses values with variables. Variables have addresses, but values have no address – they just are. When a programmer writes a == b, they mean to test whether the value referred to by a is equal to the value referred to by b, not whether a and b are the same variable.
This means that when the values being compared are themselves pointers or references to mutable objects, functional equality requires that they point to the exact same address. Even if the values stored at two different addresses are equal, mutating one will not have the same effect on program behavior as mutating the other, so the two references are not substitutable. Even pointers to immutable copies of identical values are not functionally equal if the address a pointer holds can be inspected (e.g. via toString).
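Python lists make this easy to see, since list variables hold references to mutable objects. The following is a minimal sketch; the variable names are arbitrary.

```python
a = [1, 2]
b = [1, 2]      # a separate list with equal contents
c = a           # a second reference to the same list

print(a == b)   # True: equal contents
print(a is b)   # False: different objects
print(a is c)   # True: same object

a.append(3)     # mutate through one reference
print(c)        # [1, 2, 3]: c sees the change
print(b)        # [1, 2]: b does not, so b was never substitutable for a
print(a == b)   # False
```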
Semantic Equality
When comparing any two values, we usually don’t care if they are 100% functionally equal. We just care if they are, well…the same, for all intents and purposes.
But what are those intents and purposes? I argue that our intent is always to test whether two different values are representations of the same underlying semantic value.
For a number, the semantic value is its numeric value (with minor exceptions for IEEE floats, like NaNs or signed zeros). But there are many other types of semantic values that could have more than one representation:
- The same instant in time in different time zones
- Unicode strings with the same NFC form (e.g. “ñ” (U+00F1) and “ñ” (U+006E U+0303))
- Sets represented by different lists but with the same elements (e.g. {1,2} and {2,1})
- The same length represented in different units (e.g. 1in and 2.54cm)
- Fractions with the same normal form (e.g. 2/4 and 1/2)
In all these cases, there are two distinct entities: the semantic value, and its representation. These have a 1-to-many relationship.

Programmers can’t work with a value without representing it somehow. We need to choose a format/precision for numbers. We need to store the elements of a set in some order. So the actual values of variables programmers work with are always representations of some semantic value.
We can formally define semantic equality as:
$$ x = y \iff \mathrm{SemanticValue}(x) = \mathrm{SemanticValue}(y) $$
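This definition translates almost directly into code. In the sketch below, semantically_equal, reduce_fraction, and nfc are hypothetical helper names chosen for illustration; each semantic_value function plays the role of SemanticValue for one domain. Fractions are modeled as plain (numerator, denominator) tuples with a non-zero denominator.

```python
from math import gcd
import unicodedata

def semantically_equal(x, y, semantic_value):
    """x and y are semantically equal if they map to the same semantic value."""
    return semantic_value(x) == semantic_value(y)

def reduce_fraction(frac):
    """Semantic value of a (numerator, denominator) pair: its lowest terms."""
    n, d = frac
    g = gcd(n, d)
    if d < 0:          # keep the sign on the numerator
        g = -g
    return (n // g, d // g)

def nfc(s):
    """Semantic value of a Unicode string: its NFC normal form."""
    return unicodedata.normalize("NFC", s)

print((2, 4) == (1, 2))                                      # False
print(semantically_equal((2, 4), (1, 2), reduce_fraction))   # True
print("\u00f1" == "n\u0303")                                 # False
print(semantically_equal("\u00f1", "n\u0303", nfc))          # True
```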
The Relevant “Semantics”
The meaning of SemanticValue depends on your particular problem domain. For example, an application that processes timestamps in event logs probably should use timestamp semantics: two timestamps are equal if they represent the same instant in time, regardless of the location where the event occurred.
On the other hand, a calendar application may use local time semantics, where the actual time zone of an event is often relevant (a calendar event that starts at 9am New York Time behaves differently from a calendar event that starts at 3pm Madrid Time).
So there are multiple possible ways that two times could be semantically equal. In such cases, it’s important for the programmer to understand which semantics are relevant in order to perform the right kind of equality test.
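Here is a minimal Python sketch of the two time semantics, using only the standard datetime module. The helper names instant and local_time are made up for this example, and New York is modeled as a fixed UTC-5 offset, which is an assumption that only holds in winter.

```python
from datetime import datetime, timedelta, timezone

utc = timezone.utc
new_york_winter = timezone(timedelta(hours=-5))   # fixed offset, assumed for the example

a = datetime(2023, 1, 1, 0, 0, tzinfo=utc)
b = datetime(2022, 12, 31, 19, 0, tzinfo=new_york_winter)

def instant(dt):
    """Instant-in-time semantics: seconds since the Unix epoch."""
    return dt.timestamp()

def local_time(dt):
    """Local-time semantics: the wall-clock reading plus its UTC offset."""
    return (dt.replace(tzinfo=None), dt.utcoffset())

print(instant(a) == instant(b))         # True: the same instant
print(local_time(a) == local_time(b))   # False: different wall-clock times and offsets
```

Which of the two functions is the right SemanticValue depends entirely on the application.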
Canonical Representations
When there are multiple ways to represent the same semantic value, we can always designate one as the canonical representation. Examples of canonical representations:
- Instants in Time: Unix timestamps (seconds since the epoch, UTC)
- Unicode Strings: the NFC form
- Sets: a sorted list of distinct elements
- Length: length in meters
- Rational numbers: normal form (divide by GCD)
Whatever the semantics, programmers can perform semantic equality tests by performing a functional equality test on the canonical representation.
Examples (JavaScript)
// compare two timestamps using instant-in-time semantics
const a = Temporal.ZonedDateTime.from('2023-01-01T00:00:00[UTC]');
const b = Temporal.ZonedDateTime.from('2022-12-31T19:00:00[America/New_York]');
a.equals(b); // false: same instant, but different time zones
a.toInstant().equals(b.toInstant()); // true
// compare two Unicode strings using NFC semantics
const c = 'ñ'; // U+00F1
const d = 'n\u0303'; // U+006E U+0303
c === d; // false
c.normalize('NFC') === d.normalize('NFC'); // true
Examples (Python)
# compare two lists using set semantics
a = [1, 2]
b = [2, 1]
a == b # False
set(a) == set(b) # True
import astropy.units as u
from astropy.units import imperial
u.add_enabled_units(imperial)
# compare two measurements using physical length semantics
a = 1 * imperial.inch
b = 2.54 * u.cm
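a.value == b.value # False: the raw magnitudes 1.0 and 2.54 differ (astropy's own == converts units before comparing)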
a.to(u.m) == b.to(u.m) # True
Canonical Types
This naturally suggests the question: why do we even have non-canonical representations? If you always represent timestamps in event logs using epoch seconds, you never need to convert before comparing. If your rationals are always in normal form, or your set elements are always sorted, or your distances are always in meters, there is no possible confusion.
Many languages facilitate creation of canonical types: types that only permit canonical representations.
For example, a canonical Fraction type only allows fractions represented in normal form. Python’s fractions.Fraction is an example: the expression Fraction(2, 4) is automatically converted to the normal form Fraction(1, 2). Ruby’s Rational, Haskell’s Data.Ratio, and Julia’s Rational all do the same.
With canonical types, there is a 1:1 relationship between the values of that type, and the semantic values they are supposed to represent. As a result, for these types functional equality is equivalent to semantic equality.
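Python's fractions.Fraction shows this in a few lines. This is only a quick demonstration of the standard library type's behavior, not a claim about how every canonical type is implemented.

```python
from fractions import Fraction

a = Fraction(2, 4)
b = Fraction(1, 2)

print(a)                              # 1/2: reduced on construction
print((a.numerator, a.denominator))   # (1, 2)
print(a == b)                         # True
print(repr(a) == repr(b))             # True: even the printed forms agree
```

Because there is no way to construct an unreduced Fraction through the public constructor, the equality test never has to normalize anything.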
When Functional Equality Matters
Example: Caching
When implementing a cache, two cache keys should probably not be considered equal unless they are functionally equal. Otherwise the cache can return a value for a functionally different key. This would violate the basic principle of a cache, which should only affect program performance, never behavior.
For example, a polymorphic function f(x) might accept either a float32 or a float64 argument. It could be that the float32 version of f and the float64 version of f return numerically different values for numerically equal inputs – for example the float32 might overflow and return +Inf where the float64 version doesn’t.
If we put f(x) behind a generic cache, and that cache used numeric equality to check whether there was a cached result for some value of x, then f(x) might start returning +Inf for float64 values that should not overflow. This would change program behavior and could introduce hard-to-find bugs.
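The same failure mode can be reproduced in plain Python. The sketch below substitutes int and float for float32 and float64, because the standard library has no float32 type; make_cached and describe are hypothetical names for this illustration. A Python dict treats 4 and 4.0 as the same key, so a naive cache conflates them, while a key that includes the type approximates a functional equality test.

```python
def make_cached(fn, key=lambda x: x):
    """Wrap fn with a dict-backed cache; key maps an argument to its cache key."""
    cache = {}

    def wrapper(x):
        k = key(x)
        if k not in cache:
            cache[k] = fn(x)
        return cache[k]

    return wrapper

def describe(x):
    return f"value={x}"

loose = make_cached(describe)
print(loose(4))      # value=4
print(loose(4.0))    # value=4 (stale: 4 == 4.0 and they hash alike)

strict = make_cached(describe, key=lambda x: (type(x), x))
print(strict(4))     # value=4
print(strict(4.0))   # value=4.0
```

With the loose cache, adding caching changed the result of describe(4.0), which is exactly what a cache must never do.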
Example: Testing
Assertions in test code should generally use functional equality tests. First because it’s generally better for an automated test to be very specific about what the expected results are. But also because mistakenly using semantic equality checks in test code can cause incorrect behavior.
Consider a unit test that looks like this (in no particular language).
result = foo()
assert(result == 1) // this passes
// assert(f(result) == true) // this fails for some reason
assert(f(1) == true) // but this passes
The test originally failed on the assertion f(result) == true. But when the programmer commented out the failed assertion and replaced it with the assertion f(1) == true, the test passed! This is confusing, because the previous line of code just confirmed that result == 1.
But of course, the problem is that the programmer used the == operator, which is checking for semantic equality, not functional equality. result might, for example, be a float (1.0), while 1 is an int, and it may be that f(1) != f(1.0).
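A stricter assertion helper would have caught this earlier. Here is one possible sketch in Python; assert_functionally_equal, foo, and f are hypothetical names standing in for the ones in the pseudocode above. Checking type plus == is only an approximation of functional equality (it would still accept 0.0 and -0.0, for instance), but it rules out the int/float confusion.

```python
def assert_functionally_equal(actual, expected):
    """Fail unless actual and expected have the same type and compare equal."""
    assert type(actual) is type(expected), (
        f"type mismatch: {type(actual).__name__} vs {type(expected).__name__}"
    )
    assert actual == expected, f"{actual!r} != {expected!r}"

def foo():
    return 1.0                  # hypothetical function under test, returns a float

def f(x):
    return isinstance(x, int)   # hypothetical function that is sensitive to type

result = foo()
assert result == 1              # passes: numeric equality
assert f(1) is True             # passes
assert f(result) is False       # f(result) differs from f(1)

try:
    assert_functionally_equal(result, 1)
except AssertionError as err:
    print(err)                  # type mismatch: float vs int
```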
Example: Programmer Intuition
Here’s another example where a programmer’s intuition might expect that equality implies substitutability:
if result == Success {
// return Success // This breaks for some reason
return result
}
If result == Success, why should it matter whether I return result or return Success?
But if == is not a functional equality test, then it can matter (e.g. the result might have more information than just success/fail status code). If the programmer doesn’t understand that, they can easily make the wrong choice.
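One way this situation can arise is sketched below in Python. The Result class, its fields, and the run function are all invented for illustration: equality compares only the status, so a result carrying extra detail is == to the bare Success constant without being substitutable for it.

```python
from dataclasses import dataclass, field

@dataclass
class Result:
    status: str
    detail: str = field(default="", compare=False)   # ignored by ==

Success = Result("success")

def run():
    # Hypothetical operation that reports extra information on success.
    return Result("success", detail="created record 42")

result = run()
print(result == Success)      # True: only status is compared
print(result.detail)          # created record 42
print(Success.detail == "")   # True: returning Success would drop the detail
```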
Common Approaches to Equality Testing
There are a few common ways that programming languages enable both functional and semantic equality tests.
By far, == is the most common choice for the equality operator in programming languages. And in most languages, == defaults to a functional equality test when comparing two values of the same type.
However, some languages allow customizing or overloading == so that it performs a semantic equality check instead.
Further, in some languages == can be used to compare two values of different numeric types. Some other languages also support comparing “truthy” values of different types (e.g. 1 == True) or comparing various types of values to strings (e.g. 13 == "13"). This is generally done using implicit conversion or widening (e.g. convert both values to a float or a string before comparing).
Languages such as JavaScript/TypeScript have a more complex Abstract Equality Comparison algorithm that even allows semantic equality between strings, numbers, and even arrays and objects (e.g. [] == 0).
For languages where == is not strict functional equality, === is often used as the strict functional equality operator.
Example: Go
In Go, == is a strict functional equality operator. The == operator can’t be overloaded, and the compiler won’t even allow values of different types to be compared.
To test for numeric equality across types, programmers must explicitly convert values to a common type:
var a int32 = 2 + 2
var b float64 = 4.0
// a == b // compile error: mismatched types int32 and float64
float64(a) == b // true
Example: Python
Python on the other hand allows equality comparison across values of any type. Numeric values are implicitly converted to a common type:
>>> a = 2+2
>>> type(a)
<class 'int'>
>>> b = 4.0
>>> type(b)
<class 'float'>
>>> a == b
True
“Truthy” values of different types can also be equal:
>>> 1 == True
True
>>> 1 == "Blue"
False
Normalization on Initialization
Many languages facilitate creation of canonical types via normalization of values on initialization. For example, in Scala you can define a custom apply function:
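// The private constructor means callers must go through Fraction.apply
case class Fraction private (numerator: Int, denominator: Int)

object Fraction {
  // Euclid's algorithm; the result is always non-negative
  private def gcd(a: Int, b: Int): Int = if (b == 0) math.abs(a) else gcd(b, a % b)

  // Normalize on construction: reduce by the GCD and keep the sign on the numerator
  def apply(n: Int, d: Int): Fraction = {
    require(d != 0, "denominator must be non-zero")
    val g = gcd(n, d)
    val sign = if (d < 0) -1 else 1
    // `new` calls the constructor directly; Fraction(...) here would recurse into apply
    new Fraction(sign * n / g, sign * d / g)
  }
}

// Usage
val a = Fraction(2, 4)
val b = Fraction(1, 2)
println(a == b) // true, as both normalize to 1/2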
By enforcing that fractions are always in normal form, two functionally equal Fraction values will also be numerically equal, and vice versa.
Canonical Number Types
Few languages have a single canonical number type. This is hard to achieve without sacrificing performance and constraining the range of numeric values that can be represented.
JavaScript, TypeScript, and pre-5.3 Lua have a single number type (double-precision floats). Perl represents numbers using a unified scalar interface, with internal optimizations that use different representations (ints, floats, and decimal strings). These languages gain simplicity at the cost of performance, and they constrain the range of numbers that can be represented exactly (e.g., separate types or pragmas are required for bigints or exact rationals).
Scheme, Common Lisp, Julia, and several others unify numbers under a “numeric tower” or hierarchy that includes ints, floats, and exact rationals. These approximate canonicity through promotion rules but the different types still have differing arithmetic semantics and runtime checks that reveal the type.
Number types differ not only in the range or precision of values that can be represented, but also in the actual semantics of arithmetic operations. A single canonical number type would need to somehow decouple representations from arithmetic, requiring programmers to specify the semantics of their arithmetic operations (integer, float, decimal float, or exact rational), along with precision and overflow rules, instead of letting these be determined by the type. This may be unrealistic.
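A few lines of Python show how much the arithmetic itself depends on the number type. This is only a sketch using the standard library's int, float, Decimal, and Fraction; the specific values are chosen for illustration.

```python
from decimal import Decimal
from fractions import Fraction

print(0.1 + 0.2 == 0.3)                                      # False: binary floats
print(Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))     # True: decimal floats
print(Fraction(1, 10) + Fraction(2, 10) == Fraction(3, 10))  # True: exact rationals

print(7 // 2, 7 / 2)                  # 3 3.5: integer vs. true division

big = 2**53 + 1
print(big == 2**53)                   # False: ints are exact
print(float(big) == float(2**53))     # True: a double cannot represent 2**53 + 1
```

A single canonical number type would have to pick one of these behaviors for each operation, or ask the programmer to choose.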
Comparison Across Languages
This table summarizes the rules for several popular programming languages:
| Language | Cross-Type == Comparison | Functional Equality Test | Equality Operator Overloading | Normalization on Initialization | Canonical Numbers |
|---|---|---|---|---|---|
| Go | No | == | No | No | No |
| JavaScript | Abstract Equality Comparison algorithm | === | No | No | Yes (Number as doubles) |
| TypeScript | Abstract Equality Comparison algorithm | === | No | No | Yes (Number as doubles) |
| Python | Numbers and Truthy Values | type(x) == type(y) and x == y | Yes (`__eq__`) | Yes (`__init__` or `__new__` for normalization) | No |
| Haskell | No (both operands must have the same type) | == (User-Defined via Eq) | Yes (typeclass) | Yes (newtypes for wrapping) | Partial (numeric tower) |
| Swift | No | == | Yes (Equatable protocol) | Yes (init for normalization) | No |
| Kotlin | No | === | Yes (override equals()) | Yes (init for normalization) | No |
| Julia | Numbers | === | Yes (methods) | Yes (constructors) | Partial (numeric tower) |
| Java | Numbers (primitive types) | == | No | Yes (constructors) | No |
| Scala | Numbers | x.getClass == y.getClass && x == y | Yes (override equals) | Yes (apply in companion object) | No |
| Rust | No | == (via PartialEq/Eq) | Yes (impl trait) | Yes (newtypes/structs) | No |
| C++ | Numbers | operator== (User-Defined) | Yes (operator==) | Yes (constructors) | No |
| C# | Numbers | == | Yes (operator ==) | Yes (constructors) | No |
| Ruby | Numbers | self.class == other.class && self == other | Yes (override ==) | Yes (initialize for normalization) | No |
| Perl | Numbers | $x == $y (numeric context) | No (== is built-in) | No | Yes (unified scalars) |
| Scheme | Numbers (via =) | eqv? | No | Yes (rationals only) | Partial (numeric tower) |
| Lua (pre-5.3) | N/A (single number type) | == | Yes (__eq metamethod) | No | Yes (uniform doubles) |
Summary
If you think about it, any two things are the same in some ways, but different in other ways. So if you want to know whether two values are “equal,” you must first ask yourself what do you mean by equal?
Two values are functionally equal iff substituting one for the other won’t change program behavior—meaning it’s never true that $f(x) \neq f(y)$ for any $f$.
Semantic equality is looser: the integer 4 and float 4.0 represent the same numeric value, but if toString(4) differs from toString(4.0), then 2 + 2 isn’t functionally equal to 4.0.
A single semantic value can have multiple representations. Functional equality compares representations, semantic equality compares meaning. Choose a domain-tailored canonical representation—like sorted lists for sets or epoch seconds for times—and semantic equality tests reduce to functional equality tests on canonical representations. Make == a strict functional equality test to force programmers to be explicit about their intent, increasing clarity and killing bugs at the expense of convenience. Embrace canonical types to merge functional and semantic equality completely.