In Cyberpunk 2077, players noticed that the 3D models of children were just those of adults scaled down in size1. These uncanny figures roamed Night City with unexpected body proportions and aged facial structures. You know, some things don’t scale down naturally: a mobile app is not just a smaller website, a short story is not just a shorter novel, and a bathtub is not just a mini pool.
Political Science at a nanoscale
Similarly, political science at a nano-scale breaks down as well. How exactly does it “break down” and when must we draw a line between the “macro” and “nano” governments?
I admit, I’ve set up a problem that I have no business probing into, since I am no authority on this topic. My interests are strictly in inventing mathematical formulations for the sake of it, to see where it leads, to experiment in a sandbox of the mind, and if lucky, to gain new insights on social dynamics.
So before we touch any equations, here are a couple of examples of how small groups of people often organize themselves into familiar patterns:
- A friend planning a BBQ might appear as a benevolent autocrat, where one leader orchestrates the evening.
- Classmates working on a group project may resemble an oligarchy, where a few decide the plan and carry it out: “he’ll be late, so let’s start without him.”
- Coworkers picking a place to eat tend to vote, like a direct democracy (aside: RIP Foursquare2.)
Can we predict how a stranded crew of astronauts on Mars would organize themselves? Assume, of course, that all communication with the mainland has been cut off (say, due to software that cannot adapt to change3).
Let’s use Rousseau’s The Social Contract4 as a starting point, keeping in mind that it is by no means on par with modern political philosophy. He introduces language to describe governments, while warning us about the limitations of “mathematical precision.”
“…although I have borrowed … the language of mathematics, I am still well aware that mathematical precision has no place in moral calculations” (105).
Foolishly, we’ll proceed anyway.
Will of the people
Let’s represent a person’s will by a vector (\vec{p}), in some arbitrary vector space (say (\mathbb{R}^{3}) for simplicity).
Now, if LLMs have shown us anything, it is that vector embeddings are a good enough mechanism to store natural language. If a body of text can be embedded in a semantic vector space, then we can borrow that manifold for our purposes too, because we assume the will is legible through written text, much like the letter of the law, with the same limitation: it strips away the contextual information behind whatever produced the text.
Anyway, back to a couple more definitions. The list of individual wills is (P = (\vec p_1,\dots,\vec p_n)).
A legitimate government aligns with the will of the people. For it to function, someone needs to carry out the administration and enforcement duties.
Introducing the magistrates
Rousseau calls this role the magistrate: the “man or body charged with that administration” (102).
Thus, we divide the public into two groups of wills:
- the wills of the sovereign (S) (those who make the laws)
- the wills of the magistrates (M) (those who administer and enforce the laws)
Define the vector embedding such that the will of the public is the mean will of the population:
[ \style{color:#ee854a;}{\vec{\rho}} = \frac{1}{n} \style{color:#797979;}{\sum}_{i=1}^{n} \vec{p}_i. ]
Likewise, the sovereign will is written (\style{color:#d65f5f;}{\vec{\sigma}}), and the magistrates’ will is (\style{color:#4878d0;}{\vec{\mu}}).
In the demo below, try adding a few magistrates and observe how that affects the variance within a group and the distance between groups. You may intuit variance as “friction to get anything done” and distance as “conflict of interest between the two bodies.”
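To make those two quantities concrete, here is a minimal sketch of how they could be computed. The five wills are made-up numbers, the helper names are hypothetical, and “variance” is assumed to mean the mean squared distance of each will from its group’s mean, which may differ from what the demo actually computes.

```python
from math import dist

def mean_will(wills):
    """Component-wise arithmetic mean of a list of equal-length vectors."""
    n = len(wills)
    return tuple(sum(axis) / n for axis in zip(*wills))

def within_variance(wills):
    """Assumed definition: mean squared distance of each will from the group mean."""
    center = mean_will(wills)
    return sum(dist(w, center) ** 2 for w in wills) / len(wills)

# Hypothetical population of five wills in R^3 (made-up numbers).
P = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0),
     (1.0, 1.0, 0.0), (0.0, 1.0, 1.0)]
M = P[:2]  # magistrates
S = P[2:]  # sovereign

rho = mean_will(P)    # will of the public
mu = mean_will(M)     # will of the magistrates
sigma = mean_will(S)  # will of the sovereign

friction_M = within_variance(M)  # "friction to get anything done"
conflict = dist(mu, sigma)       # "conflict of interest between the two bodies"

print(rho, friction_M, conflict)
```

Moving a person from `S` to `M` changes both numbers at once, which is the trade-off the demo lets you feel by hand.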
There are many more ways to partition (P) into (M) and (S) than shown in the demo, on the order of (2^n) to be more precise. So, we treat them as random variables over all partitions, such that:
[ \style{color:#797979;}{\mathbb{E}}[\style{color:#d65f5f;}{\vec{\sigma}}] = \style{color:#ee854a;}{\vec{\rho}}, \qquad \style{color:#797979;}{\mathbb{E}}[\style{color:#4878d0;}{\vec{\mu}}] = \style{color:#ee854a;}{\vec{\rho}}. ]
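For a small population, the claim can be checked by brute force. The sketch below assumes every split of the population into two non-empty groups is equally likely (there are (2^n - 2) of them); that uniform weighting is my assumption, and the four wills are made-up numbers.

```python
from itertools import combinations

def mean_will(wills):
    """Component-wise arithmetic mean of a list of equal-length vectors."""
    n = len(wills)
    return tuple(sum(axis) / n for axis in zip(*wills))

def all_partitions(P):
    """Yield (M, S) for every split of P into two non-empty groups."""
    idx = range(len(P))
    for k in range(1, len(P)):
        for chosen in combinations(idx, k):
            M = [P[i] for i in chosen]
            S = [P[i] for i in idx if i not in chosen]
            yield M, S

# Hypothetical population of four wills in R^3.
P = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0), (2.0, 1.0, 0.0)]
rho = mean_will(P)

partitions = list(all_partitions(P))
# Average the group means over all equally weighted partitions.
expected_mu = mean_will([mean_will(M) for M, _ in partitions])
expected_sigma = mean_will([mean_will(S) for _, S in partitions])

print(len(partitions), rho, expected_mu, expected_sigma)
```

Both averages land on the public will by symmetry: every person appears in the same number of groups of each size, so no individual is favored.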
Microfoundation
It sounds like what you’d call the backbone of a conspiracy theory, or if you ask my wife, a tiny bit of makeup to even out the complexion of a face. Not quite. We’re actually talking about “microeconomic foundations”: you can ground large-scale behavior of a group in individual-level behavior5.
For example, as if the will of a person wasn’t already a complex enough topic, the will of a group is even more controversial. Instead of averaging all the wills of people in a group to predict how they would act, let’s zoom in to the individuals.
Applying microfoundations, the voice of the sovereign (\style{color:#d65f5f;}{X_S}) at any time can be estimated through turnout or sampling. The voice of the magistrates (\style{color:#4878d0;}{X_M}) at any time can be estimated by a few delegates, or a random sample.
(\style{color:#d65f5f;}{X_S}) is a noisy signal of (\style{color:#d65f5f;}{\vec{\sigma}}), and (\style{color:#4878d0;}{X_M}) is a noisy signal of (\style{color:#4878d0;}{\vec{\mu}}):
[ \style{color:#797979;}{\mathbb{E}}[\style{color:#d65f5f;}{X_S} \mid S] = \style{color:#d65f5f;}{\vec{\sigma}}, \qquad \style{color:#797979;}{\mathbb{E}}[\style{color:#4878d0;}{X_M} \mid M] = \style{color:#4878d0;}{\vec{\mu}}. ]
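One way to read “noisy signal” is a small delegation speaking for its body. The sketch below assumes the voice is the mean will of (k) delegates drawn uniformly at random, which is only one possible sampling scheme; `delegate_voices` is a hypothetical helper and the wills are made-up numbers. Enumerating every possible delegation shows that the voices average out to the body’s will.

```python
from itertools import combinations

def mean_will(wills):
    """Component-wise arithmetic mean of a list of equal-length vectors."""
    n = len(wills)
    return tuple(sum(axis) / n for axis in zip(*wills))

def delegate_voices(body, k):
    """Every possible voice: the mean will of each k-member delegation."""
    return [mean_will(group) for group in combinations(body, k)]

# Hypothetical magistrates in R^3.
M = [(1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 0.0, 3.0), (1.0, 1.0, 1.0)]
mu = mean_will(M)

voices = delegate_voices(M, 2)     # one voice per possible pair of delegates
average_voice = mean_will(voices)  # expectation of X_M under uniform sampling

print(mu, average_voice)
```

Any single delegation can be far from (\vec{\mu}), and that spread is the noise the cost function penalizes later.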
Formulation of a government
A government’s will (\style{color:#956cb4;}{\vec{\gamma}}) is a tug-of-war between the voice of the sovereign (\style{color:#d65f5f;}{X_S}) and the voice of the magistrates (\style{color:#4878d0;}{X_M}). It lives somewhere between the two points. Given some (t \in \mathbb{R}) where (0 \leq t \leq 1), define
[ \style{color:#956cb4;}{\vec{\gamma}}(t) = t \style{color:#4878d0;}{X_M} + (1 - t) \style{color:#d65f5f;}{X_S}. ]
In the demo below, hover over a partition to see a visual representation of (\style{color:#956cb4;}{\vec{\gamma}}(t)).
The government’s will (\style{color:#956cb4;}{\vec{\gamma}}(t)) is also an unbiased estimator of the public will:
[ \begin{aligned} \style{color:#797979;}{\mathbb{E}}[\style{color:#956cb4;}{\vec{\gamma}}(t)] &= \style{color:#797979;}{\mathbb{E}}\!\left[\, t \style{color:#4878d0;}{X_M} + (1-t) \style{color:#d65f5f;}{X_S} \,\right] \\ &= t\,\style{color:#797979;}{\mathbb{E}}[\style{color:#4878d0;}{X_M}] + (1-t)\,\style{color:#797979;}{\mathbb{E}}[\style{color:#d65f5f;}{X_S}] \\ &= t\,\style{color:#797979;}{\mathbb{E}}\!\left[\style{color:#797979;}{\mathbb{E}}[\style{color:#4878d0;}{X_M} \mid M]\right] + (1-t)\,\style{color:#797979;}{\mathbb{E}}\!\left[\style{color:#797979;}{\mathbb{E}}[\style{color:#d65f5f;}{X_S} \mid S]\right] \\ &= t\,\style{color:#797979;}{\mathbb{E}}[\style{color:#4878d0;}{\vec{\mu}}] + (1-t)\,\style{color:#797979;}{\mathbb{E}}[\style{color:#d65f5f;}{\vec{\sigma}}] \\ &= t \style{color:#ee854a;}{\vec{\rho}} + (1-t)\style{color:#ee854a;}{\vec{\rho}} \\ &= \style{color:#ee854a;}{\vec{\rho}}. \end{aligned} ]
We define the cost of a government as the mean squared error relative to the public will:
[ \style{color:#797979;}{\mathrm{Cost}}(t) = \style{color:#797979;}{\mathbb{E}}\big[ \| \style{color:#956cb4;}{\vec{\gamma}}(t) - \style{color:#ee854a;}{\vec{\rho}} \|^2 \big]. ]
By the bias-variance decomposition for squared loss:
[ \style{color:#797979;}{\mathrm{Cost}}(t) = \underbrace{\big\| \style{color:#797979;}{\mathbb{E}}[\style{color:#956cb4;}{\vec{\gamma}}(t)] - \style{color:#ee854a;}{\vec{\rho}} \big\|^2}_{\text{illegitimacy}} \;+\; \underbrace{\style{color:#797979;}{\operatorname{tr}}\big(\style{color:#797979;}{\operatorname{Cov}}(\style{color:#956cb4;}{\vec{\gamma}}(t))\big)}_{\text{ineffectiveness}}. ]
So minimizing (\style{color:#797979;}{\mathrm{Cost}}(t)) means choosing a government that is both:
- legitimate (aligned to the public will)
- effective (not unstable due to noise).
Since (\style{color:#797979;}{\mathbb{E}}[\style{color:#956cb4;}{\vec{\gamma}}(t)] = \style{color:#ee854a;}{\vec{\rho}}), the cost simplifies to just the variance term
[ \style{color:#797979;}{\mathrm{Cost}}(t) = \style{color:#797979;}{\operatorname{tr}}\big(\style{color:#797979;}{\operatorname{Cov}}(\style{color:#956cb4;}{\vec{\gamma}}(t))\big). ]
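Assuming the two bodies’ estimation errors are uncorrelated, the covariance rule for a linear combination6 gives (\style{color:#797979;}{\operatorname{Cov}}(\style{color:#956cb4;}{\vec{\gamma}}(t)) = t^2 \style{color:#797979;}{\operatorname{Cov}}(\style{color:#4878d0;}{X_M}) + (1 - t)^2 \style{color:#797979;}{\operatorname{Cov}}(\style{color:#d65f5f;}{X_S})). Taking the trace of both sides, and using the linearity of the trace7, we get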
[ \begin{aligned} \style{color:#797979;}{\mathrm{Cost}}(t) &= t^2 \style{color:#4878d0;}{v_M} + (1 - t)^2 \style{color:#d65f5f;}{v_S} \\ \text{where}\quad \style{color:#4878d0;}{v_M} &= \style{color:#797979;}{\operatorname{tr}}(\style{color:#797979;}{\operatorname{Cov}}(\style{color:#4878d0;}{X_M})), \\ \style{color:#d65f5f;}{v_S} &= \style{color:#797979;}{\operatorname{tr}}(\style{color:#797979;}{\operatorname{Cov}}(\style{color:#d65f5f;}{X_S})). \end{aligned} ]
The optimal government is at (\frac{d \style{color:#797979;}{\mathrm{Cost}}}{dt} = 2 t \style{color:#4878d0;}{v_M} + 2(t - 1)\style{color:#d65f5f;}{v_S} = 0), or simply
[ t = \frac{\style{color:#d65f5f;}{v_S}}{\style{color:#d65f5f;}{v_S} + \style{color:#4878d0;}{v_M}}. ]
Neat, we’ve derived the phenomenon that a body with more internal disagreement is less effective! Read the formula directly: as (\style{color:#4878d0;}{v_M}) grows, the magistrates’ weight (t) shrinks toward 0, and as (\style{color:#d65f5f;}{v_S}) grows, (t) approaches 1. In other words, the higher the variance of a body, the less influence it has on the government. The influence of the individual diminishes as the population increases. The inverse is true as well, which leads to Rousseau’s conclusion that a small body of magistrates rules with more power.
“…government slackens to the extent that the magistrates are multiplied…” (109).
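The closed-form weight derived above is easy to sanity-check numerically. This sketch uses made-up variances ((v_M = 1), (v_S = 3)) and compares the formula against a plain grid search over (t \in [0, 1]); the function names are hypothetical.

```python
def cost(t, v_M, v_S):
    """Cost(t) = t^2 * v_M + (1 - t)^2 * v_S, the variance-only cost."""
    return t ** 2 * v_M + (1 - t) ** 2 * v_S

def optimal_t(v_M, v_S):
    """Closed-form minimizer; assumes v_M + v_S > 0."""
    return v_S / (v_S + v_M)

# Made-up variances: the sovereign's voice is three times as noisy.
v_M, v_S = 1.0, 3.0

t_star = optimal_t(v_M, v_S)
best_cost = cost(t_star, v_M, v_S)

# Independent check: brute-force search over a grid on [0, 1].
grid = [i / 1000 for i in range(1001)]
t_grid = min(grid, key=lambda t: cost(t, v_M, v_S))

print(t_star, best_cost, t_grid)
```

With these numbers the magistrates get three quarters of the weight, precisely because they are the quieter body. The second derivative, (2(v_M + v_S)), is positive, so the stationary point is a minimum.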
Comparing governments
The best government carries out actions swiftly in the interest of the people. In expectation over partitions, (\style{color:#956cb4;}{\vec{\gamma}}) is in the interest of the people, because it achieves the equilibrium point between the magistrates and the sovereign. But the effectiveness of carrying out actions is largely a property of the magistrates. Let (w_M) be the within-group variance of the magistrates, interpreted as the “friction” among the magistrates to get anything done.
The loss function of a government is a function of alignment (distance to (\style{color:#ee854a;}{\vec{\rho}})) and effectiveness ((w_M)). Let (\lambda) be a hyperparameter. Then, we define the loss as
[ \style{color:#797979;}{\mathcal{L}}(\style{color:#956cb4;}{\vec{\gamma}}) = \| \style{color:#956cb4;}{\vec{\gamma}} - \style{color:#ee854a;}{\vec{\rho}} \|^2 + \lambda w_M. ]
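As a rough sketch, the loss can be evaluated for one candidate government like this. The government’s will `gamma` is passed in directly so that no particular choice of (t) is implied. The within-group variance is assumed to be the mean squared distance from the magistrates’ mean, and (\lambda = 0.1) and all vectors are made-up values.

```python
from math import dist

def mean_will(wills):
    """Component-wise arithmetic mean of a list of equal-length vectors."""
    n = len(wills)
    return tuple(sum(axis) / n for axis in zip(*wills))

def within_variance(wills):
    """Assumed definition of w_M: mean squared distance from the group mean."""
    center = mean_will(wills)
    return sum(dist(w, center) ** 2 for w in wills) / len(wills)

def loss(gamma, rho, magistrates, lam):
    """Squared distance to the public will plus lam times magistrate friction."""
    return dist(gamma, rho) ** 2 + lam * within_variance(magistrates)

# Hypothetical population on the corners of a square in the z = 0 plane.
P = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 2.0, 0.0), (2.0, 2.0, 0.0)]
M = P[:2]                # magistrates: the two bottom corners
rho = mean_will(P)       # (1.0, 1.0, 0.0)
gamma = (1.0, 0.5, 0.0)  # made-up government will, for illustration only
lam = 0.1                # made-up hyperparameter

value = loss(gamma, rho, M, lam)
print(value)
```

Here the alignment term contributes 0.25 and the friction term 0.1, so a larger (\lambda) would push the optimum toward tighter, smaller groups of magistrates.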
In the demo below, we can better see the intuition that having too many chefs in the kitchen (i.e. too many magistrates) will lead to suboptimal governments.
After playing with the demo above for long enough, you may start to notice that the loss functions of governments with 15+ people all end up looking practically the same: a line sloped slightly up and to the right. In fact, most of the time the optimal government is found when the magistrates make up only a tiny fraction of the population.
For a population of 9 or fewer, by contrast, the local optima occur when about half of the population serves as magistrates.
Regime split
Most notably, the optimal fraction of the population that should be part of the magistrates is not a smooth transition from nano governments to macro governments. This brings us back to our initial point: a nano government is not simply a smaller government. There is a clear split between the two.
In the demo below, the x-axis is the population size, and the y-axis is the ratio of magistrates that make up the optimal government.
There are two regimes! Instead of a smooth transition, we see a clear boundary between the small and the big! This is our first evidence that nano governments are a beast of their own.
Note, the term regime in the context of statistical models was first introduced to me by my Ph.D. advisor Song-Chun Zhu8. Coincidentally, and maybe even obviously, regime also has a political meaning, which a stronger writer than myself would avoid using to squash confusion.
Discussion and further study
The catalyst for writing this post was my surprise at reading Rousseau’s argument that a bigger group of magistrates is not more powerful. After operationalizing the claim, I think I finally understand what he meant. Moreover, as a fun consequence of this exploration, we see that there’s a fascinating regime split between the nanoscale and the traditional scale.
From the demos above, I’ve noticed that optimal nano governments usually take the form of dipoles, or two separate clusters. Like a bow-tie, such a government has dense clusters on either side, joined at a point in the middle. Surprisingly, in this regime, the sovereign and the magistrates can have completely opposite wills, but as long as both bodies are dense, the government’s will ends up near the population’s will, while the within-body variance of the magistrates stays low.
There’s more to investigate:
- In this article, we used (\mathbb{R}^3) because it’s easier to visualize. How does the nano government regime boundary change as the dimension increases?
- How can we train an embedding model such that the will of the population is indeed simply the arithmetic mean of the vectors?
- What systematic ways are there to identify an optimal value of the hyperparameter (\lambda)?
- What if an individual is a member of both the magistrates and the sovereign? How does that affect the formulation and regime boundary?
- Does any of this theory hold up in the real world?
- Why is there a regime split?
- We made it this far without mentioning any of the tenets of game theory. What makes the study of nano governments different from the study of game theory? After all, both deal with the analysis of agents.
I was hoping to finish this post a couple of weeks ago during the holidays. Early on, I knew Emacs would be my editor, Markdown my source, and Pandoc my compiler. The math was getting sloppy and hard to read, so I spent a few cycles on a Lua plugin to syntax highlight (\LaTeX). That honestly made a world of difference, so much so that I started color-coding all interactive demos as well.
Before LLMs, generating one-off experimental demos to analyze simulations meant subscribing to a framework (whether it be Jupyter Notebooks, Wolfram Mathematica, MATLAB, etc.). I was curious, and pleasantly surprised that JavaScript and Three.js were pretty much all I needed for small-scale experiments. Many demos were culled in the making of this post, so that the ones that survived the cut are worth your time.
Happy New Year!