Amdahl, Gustafson, coding agents, and you

In the software operations world, if your service is successful, then eventually the load on it is going to increase to the point where you’ll need to give that service more resources. There are two strategies for increasing resources: scale up and scale out.

Scaling up means running the service on a beefier system. This works well, but you can only scale up so much before you run into limits of how large a machine you have access to. AWS has many different instance types, but there will come a time when even the largest instance type isn’t big enough for your needs.

The alternative is scaling out: instead of running your service on a bigger machine, you run it on more machines and distribute the load across them. Scaling out is very effective if you are operating a stateless, shared-nothing microservice, because any machine can service any request. It doesn’t work as well for services such as a distributed database, where the machines need to access shared state and therefore need to coordinate with each other.

Once you have to do coordination, you no longer get a linear improvement in capacity based on the number of machines: doubling the number of machines doesn’t mean you can handle double the load. The same problem comes up in scientific computing applications, where you want to run a large simulation, like a climate model, on a large-scale parallel computer. You can run independent simulations very easily in parallel, but if you want to run an individual simulation more quickly, you need to break up the problem in order to distribute the work across different processors. Imagine modeling the atmosphere as a huge grid, dividing up that grid, and having different processors simulate different parts of it. You need to exchange information between processors at the grid boundaries, which introduces the need for coordination. Incidentally, this is why supercomputers have custom networking architectures: to reduce these expensive coordination costs.

In the 1960s, the American computer architect Gene Amdahl made the observation that the theoretical performance improvement you can get from a parallel computer is limited by the fraction of work that cannot be parallelized. Imagine you have a workload where 99% of the work is amenable to parallelization, but 1% of it can’t be parallelized:
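To put numbers on that 99%/1% workload, here is a minimal sketch that assumes the standard statement of Amdahl's law: with a parallelizable fraction p and N processors, the theoretical speedup is 1 / ((1 - p) + p / N). The function names `amdahl_speedup` and `amdahl_limit` are illustrative choices, not something from the original post, and the model ignores any coordination overhead between processors.

```python
def amdahl_speedup(parallel_fraction: float, processors: int) -> float:
    """Theoretical speedup under Amdahl's law (no coordination overhead assumed)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if processors < 1:
        raise ValueError("processors must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    # The serial part takes the same time regardless of processor count;
    # only the parallel part is divided across processors.
    return 1.0 / (serial_fraction + parallel_fraction / processors)


def amdahl_limit(parallel_fraction: float) -> float:
    """Upper bound on speedup as the processor count grows without bound."""
    serial_fraction = 1.0 - parallel_fraction
    return float("inf") if serial_fraction == 0.0 else 1.0 / serial_fraction


if __name__ == "__main__":
    # Hypothetical workload from the text: 99% parallelizable, 1% serial.
    for n in (1, 10, 100, 1_000, 10_000):
        print(f"{n:>6} processors -> {amdahl_speedup(0.99, n):6.2f}x speedup")
    print(f"limit as processors grow: {amdahl_limit(0.99):.0f}x")
```

Under these assumptions, 100 processors yield only about a 50x speedup, and no number of processors can push the speedup past 100x, because the 1% serial portion always takes the same amount of time.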
