Every field of human activity has its unique characteristics, and programming is no exception.
One of the unique aspects of software is how it spans such a large number of orders of magnitude. A software engineer may be slicing and dicing nanoseconds, or they may be trying to accelerate a computation that will run across thousands of cores for a month… and they may even be doing both at the same time!
A single core in a nanosecond may cover 4 cycles. A thousand cores in a month covers about 2,600,000,000,000,000,000 cycles. Rounding a touch, that’s a range of about 19 orders of magnitude. A large supercomputer cluster, or if you choose to count GPU cores differently, may stretch even another couple of orders of magnitude.
This is not an everyday experience for most programmers, but even an 8-core 4GHz system covers 32,000,000,000 cycles in a second. Again, rounding a bit, that’s 10 orders of magnitude between “my code runs in a couple of cycles” and “my code takes all my CPU resources for a second”.
I cannot think of many other disciplines that not only span that number of orders of magnitude, but also do engineering across the entire range. Cosmology may care about quantum mechanics in order to try to determine the behavior of things like neutron stars, but there’s a vast swathe in the middle they don’t cover. Other ideas may leap to your mind, but even 10 orders of magnitude turns out to be really quite a bit! Thinking about things of one size in one moment, then something a billion times bigger or smaller, and caring about both of them and potentially also a range of magnitudes in between, is not common.
And as such, our human brains are not very good at dealing with this. We do not have English terminology that can account for systems that span this range. Is something that takes 500 cycles “fast?” 50,000? 5 million? 50 million? To a human, all but perhaps that last one are equally “instant”. Then again, try to do them a billion times each and the differences become quite marked.
“Fast” and “slow” are often not very useful words in software engineering because of this broad range of orders of magnitude. Imagine trying to draw the line between “fast” and “slow” in some sort of general sense across 19 orders of magnitude, in a world that generally experiences performance in a very linear manner. You can’t.
The proximal reason for this post is the general idea that “In Go, using cgo is ‘slow’ and therefore should be avoided.” You might expect me to now go into defending Go against the “accusation”, but the issue I’m addressing here isn’t about its performance. Its performance is what it is and no amount of jawboning from me will change anything.
The question I’m talking about here is, is “slow”, as a word standing by itself with no further characterization, even an applicable concept?
Web Frameworks and HTTP Servers
Another example that I see a lot is developers coming along and analyzing which web framework they should use. So many developers only seem to see those “requests per second” numbers and analyze their choices solely in terms of what is fastest.
However, let me observe that frameworks that can handle, say, 10,000 requests per second on reasonably available hardware, running a slightly non-trivial task (enough to establish that the framework is doing something), are a commodity now. You have to go to the 493rd slot to get down that far. Before worrying about whether you need the framework that can handle ten thousand requests per second or a million, you need to ask whether the code you’re going to run in each request is itself capable of handling more than 10,000 requests per second.
The odds are that it’s not. Assuming perfect parallelism for simplicity, 10,000 requests per second gives each request one-tenth of a millisecond times the number of cores in your server to play with. Call it a maximum of 5 milliseconds to account for some overhead. It is completely normal for web requests to need more than 5 milliseconds to run. If you’re in the still-normal range of needing 50 milliseconds to run, even these very “slow” frameworks are not going to be your problem. Even if you had a framework whose request overhead was a flat zero, you’d still not be able to process more than 20 requests per second per core at 50ms per request.
I’m counting full CPU utilization. You may object that you’re not doing that, in which case by all means, insert your own numbers as appropriate. These numbers are only intended as examples to warm you up. On the other hand, you’d be surprised how quickly things can stack up until you really are doing 50ms of full-CPU work in a web handler¹.
Most people, most of the time, doing most web work, are so thoroughly outclassed on speed by their web framework and server that the speed of their choice is irrelevant. Which means they should be selecting based on all the other relevant features. The very fastest choices are often the very fastest choices precisely by virtue of optimizing on that and leaving out all the other ways a framework can make your life easier.
Of course, it is worth taking a moment at the beginning of a web project and thinking this through… do you, in fact, have a system where you need to answer hundreds of thousands of queries per second, continuously, and you can in fact write a handler that is fast enough to keep up with that?
Then by all means, take that into account in selecting your stack!
I think it is more common for developers to obsess over irrelevant performance details and lose time programming in a suboptimal environment for what they need. But it is more consequential when developers make the opposite mistake and choose something they should have known from the very beginning would not meet their performance needs, because that is generally only demonstrated conclusively after vast work has been poured into the inadequate solution, when fixing the problem is very difficult. Both mistakes can be very expensive (don’t forget the opportunity cost of all the time lost working on an excessively “performance”-focused platform when a more convenient one would have saved a lot of developer-hours and calendar time), but it’s the second one that involves ramming into a wall very, very late in the development cycle, often only revealing itself after multiple full releases.
Database Implementation Languages
My personal favorite example of this is probably people implementing databases, especially commercial databases. Databases compete in a space where every nanosecond counts, because every nanosecond is going to be repeated trillions and quadrillions of times. If you want to create a commercially-viable database, you need to be thinking from the moment you select your implementation language about how you are going to optimize your code.
But for some reason it has been somewhat popular of late to pick Go as the implementation language. I consider this a poor choice. As I like to say, Go is generally the slowest member of the fastest class of languages, the static compiled ahead-of-time languages. Considered on the entire landscape of languages, Go is pretty fast. Considered in that set of languages that let you count nanoseconds and exert super-deep and detailed control of your code, it has some notable weaknesses. These weaknesses are often overstated in other contexts, but in this context, every last one of them matters.
Sometimes certain language communities that see themselves in competition with Go, most notably Rust, see stories of database vendors or other super-high-performance people switching away from Go as a sign that Go can’t handle the very top end of tasks… and there is some truth to that view. However, what I also see in the vast majority of those situations is a team that should have known better from the beginning and should never have started out in Go.
But there’s a lot of programming tasks in the world where no one is going to spend thousands of hours optimizing every loop in the system. In that case, we end up falling back on the “Go is pretty fast” situation.
Is Go “fast”? Is Go “slow”? Both English words obscure facts of the situation that an engineer selecting a language must be paying attention to, if they are going to make an informed decision.
Premature Optimization yada yada yada
Everyone knows that “premature optimization is the root of all evil” (and the quote goes on but for my point today that wouldn’t change anything) but the conversation about what is or is not premature optimization is a complicated one. For myself, I find it very effective on the cost/benefits analysis just to keep in mind the rough order of magnitude size of the operations I’m performing.
If I want to add lots of numbers together, which is roughly at the smallest end of our magnitude span, and I am getting those numbers via an HTTP REST request to a resource on the other side of the world, then I know that in terms of the final performance of the system, the addition is completely lost in the noise. I can essentially neglect the addition portion and I focus on making sure the HTTP API request has enough performance to meet my needs, and I know that things like “doing the addition two or three times” is inconsequential next to “needing to make a dozen requests rather than one”.
I know that parsing numbers, which is a couple of orders of magnitude slower than simply adding them but still much, much faster than my HTTP API request, is not relevant, so the costs of parsing are almost certainly also completely neglectable. By contrast, if I want to move lots of numbers around the world, considering how they are encoded on the wire may be very important… if I know I have only a relatively low-bandwidth link I may want to look into compression, even some very CPU-intensive compression, rather than spending any time optimizing the parsing routines.
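A rough sketch of that reasoning, where the ~1 ns add, ~100 ns parse, and ~200 ms cross-world round trip are ballpark assumptions for illustration, not measurements:

```go
package main

import "fmt"

// dominanceRatio says how many times larger a fixed cost (e.g. one
// network round trip) is than the total per-item local work.
func dominanceRatio(fixedNs, perItemNs float64, count int) float64 {
	return fixedNs / (perItemNs * float64(count))
}

func main() {
	// Assumed ballpark costs: ~1 ns per addition, ~100 ns to parse a
	// number from text, ~200 ms for an HTTP request to the other side
	// of the world. Parsing and adding 10,000 numbers costs ~1 ms.
	r := dominanceRatio(200e6, 1+100, 10_000)
	fmt.Printf("the request costs roughly %.0fx the parsing and adding\n", r)
}
```

With numbers like these, shaving the addition or even the parsing is lost in the noise; cutting one round trip is worth hundreds of times more.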
On the other hand, if I’m interacting with an HTTP API that is in the same rack, now the speed of the API call is such that maybe I do need to think about serialization costs. Here we find the many systems where something like JSON serialization speed does become important (and maybe JSON is not the correct choice for such systems).
If I’m doing something more complicated than “adding a bunch of numbers” together, that can also change the balance. If I’m “sticking them in an associative array” of some sort, well, that’s slower than addition, but it’s pretty fast for small structures. Then again, if I know I’m going to have many millions of such things, I know I’m going to be working at RAM speeds instead of CPU cache speeds, and whether or not that’s a problem depends on what else I’m doing.
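One way to get a feel for these rough costs is Go’s own benchmark driver. This sketch compares a bare increment against a small-map insert; the absolute numbers will vary by machine, and only the rough ratio matters:

```go
package main

import (
	"fmt"
	"testing"
)

// nsPerOp times fn with the testing package's benchmark driver and
// reports the rough cost of one call in nanoseconds.
func nsPerOp(fn func()) int64 {
	return testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			fn()
		}
	}).NsPerOp()
}

var sum int
var table = make(map[int]int)

func main() {
	addCost := nsPerOp(func() { sum++ })
	mapCost := nsPerOp(func() { table[sum%1024] = sum })
	fmt.Println("increment: ", addCost, "ns/op")
	fmt.Println("map insert:", mapCost, "ns/op")
}
```

The map insert is slower than the add, but both are still many orders of magnitude below a network call; whether either matters depends on how many millions of times you do it.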
A characteristic of these systems spanning so many orders of magnitude is that very frequently, one of the things your system is doing stands head-and-shoulders above everything else in cost. If you have a good sense of the rough orders of magnitude from experience, it will generally be obvious where you need to focus at least a bit of thought on optimization, and where you can neglect it until it becomes an actual problem.
A further consequence of that fact is that in our world, sometimes little things don’t add up. When you live in a world where “things” tend to span, say, two or maybe three orders of magnitude at once, the little things do add up: it doesn’t take many repetitions of an extra second, added to a job that takes a couple of minutes, before those seconds become a significant proportion of the job. By contrast, if a process is going to take a few minutes, a few nanoseconds here and a few nanoseconds there may in fact not add up to anything significant.
The minimum number of nanoseconds a human can even notice is in the tens of millions, and to notice them slowing down a multi-minute process the threshold is a few orders of magnitude higher than that. It is very easy to be looking at some code and see something that will, over the course of the entire computation, waste many thousands of nanoseconds… but that’s not relevant. It does not, in fact, “add up” to anything. What is wisdom in one context can be folly in another.
A lot of “premature optimization” in the real world is obsessing over nanoseconds while neglecting the milliseconds you could have been saving with the same effort.
In Conclusion
Many times over the years, I’ve been in a meeting where people from various teams are having some discussion with each other. They don’t ever define their terms. They have an extensive discussion and come to some conclusion that everyone agrees with.
Two weeks later we all conclude our work and try to put the pieces together, and it turns out they don’t fit together.
Everyone thought they agreed with each other, but it turns out everyone was using different definitions for terms. If they had more carefully explained what they were saying to each other, they would have discovered they were not in sync after all.
“Fast” and “slow” are great examples of the sorts of terms that lead to this result. Everyone will agree that the code needs to be “fast” and “not slow”. But one team may mean that they can make ten million queries per second and the other team may be thinking about being able to complete a particular query in under ten milliseconds. When the second team proudly presents their “fast” API to the first, which does indeed complete the queries quickly but can’t be scaled beyond ten thousand queries per second without an architectural overhaul, now you’ve got the sort of problems that pushes projects back months or gets them cancelled.
I would suggest that you try to abstain from describing things as “fast” or “slow” in the programming world. Strike the bare terms from your vocabulary, and always be more specific: how is the thing in question fast or slow, against what metric, against what competition, and on what order of magnitude of time does it operate? The bare terms are often not only useless but, because of their ability to convince a bunch of people who don’t agree that they do, often of negative value.
¹ On the other hand, have you profiled your web services lately? There’s a lot of propaganda floating around about how all web requests are bound by DB latency, but I have repeatedly found in my career that this is often not the case, even before solid state drives. If you have never taken a profile of your web services, or don’t have a recent one, you may be surprised at how much real CPU is being used, not just waiting.