Bruce and I have recently put together a proposed outline for the 7th Edition of our main textbook. The result is 16 chapters long. By comparison, the previous editions held steady at nine chapters, even after multiple updates through the years. My first reaction after seeing the outcome of the planning exercise was concern, and I commented to Bruce that I was worried we were suffering from Second-System Syndrome. Not coincidentally, one of the sparks that got us moving on a new edition was a question to Bruce at SIGCOMM asking what we would do differently considering how much more complex the Internet is today than it was 30 years ago when we published the first edition. It would seem we set ourselves up for a classic systems pitfall.
What we are planning is ambitious; it could fairly be called a generational revision. That the time is right for such a reset is what I heard in the question at SIGCOMM (intended or not), and I’m quite happy with what we have so far. I have first-hand experience with Second-System Syndrome (more on that below) and that is not what the planned revision is about. It is certainly true that a lot has changed since the first edition was published, but incrementally updating a book—like incrementally adding new features to a software system—is what leads to complexity and bloat. Sometimes, a refactoring is exactly what’s needed in order to simplify. That’s what we’re trying to do with the book. We still have a *lot* of work in front of us, but I’m optimistic the outcome will be an improvement. And I’m willing to stipulate that the final product will be no longer than earlier editions.
If refactoring a system (or book) helps to simplify it, and incrementally adding features to an existing system eventually leads to bloat and complexity, then what is Second-System Syndrome about? The term was originally coined by Fred Brooks in his book “The Mythical Man-Month”, largely based on his experience building the IBM 360 (the first system) and OS/360 (the second system). I recommend reading Brooks’ book for his perspective, but my own experience was with PlanetLab (the first system) and GENI (the second system). There are many lessons wrapped up in the story of GENI—many having to do with the perils of large community-driven initiatives chasing even larger government dollars—but it is also a textbook example of the technical challenges of building a “bigger and better” second system as a follow-on to a successful, but simple, first system.
Brooks’ essay doesn’t say much about the causes of the second-system effect. His main contribution was to identify the problem, which is now often equated with over-engineering, feature creep, and bloat. But in my view, it is easy to miss the point. Over-engineering is the mistake of adopting a more complex solution than a problem calls for, but more often than not, that’s due to poor judgement about the problem being solved rather than about what solution to apply. Feature creep and bloat are what happens when you incrementally grow a system to meet new requirements. I could make a case that the fear (or expense) of a clean-slate design is what often leads to bloat, not the ambition of building a second system. What all of these interpretations get right is that the problem can be attributed to inflated expectations. Or, the way I think about it, the problem is that we take on more requirements in an attempt to build a system that solves more problems, presumably hoping to attract more users or customers. But in doing so, we risk a less useful outcome.
From PlanetLab to GENI
This is exactly what happened with GENI. PlanetLab had a simple goal of giving researchers best-effort containers on a widely distributed set of servers. That proved useful to a lot of people wanting to experiment with planetary-scale services. We had to put a few protections in place to deal with the tragedy of the commons—the most ingenious of which was to kill the job consuming the most memory whenever a server ran short—but we never tried to satisfy calls for resource guarantees or a more sophisticated reservation system. GENI, on the other hand, adopted a mission to support a much broader user community, including, for example, researchers wanting to run measurement experiments. That use case brought additional requirements to the table, including the need to eliminate the variability of best-effort resource allocation. There were many other examples like this, which I talked about in another post.
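For readers who want to picture that memory-protection policy in concrete terms, here is a minimal sketch of the idea, not PlanetLab’s actual watchdog: when available memory drops below a threshold, find the largest consumer and kill it. It assumes the third-party `psutil` library, and the threshold and polling interval are arbitrary placeholders.

```python
# Illustrative sketch of a "kill the biggest memory consumer" policy.
# Not PlanetLab's actual code; thresholds and polling interval are placeholders.
import time
import psutil

LOW_MEMORY_BYTES = 256 * 1024 * 1024   # hypothetical "running short" threshold
POLL_SECONDS = 5

def biggest_consumer():
    """Return the process with the largest resident set size, or None."""
    candidates = []
    for p in psutil.process_iter(attrs=["pid", "name", "memory_info"]):
        try:
            candidates.append((p.info["memory_info"].rss, p))
        except (psutil.NoSuchProcess, psutil.AccessDenied):
            continue
    return max(candidates, key=lambda t: t[0])[1] if candidates else None

def watchdog():
    while True:
        if psutil.virtual_memory().available < LOW_MEMORY_BYTES:
            victim = biggest_consumer()
            if victim is not None:
                victim.kill()   # reclaim memory by sacrificing the largest job
        time.sleep(POLL_SECONDS)
```

The appeal of the policy is its simplicity: no reservations, no guarantees, just a blunt backstop that keeps a shared server usable for everyone else.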
We did end up with a design, although it was so complex—and the will to see the system built was so ephemeral—that it never happened in the way we imagined. The lesson I learned from the experience was that over-constraining a system is the actual cause of Second-System Syndrome, because it forces you to build a more complex framework to accommodate all the requirements. The challenge is selecting just the right set of requirements so that the result is still useful. The measurement researchers that PlanetLab did not adequately support eventually found a home on MeasurementLab, which essentially replicated PlanetLab, but then addressed variability with a more stringent admission control policy.
This experience was an outcome of building large systems, but a similar mindset applies to writing textbooks. You can’t make everyone happy, so you’d better be sure you know what audience you’re writing for, and focus on what’s most important to them. Refactoring is helpful because it makes it easier to emphasize what’s important and prune back what’s not. We have also found that our willingness not to be encyclopedic is a feature rather than a bug. And as humbling as it is, seeing comments from random people about your book—as recently happened when The Register republished my post about the new edition—can be helpful. The reminder is that not everyone has the privilege of being able to participate in the design and building of new systems; plenty of people are kept busy trying to master the intricacies of vendor-specific device configurations. Someone should write a networking book for those folks, but it probably won’t be us.
Being clear about who you are (and are not) writing for is the other insight into what we hope to accomplish in 7E. We plan to double down on the systems approach because our goal is not only to cover the important concepts and practices in networking, but also to show (through network-related examples) how computer systems are designed and built. For anyone who works in the software industry, mastering the latter is likely to be a requirement for participating in the design process. We aim to teach the next generation of students how to apply systems thinking both to networking and to the broader computing landscape they will encounter.
The preview image this week is a photo of the Baroque Joanine Library in Coimbra. Photo by Wirdung, CC BY-SA 3.0, via Wikimedia Commons.
As fans of architecture, we sadly note the passing of Frank Gehry. We will write something about our experience with his architecture in a later post.
As fans of the Fediverse, we’re trying to make more use of PeerTube. Here is Bruce’s SIGCOMM keynote.
Our security book has been out for a couple of weeks and we are already getting PRs to fix issues. Take a look and see if you can improve it.