A long time ago, in 2014, I did the “A Short History of Performance Engineering” Ignite talk at the Velocity conference (slides and video). A lot of new developments have happened since 2014, so here is an updated text version – incorporating feedback and new events.
Chronology
Performance Engineering has a rather long and fascinating history, especially when considered in the context of changing computing paradigms. While not everything done in the past can be literally applied to every new technology, the underlying principles often remain the same, and knowledge of history helps us avoid re-inventing the wheel when it is not necessary. Unfortunately, statements referring to the past are quite often not completely correct; performance engineering history is not well known. So here are a few bits of information that appear interesting. The approach was to find the first mature appearance of still-relevant performance concepts (without diving into the further history of those concepts). It is not scientific research, and not much information is available overall, so a lot of important information may still be missing. All opinions, of course, are my own.
We can probably list the following computing paradigms:
- Mainframes: late 50s
- Distributed Systems: late 70s
- Web: mid 90s
- Mobile, Cloud: mid 2000s
- Artificial Intelligence: mid 2020s
(of course, AI appeared a long time ago – but this is when it started to change the mainstream computing paradigm)
Performance expertise related to a new paradigm usually materializes later, when the technology matures.
Mainframes
Performance probably went beyond single-user profiling when mainframes started to support multiprogramming. In the early mainframe years, processing was concerned mainly with batch loads. Mainframes had sophisticated scheduling and could ration consumed resources. They also had very powerful OS-level instrumentation allowing engineers to track down performance issues. The cost of mainframe resources was high; therefore, capacity planners and performance analysts were needed to optimize mainframe usage.
We can definitely say that performance engineering became a separate discipline when instrumentation was introduced with SMF (System Management Facilities), released as part of OS/360 in 1966 (and still in use on mainframes).
In 1968, Robert Miller (IBM), in his “Response Time in Man-Computer Conversational Transactions” paper, described several threshold levels of human attention. The paper was widely cited by many later researchers and largely remains relevant today.
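To make the thresholds concrete, here is a minimal Python sketch that maps a response time to the attention band it is usually said to fall into. The specific 0.1 s / 1 s / 10 s levels follow the popular later summary of Miller's work rather than quoting the paper directly.

```python
# A minimal sketch mapping a response time to an attention band.
# The 0.1 s / 1 s / 10 s levels follow the popular later summary of
# Miller's thresholds, not a direct quotation from the 1968 paper.

def classify_response_time(seconds: float) -> str:
    """Return the attention band a response time is usually said to fall into."""
    if seconds <= 0.1:
        return "perceived as instantaneous"
    if seconds <= 1.0:
        return "noticeable delay, but the flow of thought stays intact"
    if seconds <= 10.0:
        return "attention kept, but the wait is felt"
    return "attention likely lost; progress feedback is needed"

print(classify_response_time(0.25))
```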
In 1974, monitoring was introduced with RMF (Resource Measurement Facility), released as part of MVS. OMEGAMON for MVS by Candle (acquired by IBM in 2004), released in 1975, is claimed to be the first real-time monitor.
A performance community, the Computer Measurement Group (CMG), was created in 1974 and has held annual conferences for a long time (it still exists, although it is no longer focused only on performance).
In 1977, BEST/1 was released by BGS Systems (acquired by BMC in 1998), the first commercial package for computer performance analysis and capacity planning to be based on analytic models.
Distributed Systems
When the paradigm changed to client-server and distributed systems, available operating systems didn’t have much instrumentation and workload management capabilities. Load testing and system-level monitoring became almost the only ways to investigate multi-user performance. Deploying across multiple machines was more difficult and the cost of rollback was significant, especially for Commercial Off-The-Shelf (COTS) software, which might be deployed by hundreds or even thousands of customers. Thus, there was more need for performance design to get things right from the beginning.
“Fix-it-later was a viable approach in the 1970s, but today the original promises no longer hold, and fix-it-later is archaic and dangerous. The original premises were:
- Performance problems are rare.
- Hardware is fast and inexpensive.
- It’s too expensive to build responsive software.
- You can tune software later, if necessary.”
Have you heard something like this recently? That is a quote from Dr. Connie Smith’s Performance Engineering of Software Systems, published in 1990. The book presented software performance engineering and already had 15 pages of bibliography on the subject.
Apparently, PreVue was the first commercially available load testing tool, released in 1987 by Performance Awareness (acquired by Rational in 1997). The best-known load testing tool, LoadRunner, was released in 1991 by Mercury Interactive (acquired by HP in 2006, now OpenText Performance Engineering). JMeter, the most popular open-source load testing tool, was originally created by Stefano Mazzocchi of the Apache Software Foundation in 1998. For a while, load testing became the main way to ensure performance of distributed systems, and performance testing teams became the centers of performance-related activities in many organizations.
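For readers who have never seen a load test from the inside, here is a minimal sketch of what such tools automate: a number of concurrent “virtual users” hitting an endpoint while response times are recorded and summarized. The target URL is just a placeholder; real tools add scripting, parameterization, ramp-up profiles, and much richer reporting on top of this loop.

```python
# A minimal sketch of what load testing tools automate: concurrent
# "virtual users" hitting an endpoint while response times are recorded.
import statistics
import threading
import time
import urllib.request

URL = "https://example.com/"   # placeholder target, not a real system under test
VIRTUAL_USERS = 5
REQUESTS_PER_USER = 10

latencies: list[float] = []
lock = threading.Lock()

def virtual_user() -> None:
    """One simulated user issuing a fixed number of requests."""
    for _ in range(REQUESTS_PER_USER):
        start = time.perf_counter()
        with urllib.request.urlopen(URL, timeout=10) as response:
            response.read()
        elapsed = time.perf_counter() - start
        with lock:
            latencies.append(elapsed)

threads = [threading.Thread(target=virtual_user) for _ in range(VIRTUAL_USERS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(f"requests: {len(latencies)}")
print(f"median:   {statistics.median(latencies):.3f}s")
print(f"p95:      {statistics.quantiles(latencies, n=20)[18]:.3f}s")
```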
The Application Performance Management (APM) term was coined by Programart Corp. (acquired by Compuware in 1999) in 1992 (in the mainframe context – as a combination of their STROBE and APMpower tools). However, STROBE, which they refer to as an application performance measurement tool, had been on the market since the 1970s. Still, there is an opinion that the first APM tool as we know them today was Introscope by Wily Technology, founded by Lew Cirne in 1998 (acquired by CA in 2006).
The history of End-User Monitoring (EUM) / Real-User Monitoring (RUM) may be traced at least to ETEWatch (End-to-End Watch), an application response time monitor released in 1998 by Candle (acquired by IBM in 2004, then a part of Tivoli), although the technology became popular later with the development of Web and mobile technologies.
Web / Mobile
Most of the existing expertise was still applicable to the back end. The first books applying it to the Web were published in 1998 – for example, Web Performance Tuning and Capacity Planning for Web Performance.
In 2007, Steve Souders published High Performance Web Sites: Essential Knowledge for Front-End Engineers, stating that 80-90% of the user response time is spent in the browser and thus starting Web Performance Optimization (WPO), centered around the client side.
The WPO community was built around the Velocity conference (the first one was in 2008, the last one in 2019) and Web Performance meetups. Velocity was a very popular performance conference – at least until Steve Souders stepped down as an organizer and O’Reilly announced merging Web Performance into the Fluent conference. Maybe that was an indication that WPO had become more mature and more integrated with other aspects of technology.
Mobile technologies supported further development of Web Performance – as client-side performance was even more important on mobile devices.
A very detailed article, An Incomplete History of Web Performance by Tanner Hodges, was published in the Performance Calendar 2022; there you can find many more details about Web Performance history. While there is some overlap with the current article, it has a somewhat different focus.
Site Reliability Engineering (SRE) was created by Google in 2003. It helped to promote some areas of performance knowledge and expertise in the industry (especially around Service Level Objectives / Indicators / Agreements), but performance is just one item on the long list of SRE responsibilities. SRE proliferation improved performance culture overall, but it is not clear how it impacted performance engineering as a separate discipline, since it drew a lot of attention away from it.
The Web brought centralization (amplified later by the Cloud), which triggered the transition to Agile software development – with the Manifesto for Agile Software Development published in 2001. It had a profound impact on performance engineering, and on performance testing in particular, eventually introducing the need for continuous performance testing (although that happened much later).
Cloud
The next paradigm shift was to the cloud. While the term “cloud computing” was popularized when Amazon released its Elastic Compute Cloud in 2006, references to “cloud computing” appeared as early as 1996. Technologies mature more quickly nowadays – for example, Amazon’s own monitoring solution, CloudWatch, was released only three years later, in 2009. Of course, many established performance products started to support the cloud, and quite a few new products entered the market.
While the cloud looks quite different from mainframes, there are many similarities between them, especially from the performance point of view:
- availability of computer resources to be allocated,
- an easy way to evaluate the cost associated with these resources and implement chargeback,
- isolation of systems inside a larger pool of resources,
- easier ways to deploy a system and pull it back if needed without impacting other systems.
However, there are notable differences that make managing performance in the cloud more challenging. First, there is no instrumentation at the OS level, and even resource monitoring becomes less reliable due to the virtualization layer. So instrumentation must be at the application level. Second, systems are not completely isolated from the performance point of view, and they can impact each other. And, of course, we mostly have multi-user interactive workloads, which are difficult to predict and manage. That means that performance risk mitigation approaches such as Application Performance Management (APM), performance testing, and capacity management become very important in cloud environments (although, of course, in a somewhat changed form).
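As an illustration of application-level instrumentation, here is a minimal sketch of the kind of timing an APM agent performs for you: the application itself measures the latency of its business transactions and records the results as metrics. The “checkout” transaction here is purely hypothetical.

```python
# A minimal sketch of application-level instrumentation: the application
# times its own operations and records latency metrics, since OS-level
# instrumentation is not available (or not reliable) in the cloud.
import functools
import time
from collections import defaultdict

latency_ms: dict[str, list[float]] = defaultdict(list)

def instrumented(name: str):
    """Decorator that records the wall-clock latency of each wrapped call."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                latency_ms[name].append((time.perf_counter() - start) * 1000)
        return wrapper
    return decorator

@instrumented("checkout")          # hypothetical business transaction
def checkout(order_id: int) -> str:
    time.sleep(0.05)               # stand-in for real work
    return f"order {order_id} processed"

checkout(42)
print({name: f"{sum(v) / len(v):.1f} ms avg" for name, v in latency_ms.items()})
```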
The AWS Well-Architected Framework, including the Performance Efficiency, Cost Optimization, and Reliability pillars, was released in 2012 – a good indication that cloud performance engineering had become mature enough to be documented.
As systems became more complex, the concept of observability was introduced (originally with three pillars – logs, metrics, and traces) and became popular. According to this article, it was first used in its current meaning by Twitter in 2013.
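A minimal sketch of the three pillars in a single request path might look like the following: a structured log line, a metric sample, and a trace span, all correlated by a shared trace id. Real observability stacks (for example, OpenTelemetry-based ones) standardize exactly this kind of correlation; the field names below are illustrative, not any particular vendor's schema.

```python
# A minimal sketch of the "three pillars" for one request: a structured log
# line, a metric sample, and a trace span, correlated by a shared trace id.
import json
import time
import uuid

def handle_request(path: str) -> None:
    trace_id = uuid.uuid4().hex          # shared correlation id
    start = time.perf_counter()
    time.sleep(0.02)                     # stand-in for real work
    duration_ms = round((time.perf_counter() - start) * 1000, 2)

    log_line = {"level": "info", "msg": "request handled", "path": path, "trace_id": trace_id}
    metric = {"name": "http.server.duration", "value_ms": duration_ms, "path": path}
    span = {"trace_id": trace_id, "span_name": f"GET {path}", "duration_ms": duration_ms}

    print("LOG   ", json.dumps(log_line))
    print("METRIC", json.dumps(metric))
    print("TRACE ", json.dumps(span))

handle_request("/api/orders")
```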
The FinOps Foundation was created in 2019 and got a lot of traction optimizing cloud costs, mostly from the financial side. It was focused on costs, but costs in the cloud are highly correlated with performance. While the approach was rather limited from the performance engineering point of view, it looks like the Foundation is taking steps to extend beyond Cloud and Finance at the moment. The two most interesting publications in that direction were Cost-Aware Product Decisions and The Scope of FinOps Extends Beyond Public Cloud.
Artificial Intelligence
Artificial Intelligence (AI) has its own long and very interesting history (Wikipedia believes it appeared as a separate discipline in 1956), but that is definitely out of the scope of this article. However, it started to change the mainstream computing paradigm only recently; probably the release of ChatGPT in 2022 was the event when that became noticeable. As terminology is vague and quickly changing, it is difficult to define this new stage with a specific term. Currently it includes Generative AI, Large Language Models (LLMs), and Agentic AI – but it looks like the terminology is not final, and we will probably see new developments soon.
There is no doubt that AI is the next large thing changing the computing paradigm. But it is still at the beginning and developing quickly, so the exact consequences are not clear yet. Its impact on performance engineering is even less clear at the moment.
One challenge AI brought is that “performance” and “benchmarking” in the AI context usually refer to correctness (response quality), not to classic performance (throughput and latency). The only way to tell which meaning these terms carry is from context.
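The sketch below contrasts the two meanings for a single, hypothetical model call: classic performance is captured by latency and token throughput, while “performance” in the AI-benchmark sense is captured by a quality score. Both the model call and the quality scorer here are stand-ins, not a real API.

```python
# A minimal sketch of the two meanings of "performance" in the AI context:
# classic performance (latency, token throughput) versus response quality.
# `call_model` and `quality_score` are hypothetical stand-ins, not a real API.
import time

def call_model(prompt: str) -> str:
    """Hypothetical model call; a real one would hit an LLM endpoint."""
    time.sleep(0.3)                           # stand-in for inference time
    return "Paris is the capital of France."

def quality_score(answer: str, expected: str) -> float:
    """Toy correctness check; real evaluations use much richer metrics."""
    return 1.0 if expected.lower() in answer.lower() else 0.0

prompt = "What is the capital of France?"
start = time.perf_counter()
answer = call_model(prompt)
latency_s = time.perf_counter() - start
tokens = len(answer.split())                  # crude token count for illustration

print(f"latency:    {latency_s:.2f} s")                  # classic performance
print(f"throughput: {tokens / latency_s:.1f} tokens/s")  # classic performance
print(f"quality:    {quality_score(answer, 'Paris')}")   # "benchmark" in the AI sense
```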
But classic performance is still extremely important for AI due to the extreme scale and costs, and some good information has started to show up (such as the AWS Well-Architected Framework – Generative AI Lens or Chris Fregly’s AI Systems Performance Engineering book).
One important consequence is already clear – the active use of other types of computing units, such as Graphics Processing Units (GPUs). The Wikipedia article states that the term was coined by Sony in 1994 – but probably the main turning point was the CUDA platform, introduced by Nvidia in 2007, the earliest widely adopted programming model for GPU computing. It changed the paradigm – before, everything revolved around CPUs (at least in the mainstream), but now we have different types of specialized computing units, introducing more options for performance engineering. While these may be different types of processing units (for example, Google introduced Tensor Processing Units in 2016), CPUs no longer hold a monopoly, and other possible processing architectures need to be evaluated in many cases (which will probably become more widespread as the development experience for other processing units is simplified). Of course, this had been the case for a long time in High-Performance Computing (HPC) and scientific communities – but now it is moving into the mainstream.
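As a small illustration of having more than one type of processing unit to choose from, here is a sketch that times the same matrix multiplication on the CPU and, if available, on a GPU. It assumes PyTorch and a CUDA-capable device; neither is implied by the article itself.

```python
# A minimal sketch of offloading work from the CPU to a GPU, assuming PyTorch
# and a CUDA-capable device; without a GPU it only reports the CPU timing.
import time
import torch

N = 2048
a = torch.rand(N, N)
b = torch.rand(N, N)

def timed_matmul(device: str) -> float:
    """Time one matrix multiplication on the given device."""
    x, y = a.to(device), b.to(device)
    if device == "cuda":
        torch.cuda.synchronize()              # make sure the copies finished
    start = time.perf_counter()
    _ = x @ y
    if device == "cuda":
        torch.cuda.synchronize()              # wait for the kernel to complete
    return time.perf_counter() - start

print(f"CPU: {timed_matmul('cpu'):.3f} s")
if torch.cuda.is_available():
    print(f"GPU: {timed_matmul('cuda'):.3f} s")
else:
    print("No CUDA device available; GPU comparison skipped.")
```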
What Lies Ahead?
While performance is the result of all design and implementation details, the performance engineering area remains very siloed – maybe due to historical reasons, maybe due to the huge scope of expertise involved. People and organizations trying to bring all performance-related activities together are rather few and far between. Attempts to span different silos (for example, DevOps) often leave many important performance engineering areas out.
The main lesson of history is that the feeling that we are close to solving performance problems has existed for the last 50+ years. It will probably stay with us for a while – so, instead of hoping for a silver bullet, it is better to understand the different existing approaches to mitigating performance risks and find an optimal combination of them for your particular context.