Software Testing Fundamentals: Black Box, White Box, and the Test Pyramid

Software testing plays a fundamental role in ensuring that systems behave as expected and meet both user and business requirements. While tools and frameworks evolve quickly, the core testing principles remain largely the same.

The fundamental objective of software testing is to verify whether the system behaves according to previously defined expectations (Bourque & Fairley, Guide to the Software Engineering Body of Knowledge, 2014).

This article is adapted from Chapter 4 – Tests in Microservices of my Master’s thesis, where I analyze software testing strategies and how they scale from traditional systems to modern architectures.

In this post, I’ll focus on the fundamentals of software testing and the Test Pyramid, which serves as the foundation for understanding more advanced testing approaches.

What is the Goal of Software Testing?

At its core, software testing aims to validate that a system conforms to its expected behavior and to the functional and non-functional requirements previously established for it.

Beyond simply finding bugs, testing helps reduce risks in production environments and increases confidence when evolving software systems.

Software Testing Techniques

Software testing techniques can be broadly classified into three categories (Mathur, Foundations of Software Testing, 2013):

Black-box testing
White-box testing
Gray-box testing

Each of these techniques plays a complementary role in achieving adequate test coverage.

Black-box Testing

Black-box testing evaluates the functionality of a system without relying on knowledge of its internal structure. The tester concentrates exclusively on system inputs and outputs, without concern for the underlying logic or component implementation.

This technique is effective in validating user requirements and detecting failures that may not be evident through internal code analysis.
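As a minimal sketch of this input/output focus, the Python example below derives its test cases purely from a written specification. The function `shipping_cost` and its pricing rules are hypothetical, invented here for illustration; the point is that the cases in `SPEC_CASES` would stay the same no matter how the function is implemented.

```python
def shipping_cost(weight_kg: float) -> float:
    """Hypothetical system under test.

    Assumed spec: weight must be positive; up to 1 kg costs 5.0;
    above 1 kg and up to 10 kg costs 9.0; anything heavier costs 20.0.
    """
    if weight_kg <= 0:
        raise ValueError("weight must be positive")
    if weight_kg <= 1:
        return 5.0
    if weight_kg <= 10:
        return 9.0
    return 20.0


# Black-box cases: (input, expected output) pairs taken from the spec only,
# chosen at the boundaries of each weight range.
SPEC_CASES = [(0.1, 5.0), (1.0, 5.0), (1.01, 9.0), (10.0, 9.0), (10.01, 20.0)]


def run_black_box_cases() -> list[bool]:
    """Return one pass/fail flag per specification case."""
    return [shipping_cost(weight) == expected for weight, expected in SPEC_CASES]


def rejects_invalid_input() -> bool:
    """The spec says non-positive weights are rejected."""
    try:
        shipping_cost(0)
    except ValueError:
        return True
    return False
```

Boundary values are used here because range limits are where specification-level mistakes tend to show up, but other case-selection strategies would fit the same structure.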

Advantages:

No need for internal code knowledge
Faster execution and lower cost
Good alignment with user expectations

Limitations:

Cannot pinpoint internal logic or implementation errors that do not surface in observable outputs
Limited visibility into edge cases hidden in the code

For this reason, black-box testing should always be combined with other testing approaches.

White-box Testing

White-box testing is based on the analysis of the internal structure of the system, requiring detailed knowledge of the source code and component logic.

White-box testing evaluates not only system functionality, but also efficiency, security, and maintainability.

This technique enables the identification of:

Logical flaws
Syntax errors
Execution path issues
Security vulnerabilities

It also supports code optimization, improving performance and scalability.

Challenges:

Higher complexity
Greater time and effort required
Requires strong technical expertise

White-box testing is especially valuable at lower levels, such as unit and component testing.
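To illustrate, here is a small Python sketch in which the test inputs are chosen by reading the code so that every branch runs at least once. The function `apply_discount`, its discount rules, and the `hit` set used to record executed branches are all hypothetical; in a real project a coverage tool would normally do this bookkeeping.

```python
def apply_discount(total_cents: int, is_member: bool, hit: set[str]) -> int:
    """Hypothetical function under test; `hit` records which branch executed."""
    if total_cents < 0:
        hit.add("negative")
        raise ValueError("total must be non-negative")
    if is_member and total_cents >= 10_000:
        hit.add("member_large")
        return total_cents * 90 // 100   # 10% off for large member orders
    if is_member:
        hit.add("member_small")
        return total_cents * 95 // 100   # 5% off for other member orders
    hit.add("non_member")
    return total_cents


def branch_coverage() -> set[str]:
    """Run inputs picked from the code structure and return the branches reached."""
    hit: set[str] = set()
    assert apply_discount(20_000, True, hit) == 18_000
    assert apply_discount(10_000, True, hit) == 9_000   # boundary of >= 10_000
    assert apply_discount(5_000, True, hit) == 4_750
    assert apply_discount(5_000, False, hit) == 5_000
    try:
        apply_discount(-1, True, hit)
    except ValueError:
        pass  # expected: the guard clause is one of the branches to cover
    return hit
```

Knowing that the condition is `>= 10_000` is what motivates the boundary input; that knowledge comes from the source code rather than from the requirements.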

Gray-box Testing

Gray-box testing combines characteristics of black-box and white-box testing. The tester has limited access to the system’s internal structure, allowing partial analysis of the code and logic.

“Gray-box testing enables the detection of failures that would not be evident through purely external analysis, without the complexity associated with full white-box testing.”

— Mathur, Foundations of Software Testing

This approach offers a good balance between coverage and cost, although limited internal access can still restrict test precision.
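A minimal Python sketch of the idea follows. The `UserRegistry` class is hypothetical, and the test assumes the tester knows only one internal detail: that users are stored in a dictionary keyed by a normalized, lower-case email. The test drives the public method and then uses that partial knowledge to inspect the stored state.

```python
class UserRegistry:
    """Hypothetical component under test."""

    def __init__(self) -> None:
        # Internal detail assumed to be known to the tester.
        self._users: dict[str, str] = {}

    def register(self, email: str, name: str) -> bool:
        """Return True if the user was added, False if the email already exists."""
        key = email.strip().lower()
        if key in self._users:
            return False
        self._users[key] = name
        return True


def gray_box_check() -> bool:
    registry = UserRegistry()
    # Exercise the public interface, as a black-box test would.
    first = registry.register("Ana@Example.com", "Ana")
    duplicate = registry.register(" ana@example.com ", "Ana again")
    # Use partial internal knowledge to verify what was actually stored.
    stored_as_expected = registry._users == {"ana@example.com": "Ana"}
    return first and not duplicate and stored_as_expected
```

The state check catches a failure mode a purely external test could miss, such as the duplicate being rejected while the first record is silently overwritten.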

Functional vs Non-functional Testing

In addition to choosing testing techniques, identifying testing objectives is a fundamental aspect of the software testing process (Bourque & Fairley, Guide to the Software Engineering Body of Knowledge, 2014).

These objectives are commonly divided into two categories.

Functional Tests

Functional tests aim to validate system functionalities and verify whether they comply with established requirements. This category often includes end-to-end testing, which validates integration and interaction between different system components.

Non-functional Tests

Non-functional tests focus on system qualities not directly related to functionality, such as:

Performance
Scalability
Reliability
Stress tolerance

Performance and stress tests evaluate system capacity, response time, and behavior under high load and overload conditions (Sotomayor et al., Comparison of runtime testing tools for microservices, 2019).
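As a rough sketch of a response-time check, the Python example below times repeated calls to an operation and reports a latency percentile. The function `handle_request`, the call counts, and any time budget you compare against are assumptions made for illustration; real performance and stress tests usually rely on dedicated load-testing tools and production-like environments.

```python
import math
import time


def handle_request(n: int) -> int:
    """Hypothetical operation whose response time we care about."""
    return sum(i * i for i in range(n))


def measure_latencies(calls: int, n: int) -> list[float]:
    """Time `calls` invocations and return the latencies in seconds, sorted."""
    latencies = []
    for _ in range(calls):
        start = time.perf_counter()
        handle_request(n)
        latencies.append(time.perf_counter() - start)
    return sorted(latencies)


def percentile(sorted_values: list[float], p: float) -> float:
    """Nearest-rank percentile of an already sorted list, with p in (0, 100]."""
    rank = max(1, math.ceil(p / 100 * len(sorted_values)))
    return sorted_values[rank - 1]
```

A test could then assert, for example, that `percentile(measure_latencies(50, 1000), 95)` stays below an agreed budget. Percentiles are used instead of the mean because a few slow outliers are often what users notice.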

A clear definition of these objectives helps ensure adequate coverage and minimizes the risk of failures in production environments.

The Test Pyramid

One of the most widely adopted models for organizing software tests is the Test Pyramid, originally proposed by Mike Cohn.

“The ‘Test Pyramid’ is a metaphor that tells us to group software tests into buckets of different granularity. It also gives an idea of how many tests we should have in each of these groups.”

— Ham Vocke, The Practical Test Pyramid (martinfowler.com)

The pyramid structures tests according to cost, speed, and scope.

Image Source: Ham Vocke, The Practical Test Pyramid (martinfowler.com)

Unit Tests (Base of the Pyramid)

Unit tests focus on small, isolated parts of the system.

According to The Practical Test Pyramid (Ham Vocke, martinfowler.com), unit tests are expected to be significantly faster than other test categories, and each unit should be tested independently.

They are usually written by developers and should represent the majority of the test suite.
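A minimal sketch with Python's built-in `unittest` module shows what testing a unit independently can look like. `PriceCalculator` and its injected `tax_rate_provider` are hypothetical names; the dependency is replaced by a stub so that the test runs quickly and exercises only the unit itself.

```python
import unittest


class PriceCalculator:
    """Hypothetical unit under test; the tax lookup is an injected dependency."""

    def __init__(self, tax_rate_provider):
        self._tax_rate_provider = tax_rate_provider

    def total_cents(self, net_cents: int, country: str) -> int:
        rate_percent = self._tax_rate_provider(country)
        return net_cents + net_cents * rate_percent // 100


class PriceCalculatorTest(unittest.TestCase):
    def test_adds_tax_from_provider(self):
        calculator = PriceCalculator(lambda country: 20)  # stub for the real lookup
        self.assertEqual(calculator.total_cents(1000, "PT"), 1200)

    def test_zero_rate_leaves_price_unchanged(self):
        calculator = PriceCalculator(lambda country: 0)
        self.assertEqual(calculator.total_cents(1000, "PT"), 1000)


def run_unit_tests() -> unittest.TestResult:
    """Run the test case programmatically and return the collected result."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(PriceCalculatorTest)
    result = unittest.TestResult()
    suite.run(result)
    return result
```

Because no database or network is involved, tests like these can run in milliseconds, which is what makes a large number of them practical.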

Service / Integration Tests (Middle Layer)

Service tests validate integration between components, evaluating services in isolation from the user interface.

Service-level tests fill the gap between unit tests and end-to-end tests. They test the integration of your application with all the parts that live outside of your application (Ham Vocke, The Practical Test Pyramid, martinfowler.com).

These tests ensure that components collaborate correctly.
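One way to sketch this layer in Python is to test a data-access component against a real database engine instead of a stub. The `OrderRepository` class and its table layout are hypothetical; an in-memory SQLite database stands in for whatever database the application would actually use.

```python
import sqlite3


class OrderRepository:
    """Hypothetical repository that talks to a real SQL database."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS orders ("
            "id INTEGER PRIMARY KEY, customer TEXT NOT NULL, "
            "total_cents INTEGER NOT NULL)"
        )

    def add(self, customer: str, total_cents: int) -> int:
        cursor = self._conn.execute(
            "INSERT INTO orders (customer, total_cents) VALUES (?, ?)",
            (customer, total_cents),
        )
        self._conn.commit()
        return cursor.lastrowid

    def total_for(self, customer: str) -> int:
        row = self._conn.execute(
            "SELECT COALESCE(SUM(total_cents), 0) FROM orders WHERE customer = ?",
            (customer,),
        ).fetchone()
        return row[0]


def integration_check() -> tuple[int, int]:
    """Exercise repository and database together; return two observed totals."""
    conn = sqlite3.connect(":memory:")
    try:
        repo = OrderRepository(conn)
        repo.add("ana", 1500)
        repo.add("ana", 2500)
        repo.add("bob", 700)
        return repo.total_for("ana"), repo.total_for("carol")
    finally:
        conn.close()
```

Unlike the unit level, a mistake in the SQL itself (a wrong column name, a missing `COALESCE`) would be caught here, because the statements are executed by a real engine.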

End-to-End Tests (Top of the Pyramid)

End-to-end tests validate complete user scenarios by exercising the system as a whole, simulating interactions in the same way the end user would.

Because they are slower and more fragile, they should be used sparingly.
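As a deliberately small stand-in for an end-to-end test, the Python sketch below starts a real HTTP server and calls it over the network stack, the way an external client would. The `GreetingHandler` application and its `/greet/<name>` route are hypothetical; real end-to-end suites typically drive a deployed system, often through a browser automation tool.

```python
import http.client
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class GreetingHandler(BaseHTTPRequestHandler):
    """Hypothetical application: GET /greet/<name> returns a JSON greeting."""

    def do_GET(self) -> None:
        if self.path.startswith("/greet/"):
            name = self.path[len("/greet/"):]
            body = json.dumps({"message": f"Hello, {name}!"}).encode("utf-8")
            self.send_response(200)
        else:
            body = json.dumps({"error": "not found"}).encode("utf-8")
            self.send_response(404)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args) -> None:
        pass  # keep test output quiet


def end_to_end_greeting(name: str) -> dict:
    """Start the server, call it as a client would, and return what came back."""
    server = HTTPServer(("127.0.0.1", 0), GreetingHandler)  # port 0: OS picks one
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    conn = http.client.HTTPConnection("127.0.0.1", server.server_port, timeout=5)
    try:
        conn.request("GET", f"/greet/{name}")
        response = conn.getresponse()
        return {"status": response.status, "body": json.loads(response.read())}
    finally:
        conn.close()
        server.shutdown()
        server.server_close()
```

Even this tiny example needs a server thread, a free port, and cleanup code, which hints at why tests at this level tend to be slower and more fragile than the layers below.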

Recommended Proportions

A commonly recommended distribution is:

More than 60% unit tests
Less than 30% service or integration tests
Less than 10% end-to-end tests

Following this structure helps ensure fast and stable tests, while reserving a small number of tests for validating complete user scenarios.
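The distribution above is simple to check against a real suite. The Python sketch below is one possible way to do it, assuming you can already count tests per layer; the function names and the example counts are hypothetical.

```python
def pyramid_shares(unit: int, service: int, e2e: int) -> dict[str, float]:
    """Return each layer's share of the suite as a percentage."""
    total = unit + service + e2e
    if total == 0:
        raise ValueError("no tests counted")
    return {
        "unit": 100 * unit / total,
        "service": 100 * service / total,
        "e2e": 100 * e2e / total,
    }


def follows_pyramid(shares: dict[str, float]) -> bool:
    """Check the thresholds listed above: >60% unit, <30% service, <10% end-to-end."""
    return shares["unit"] > 60 and shares["service"] < 30 and shares["e2e"] < 10
```

For instance, a suite with 140 unit, 50 service, and 10 end-to-end tests has shares of 70%, 25%, and 5%, so it satisfies all three thresholds, while a 50/30/20 split does not.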

Why the Test Pyramid Still Matters

Although simple, the Test Pyramid remains an effective guideline for building efficient and scalable test suites.

By combining different types of tests and adjusting their quantity at each level, it is possible to obtain comprehensive and efficient test coverage.

References

Bourque, P., & Fairley, R. (2014). Guide to the Software Engineering Body of Knowledge. IEEE.
Mathur, A. (2013). Foundations of Software Testing. Pearson.
Cohn, M. (2010). Succeeding with Agile: Software Development Using Scrum. Addison-Wesley.
Vocke, H. (2018). The Practical Test Pyramid. martinfowler.com
Sotomayor, B. et al. (2019). Comparison of runtime testing tools for microservices.