October 06, 2025 by Otso Virtanen | Comments
Code coverage measures which parts of the source code have been executed by tests. The metric used can be functions, lines, or branches executed, or more complex metrics such as assessing independent conditions in statements. Depending on your code coverage tooling, the tests can be a combination of unit tests, functional tests, or even manual tests. High code coverage is desirable, as it often indicates well-tested software. However, writing test cases to achieve high code coverage can be laborious and expensive.
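To make the difference between metrics concrete, here is a small illustrative C++ function (not from the parser example) where a single test reaches full line coverage but only half of the branches:

```cpp
// Illustrative only: one test with v = -1 executes every line of this
// function (100% line coverage), but takes only one of the two
// outcomes of the if statement (50% branch coverage).
int clamp(int v) {
    if (v < 0)
        v = 0;
    return v;
}
```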
In this article we demonstrate with an example how to use AI code assistants and code coverage tools together to generate unit tests to increase code coverage. We also provide the AI code assistant with the means to verify that the coverage metric has increased by executing the newly generated unit tests and analyzing the code coverage report.
The example in this article uses Microsoft’s GitHub Copilot in Visual Studio Code with Coco, a code coverage solution for C/C++, C#, Tcl and Qt’s QML. This setup is generic and can be adapted for other code coverage tools and AI code assistants, such as Cursor and Claude Code. Similarly, while our example generates CppUnit tests, other unit test frameworks like GoogleTest or Catch2 would also work.
The Setup
We are building on an existing Coco example documented here: a simple calculator based on an expression parser, with baseline unit tests included. The example provides detailed steps on how to use Coco, from basic code coverage to more advanced topics like patch analysis. Install Coco and the parser example source code from here, and see the developer documentation for more information.
New to code coverage? In a nutshell, you’ll be using a Coco-instrumented build, which acts like a bean counter: it keeps track of the lines executed and matches those lines with your test cases. Other than that, the functionality of the application doesn’t change, and you can use the same instrumented build for all your tests—including manual and functional tests. That’s it!
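Conceptually, you can picture the instrumentation as the build adding per-line counters. The sketch below is only an illustration of that idea, not how Coco actually implements it:

```cpp
// Conceptual sketch only -- Coco's real instrumentation is internal.
// An instrumented build behaves roughly as if each executable line
// incremented a hit counter, with the counters written out at exit
// (Coco stores them in a .csexe execution report) and later matched
// against the test case that was running.
#include <cstdio>

static unsigned hits[3];            // one slot per executable line

int add(int a, int b) {
    ++hits[0];                      // "this line was executed"
    return a + b;
}

int main() {
    ++hits[1];
    std::printf("%d\n", add(2, 3));
    ++hits[2];
    return 0;                       // counters get dumped at exit
}
```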
We start with the parser_v4 version, with a working setup of Copilot, Coco, and the ability to compile and run the parser example and its accompanying unit tests.
The image below shows the setup we have in place for improving the unit tests:
The components and their descriptions are as follows (see the repository for usage instructions):
- Context for the AI code assistant, including how to (“Coco setup” in the above image):
  - Build an instrumented Coco target;
  - Run the unit tests for the instrumented build; and
  - Process Coco code coverage results.
- A prompt that links to Step #1 and outputs tasks for improving the unit tests (“Coco guidelines” and “Tasks”).
- A plan to generate unit tests based on Step #2, including a step to verify that the code coverage increased for the selected metric. For this example, we use line coverage. Note that the generated unit tests and improvements must always be reviewed at the end of the process.
Demo: Generating Unit Tests
With the setup described, we are ready to try it out. Here’s a recording of a session with the parser example in Visual Studio Code with Microsoft’s GitHub Copilot in Agent mode using Gemini 2.5 Pro:
From the demo you can see the following:
- Initially, the line coverage was around 65% (see the end of the recording).
- After reviewing the initial coverage, the instruction was to focus on the mathematical functions, which had not been tested at all and had the potential to significantly increase the project’s overall code coverage.
- Generating and running tests for the mathematical functions improved the project’s line coverage to 78%.

The prompts, the configuration guidance for the Coco setup, and the generated improvements to the unit tests can be found in this repository.
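For illustration, a generated test might look like the following CppUnit fixture. The Parser class and evaluate() method shown here are hypothetical stand-ins for the parser example’s real API:

```cpp
// A minimal sketch of the kind of CppUnit test generated for the
// untested mathematical functions. Parser and evaluate() are
// assumptions; the real interface lives in the example's sources.
#include <cppunit/extensions/HelperMacros.h>
#include <string>

// Hypothetical stand-in for the parser example's actual parser class.
class Parser {
public:
    double evaluate(const std::string& expression);
};

class MathFunctionTest : public CppUnit::TestFixture {
    CPPUNIT_TEST_SUITE(MathFunctionTest);
    CPPUNIT_TEST(testTrigonometry);
    CPPUNIT_TEST_SUITE_END();

public:
    void testTrigonometry() {
        Parser parser;  // hypothetical API
        CPPUNIT_ASSERT_DOUBLES_EQUAL(0.0, parser.evaluate("sin(0)"), 1e-9);
        CPPUNIT_ASSERT_DOUBLES_EQUAL(1.0, parser.evaluate("cos(0)"), 1e-9);
    }
};

CPPUNIT_TEST_SUITE_REGISTRATION(MathFunctionTest);
```

Each new test exercises previously unexecuted lines in the mathematical functions, which is what moves the line coverage figure upward.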
Conclusions and Next Steps
In this blog post we used code coverage with an AI code assistant to understand which areas in a codebase would benefit from new unit tests to achieve higher coverage. Using code coverage verifies that the generated tests are meaningful and that the selected coverage metric improves.
The same setup would be particularly valuable if you are working in safety-certified areas and are aiming for thorough testing coverage. For example, in automotive or aerospace, the target can be 100% with the modified condition/decision coverage (MC/DC) metric. An option might be to use this with Coco’s built-in functionality to discover test data with genetic algorithms.
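To see why MC/DC is stricter than line coverage, consider this illustrative function (not from the example project):

```cpp
// Illustrative only. A single test with both inputs true already gives
// 100% line coverage here, but MC/DC additionally requires tests that
// show each condition independently changing the outcome, e.g.:
//   (true,  true ) -> true    baseline
//   (false, true ) -> false   the first condition flips the result
//   (true,  false) -> false   the second condition flips the result
bool deploy(bool crashDetected, bool systemArmed) {
    return crashDetected && systemArmed;
}
```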
It’s also possible to mix other metrics with code coverage to identify the areas needing new unit tests. In work described here, one of our solution engineers used the “CRAP score”, a combination of McCabe’s cyclomatic complexity metric and code coverage, to identify riskier areas that would benefit from increased testing.
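The CRAP score is commonly given as comp² × (1 − cov)³ + comp, where comp is a method’s cyclomatic complexity and cov its coverage as a fraction; a small sketch:

```cpp
#include <cmath>

// CRAP score as commonly defined: fully covered code scores comp,
// while complex, uncovered code scores roughly comp squared.
// comp: cyclomatic complexity; cov: coverage fraction in [0, 1].
double crapScore(double comp, double cov) {
    return comp * comp * std::pow(1.0 - cov, 3.0) + comp;
}
```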
We did not leverage the Model Context Protocol (MCP) but rather relied on direct CLI invocations of Coco tools. An MCP server could be the solution for understanding the coverage reports in detail and working with longer reports for larger code bases rather than relying on the model to parse a text-based CSV report directly. The same applies to using the setup in continuous integration (CI).
Need More Information?
For Coco, see our product page or product documentation for more information. For Qt developers, see the QML example and the Qt instructions.
This article offers a good overview of generating unit tests with examples of using Microsoft’s GitHub Copilot – the article also briefly mentions the usage of code coverage tools for unit test generation.
Please contact the author Otso Virtanen at @qt.io - Otso is the product lead for the Generative AI/AI initiatives for Qt Group’s Quality Assurance tools.