Data migrations can be deceptively tricky. On the surface, they sound simple enough: move data from one place to another, maybe changing a few fields along the way. In reality, migrations often sit at the intersection of legacy quirks, evolving business logic, and messy real-world data. All of the “well that’s just how we did it”-isms in the old system can become major challenges for a team of developers trying to fit the data into the new system, especially when even small oversights can ripple into major issues once the new system goes live.
My team has been working on a large and complex data migration for a long-lived system. As part of this work, we need a way to verify that our migration logic is doing exactly what we expect — and to stay confident as we make inevitable changes along the way. Early on, we realized that manual spot checks and fabricated test setups weren’t going to cut it.
That’s where snapshot testing came in. By capturing snapshots of real data, running them through our migration, and comparing the output against verified baselines, we built a safety net that let us move fast without sacrificing confidence. What started as a small experiment quickly became one of the most valuable tools in our workflow.
What Is Snapshot Testing?
At its core, snapshot testing is a way to capture the output of your code at a specific point in time and use it as a reference for future comparisons. Think of it like taking a photograph of your program’s output — if something changes later, you’ll see the difference immediately.
Snapshot testing is most often associated with front-end development. Tools like Jest, for example, can take a “snapshot” of a rendered React component and alert you when its structure or output changes. But the concept is much broader than that. Any time your code transforms input into output, a snapshot test can help you verify that the transformation stays consistent over time.
In our case, that transformation is a data migration. We take real input data from the old system, run it through our migration process, and generate new data in the shape of the new system. Once we’ve manually verified that the output is correct, we save it as a “verified” snapshot. Later, when we update the migration logic, those snapshots serve as a guardrail — a quick way to see exactly what changed and whether it’s expected.
Snapshot testing gives us visibility and confidence. Instead of relying on intuition or ad hoc validation scripts, we can clearly see the difference between “before” and “after.” And when a change does alter the output, we can decide intentionally whether it’s a bug or an improvement.
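To make the mechanics concrete, here is a minimal sketch of the idea in C#, independent of any particular library (the class name and file handling are purely illustrative, not how any specific tool works): serialize the output, store it the first time, and compare every later run against that stored copy.

using System;
using System.IO;
using System.Text.Json;

public static class Snapshot
{
    // Compare an object's serialized form against an approved baseline file.
    public static void Match(object actual, string baselinePath)
    {
        var current = JsonSerializer.Serialize(
            actual, new JsonSerializerOptions { WriteIndented = true });

        if (!File.Exists(baselinePath))
        {
            // First run: write the snapshot so a human can review, approve, and commit it.
            File.WriteAllText(baselinePath, current);
            return;
        }

        // Later runs: any difference from the approved baseline is a failure to investigate.
        var approved = File.ReadAllText(baselinePath);
        if (approved != current)
            throw new Exception($"Output differs from approved snapshot: {baselinePath}");
    }
}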
How We Use Snapshot Testing for Migration Validation
Once we decided to incorporate snapshot testing, we started with just a few representative examples of input data and their expected outputs. That small start quickly grew into a powerful safety net we now rely on daily. For most of our migration code, we use the Verify library for C#. Verify makes it incredibly easy to capture complex objects, serialize them to JSON, and compare them against a previously approved snapshot. In practice, this means that many of our integration tests end with a line like:
await Verify(migratedData);
When the test runs for the first time, Verify creates a .verified.json file containing the output. After we’ve manually inspected and approved it, that file becomes our baseline. On future runs, Verify automatically compares the new output against the baseline and highlights any differences. If the migration logic changes intentionally, we review and re-approve the updated snapshot. If something changes unexpectedly, the test fails. This clues us in to where we went wrong in our migration logic and prevents regressions.
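For context, one of our Verify-based tests has roughly the shape sketched below. The fixture loader and migrator names are hypothetical stand-ins, and depending on your version of Verify you may also need a [UsesVerify] attribute or some one-time configuration, but the pattern is the same: load real input, run the migration, and hand the result to Verify.

using System.Threading.Tasks;
using Xunit;
using static VerifyXunit.Verifier;

public class CustomerMigrationTests
{
    [Fact]
    public async Task MigratesLegacyCustomerRecord()
    {
        // Hypothetical helper: loads an anonymized record captured from the old system.
        var legacyRecord = TestFixtures.LoadLegacyCustomer("customer-123");

        // Hypothetical migrator: the transformation under test.
        var migratedData = CustomerMigrator.Migrate(legacyRecord);

        // Serialize the result and compare it against the approved baseline snapshot.
        await Verify(migratedData);
    }
}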
This approach has been invaluable for validating the core migration logic, but the system we are migrating to also generates external reports as plain text files. These reports represent a critical piece of functionality that falls outside our C# codebase. To verify those, we created a simple, hand-rolled Python script that reads the .txt outputs, normalizes their format, and compares them line-by-line against stored baseline versions.
It’s not fancy, but it gives us the same benefits: quick, reliable validation and clear visibility into any changes. Together, these two tools (Verify for JSON outputs and our Python diff tool for reports) have been crucial to our success. They allow us to work confidently on a complex, evolving migration without constantly worrying about regressions or unexpected side effects.
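If it helps to picture that second tool, the core of the comparison looks roughly like the sketch below. The real checker is a Python script; this C# rendering is just an illustration of the general approach, and the specific normalization rules shown here are assumptions rather than a description of the actual script.

using System;
using System.IO;
using System.Linq;

public static class ReportDiff
{
    // Strip incidental formatting differences so the comparison only flags meaningful changes.
    private static string[] Normalize(string path) =>
        File.ReadAllLines(path)
            .Select(line => line.TrimEnd())   // ignore trailing whitespace
            .Where(line => line.Length > 0)   // ignore blank lines
            .ToArray();

    // Compare a freshly generated report against its stored baseline, line by line.
    public static bool Matches(string baselinePath, string actualPath)
    {
        var expected = Normalize(baselinePath);
        var actual = Normalize(actualPath);
        var matches = true;

        var shared = Math.Min(expected.Length, actual.Length);
        for (var i = 0; i < shared; i++)
        {
            if (expected[i] != actual[i])
            {
                Console.WriteLine($"Line {i + 1}: expected '{expected[i]}' but got '{actual[i]}'");
                matches = false;
            }
        }

        if (expected.Length != actual.Length)
        {
            Console.WriteLine($"Line count differs: expected {expected.Length}, got {actual.Length}");
            matches = false;
        }

        return matches;
    }
}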
How You Can Adopt Snapshot Testing in Your Own Projects
Snapshot testing isn’t limited to one language or framework. Whether you’re working in Python, C#, JavaScript, or something else entirely, the idea is the same: capture the output, store it as a verified baseline, and automatically detect changes over time.
If you’re using C#, Verify is a fantastic place to start. It supports a wide variety of data formats, integrates smoothly with popular testing frameworks like xUnit and NUnit, and even provides useful diff tools when snapshots change.
For other ecosystems, there are well-established libraries with similar capabilities:
- JavaScript / TypeScript: Jest has built-in snapshot testing that works great for serialized data as well as UI components.
- Python: Tools like Syrupy or ApprovalTests.Python provide flexible options for capturing and approving snapshots.
- Java: ApprovalTests.Java offers an elegant, language-idiomatic approach to the same workflow.
And if none of these quite fit your needs, you can always take the simple route we used for verifying text-based reports — a hand-rolled diff tool. Sometimes, the most effective solution is just comparing the old and new versions of a file and flagging any unexpected differences. The key isn’t the specific tool; it’s the mindset of using snapshots to create a fast feedback loop for correctness.
Lessons Learned and Best Practices
After using snapshot testing throughout this migration, a few key lessons have stood out:
Use real data. Synthetic examples are easy to write, but real-world data exposes the quirks and edge cases that matter most. If you can safely anonymize it, do it. This is especially useful when you are able to anonymize and snapshot the data relating to a reported bug or issue.
Review intentionally. When a snapshot changes, don’t just click “approve.” Take a minute to understand *why* it changed. In the moment, this can feel like extra work, but this small pause has saved us from subtle regressions more than once.
Build what you need. Our simple Python diff script showed us that you don’t always need an external library to get the benefits of snapshot testing.
Snapshot testing has become one of the most valuable tools in our data migration project. It gives us confidence that our data transformations are correct, clarity when they change, and the freedom to keep improving our migration without fear of breaking something unseen. At its best, snapshot testing doesn’t just catch mistakes. It helps you understand your code more clearly and make changes with trust, not guesswork.