Ways of incrementing
I work mostly with full stack development, though I don’t do that much frontend work. But I will try to provide the full stack for changes as best I can. Most of these real-world examples are either from the project that was surveyed or past projects I have been on.
In each case I will try to provide full step-by-step descriptions, highlighting for each increment what value it brings and how it reduces risk.
The core assumption that underlies these articles is that often we are not able to fully assure the quality of changes before they hit production, so we want to make the changes in a safe way. Additionally, it might require an exorbitant amount of work to thoroughly QA changes locally, via unit tests or in a test environment. Even if we did spend this effort, it would still not give certainty. So instead of spending that effort, we opt for leveraging feature toggles and modular design to validate changes in production safely and conclusively.
With that said, changes are tested and validated locally, possibly validated in a test environment as well. But for the sake of illustrating the incremental approach I will not go into too much detail about that aspect in this article. The point is that we often are not able to prove correctness at the source, so we want to build and deliver safely and responsibly. There is a quote that conveys this point succinctly, often attributed to Mark Twain, but never said by him (which is kind of ironic):
"It ain’t what you don’t know that gets you into trouble. It’s what you know for sure that just ain’t so" - Some dude
We had a lot of data and external dependencies that would have required a lot of work to keep aligned between environments. Additionally, the strain on the system was influenced by usage, background work and a lot of other moving parts. Thus a "Live Fire Exercise" approach made the most sense.
With all the preamble out of the way, let’s dive into the nitty-gritty.
Performance issue in SQL database solved by converting NVarchar column to a compressed Varbinary column
Sometimes it is nice to have text data near your rows in a database as an NVarchar column. E.g. some JSON data, a compact version of big lists, metadata or something else that does not need to be indexable or searchable, but should be efficiently available when you retrieve your items.
The challenge is that performance can degrade in certain cases and at certain sizes of NVarchar columns. The underlying issue has to do with the max size of rows, how NVarchar(max) is streamed to the client, I/O pressure on the database server, etc. SQL databases are complicated. But a possible "easy" fix is to store the data as a zipped Varbinary column and unzip it in service memory when needed. In our case we had this issue in three different places in our database model, and zipping reduced the size of the data by 95% on average.
As databases are complex beasts and unknown-unknowns can end up biting you in the column change, it is good practice to introduce the new column first, clone data to it and postpone deleting the old column. This way it is possible to verify that the new column is running stable without any unforeseen hiccups or side effects.
Before fixing the issue in this case, we had isolated the performance problem we saw (by focused performance logging). We expanded on this performance logging to make sure we had a "paper-trail" of the improvements as changes were made.
An overview of how the database schema changes
Step 0: PoC
In a branch we implemented a PoC of adding a new Varbinary column, running database scripts, getting writes and reads for the column to work and making sure all the individual parts of the solution actually worked as expected. This actually turned out to have a few gotchas related to encoding in the database and respecting it correctly in the application.
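The Compress/Uncompress helpers used throughout the rest of this article can be sketched roughly like this (my reconstruction, not the project’s exact code; note the explicit Encoding.Unicode, since NVarchar is UTF-16 and getting the encoding wrong was exactly the kind of gotcha we hit):

using System.IO;
using System.IO.Compression;
using System.Text;

public static class CompressionHelper
{
    // NVarchar data is UTF-16 little endian, so Encoding.Unicode
    // keeps the bytes compatible with T-SQL COMPRESS/DECOMPRESS,
    // which also use the GZip format.
    public static byte[] Compress(string value)
    {
        if (value == null) return null;
        var bytes = Encoding.Unicode.GetBytes(value);
        using (var output = new MemoryStream())
        {
            using (var gzip = new GZipStream(output, CompressionMode.Compress))
                gzip.Write(bytes, 0, bytes.Length);
            // ToArray still works after the stream has been closed.
            return output.ToArray();
        }
    }

    public static string Uncompress(byte[] value)
    {
        if (value == null) return null;
        using (var input = new MemoryStream(value))
        using (var gzip = new GZipStream(input, CompressionMode.Decompress))
        using (var output = new MemoryStream())
        {
            gzip.CopyTo(output);
            return Encoding.Unicode.GetString(output.ToArray());
        }
    }
}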
We also made sure that it was still possible to do the zipping/unzipping directly through SQL queries using the COMPRESS/DECOMPRESS functionality in T-SQL.
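A sanity check along these lines (table and column names are illustrative) confirms that values survive the round trip; DECOMPRESS returns varbinary, so the result has to be cast back to NVarchar:

-- Should return zero rows if compression round-trips cleanly.
SELECT TOP 100 Id, MetaInfo
FROM dbo.SomeTable
WHERE MetaInfo <> CAST(DECOMPRESS(COMPRESS(MetaInfo)) AS NVARCHAR(MAX));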
Step 1: Verify performance aspects
With the PoC we tested what indices could be created to make the new Varbinary column perform well. We created a new index based on the existing one, adding an INCLUDE of the Varbinary column so it becomes part of the data "stored" in the index.
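Sketched with illustrative names, the new index looked something like this:

-- Same key column(s) as the existing index, with the compressed
-- column stored in the index leaf pages via INCLUDE, so reads
-- can be served without a lookup into the base table.
CREATE NONCLUSTERED INDEX IX_SomeTable_SomeKey_MetaInfoBinary
    ON dbo.SomeTable (SomeKey)
    INCLUDE (MetaInfoBinary);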
Step 2: Validate conversion of production data
We ran a script doing the conversion of data in production (just as a SELECT) to verify that the constraint on the size of the Varbinary column was not too small.
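A dry run of that kind can be as simple as compressing every row in a SELECT and aggregating the sizes (again with illustrative names):

-- Compress every row without writing anything, and report the
-- worst case against the intended column size constraint.
SELECT
    COUNT(*)                            AS TotalRows,
    MAX(DATALENGTH(COMPRESS(MetaInfo))) AS MaxCompressedBytes,
    AVG(DATALENGTH(COMPRESS(MetaInfo))) AS AvgCompressedBytes
FROM dbo.SomeTable;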
With most of the risks mitigated we could begin actually implementing the solution.
Step 3: Migration script
We introduced a migration script to our solution, which was run in our deploy pipeline. It only introduced the column; we did not put any data into it at this step.
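Assuming EF Core style migrations, that step can be as small as adding the nullable column and nothing else (table name illustrative):

using Microsoft.EntityFrameworkCore.Migrations;

public partial class AddMetaInfoBinaryColumn : Migration
{
    protected override void Up(MigrationBuilder migrationBuilder)
    {
        // Only add the new column. It stays empty and unused until
        // later steps switch writes and reads over to it.
        migrationBuilder.AddColumn<byte[]>(
            name: "MetaInfoBinary",
            table: "SomeTable",
            type: "varbinary(max)",
            nullable: true);
    }

    protected override void Down(MigrationBuilder migrationBuilder)
    {
        migrationBuilder.DropColumn(
            name: "MetaInfoBinary",
            table: "SomeTable");
    }
}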
Step 4: Update ORM (Object Relational Mapping)
We updated the Entity Framework mapping with the new column, so it was available for use.
This allowed us to validate safely that we encountered no "unknown-unknown" issue with Entity Framework and Varbinary.
Making changes to the database and ORM is notoriously difficult to feature toggle your way around, which was one of the reasons we did it in these baby steps. We ended up introducing the feature toggle described in the next step.
Step 5: Add feature toggle to enable switching use of Varbinary column
I usually put feature toggles in an app-settings file, but I have also used hardcoded values that depend on the environment, and previously LaunchDarkly and a table in the database. I don’t think the tech you use for feature toggles is that important or interesting. The important thing is the modular design feature toggles nudge you towards, and the safe baby steps and validation they allow.
The reason I mention this is that the hard part is not the feature toggle technology, but designing and writing the code that uses it. Changing which database column the ORM uses can be tricky, but fortunately we found a simple solution.
All the existing code was already using the MetaInfo property. So we made that property a non-mapped proxy (or control flow router) that could, based on a feature toggle, map to either the NVarchar column or the Varbinary column in the database.
#define ENABLEVARBINARY

public class ClassWithMetaInfo
{
    public int Id { get; set; }

    // Non-mapped proxy used by all existing code; it routes to
    // the mapped column properties below.
    [NotMapped]
    public string MetaInfo
    {
        get
        {
            ValidateBinary();
            return MetaInfoString;
        }
        set
        {
            MetaInfoString = value;
            MetaInfoBinary = Compress(value);
            ValidateBinary();
        }
    }

    [Column("MetaInfo")]
    public string MetaInfoString { get; set; }

    [Column("MetaInfoBinary", TypeName = "varbinary(max)")]
    public byte[] MetaInfoBinary { get; set; }

    void ValidateBinary()
    {
#if ENABLEVARBINARY
        var asString = MetaInfoString;
        var fromBinary = Uncompress(MetaInfoBinary);
        // Compare the unzipped version of the varbinary with
        // MetaInfoString and inspect performance.
#endif
    }
}
This approach also laid the groundwork for doing the switch later on, but initially we wanted to make sure that we were not overlooking some unknown-unknown issues related to the Compress and Uncompress, for example related to encoding or the memory streams used for them. We also wanted to get a feel for the impact on the memory footprint and performance of the service.
We feature flagged the mapping of the Varbinary column as well as the validation with a compilation symbol, #define ENABLEVARBINARY, so we could quickly and easily disable the change if we saw issues. This was also done in the mapping setup of Entity Framework.
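Assuming EF Core style configuration, the mapping side of that can be sketched like this; with the symbol removed, the Varbinary property is simply ignored by the model:

protected override void OnModelCreating(ModelBuilder modelBuilder)
{
    base.OnModelCreating(modelBuilder);

#if ENABLEVARBINARY
    // The compilation symbol gates the mapping itself, so removing
    // one #define disables the whole change in a single place.
    modelBuilder.Entity<ClassWithMetaInfo>()
        .Property(x => x.MetaInfoBinary)
        .HasColumnName("MetaInfoBinary")
        .HasColumnType("varbinary(max)");
#else
    modelBuilder.Entity<ClassWithMetaInfo>()
        .Ignore(x => x.MetaInfoBinary);
#endif
}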
Step 6: Move feature toggle
After having validated Varbinary live in production and having seen no issues, we moved the feature toggle into our normal place. After running a script manually in test, we were able to use the Varbinary column in the test environment as well.
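That script was essentially a backfill of the new column; a sketch with the illustrative names from before:

-- Backfill the compressed column for rows that don't have it yet.
-- On large tables, run in batches to keep transactions short.
UPDATE dbo.SomeTable
SET MetaInfoBinary = COMPRESS(MetaInfo)
WHERE MetaInfoBinary IS NULL;

With the toggle moved, the property ended up looking like this: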
#define ENABLEVARBINARY

public class ClassWithMetaInfo
{
    public int Id { get; set; }

    [NotMapped]
    public string MetaInfo
    {
        get
        {
            ValidateBinary();
            if (FeatureToggles.UseMetaInfoBinary)
                return Uncompress(MetaInfoBinary);
            return MetaInfoString;
        }
        set
        {
            MetaInfoString = value;
            MetaInfoBinary = Compress(value);
            ValidateBinary();
        }
    }

    [Column("MetaInfo")]
    public string MetaInfoString { get; set; }

    [Column("MetaInfoBinary", TypeName = "varbinary(max)")]
    public byte[] MetaInfoBinary { get; set; }

    // ValidateBinary() is unchanged from the previous step.
    void ValidateBinary() { /* see the comparison sketch below */ }
}
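The FeatureToggles.UseMetaInfoBinary flag above can be as simple as a static class bound to app settings at startup. A minimal sketch, since the article doesn’t show the project’s actual plumbing:

using Microsoft.Extensions.Configuration;

// Bound once at startup from a "FeatureToggles" section in the
// app settings. Defaulting to off means a missing setting is safe.
public static class FeatureToggles
{
    public static bool UseMetaInfoBinary { get; private set; }

    public static void Initialize(IConfiguration configuration)
    {
        UseMetaInfoBinary = configuration.GetValue(
            "FeatureToggles:UseMetaInfoBinary", false);
    }
}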
At this point we began writing to the column. This allowed us to validate that writing from the application and reading again did not result in any hiccups or surprises. The code would basically be "dark launched". The old NVarchar value was compared with the Varbinary value, and an error logged if they were different. They should always be the same, because we built it right. But I have been surprised by less intimidating problems before...
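The comparison inside ValidateBinary amounted to something like the sketch below (Trace stands in for whatever logging framework you use):

using System;
using System.Diagnostics;

void ValidateBinary()
{
    var asString = MetaInfoString;
    var fromBinary = Uncompress(MetaInfoBinary);

    // Dark launch check: the binary path runs on every access,
    // but any difference is only logged; the old column stays
    // the source of truth until the toggle is flipped.
    if (!string.Equals(asString, fromBinary, StringComparison.Ordinal))
        Trace.TraceError("MetaInfoBinary mismatch for entity {0}", Id);
}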
Step 7: Add indices via migrations
We then introduced the identified indices to improve performance, and got measurements of the improvement. That is another point: make sure you have logging that allows you to compare the performance at each relevant step of this process. And when you introduce an index, check whether your migration risks locking the table while the index is getting created. We avoided the tables locking up in production because we had already seen it happen when rolling out to test.
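On the SQL Server editions and versions that support it, building the index online is one way to avoid those locks. A sketch, again assuming EF Core migrations and the illustrative names from earlier:

using Microsoft.EntityFrameworkCore.Migrations;

public partial class AddMetaInfoBinaryIndex : Migration
{
    protected override void Up(MigrationBuilder migrationBuilder)
    {
        // ONLINE = ON keeps the table readable and writable while
        // the index builds (Enterprise edition / Azure SQL).
        migrationBuilder.Sql(@"
            CREATE NONCLUSTERED INDEX IX_SomeTable_SomeKey_MetaInfoBinary
                ON dbo.SomeTable (SomeKey)
                INCLUDE (MetaInfoBinary)
                WITH (ONLINE = ON);");
    }

    protected override void Down(MigrationBuilder migrationBuilder)
    {
        migrationBuilder.Sql(
            "DROP INDEX IX_SomeTable_SomeKey_MetaInfoBinary ON dbo.SomeTable;");
    }
}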
Step 8: Inspect and monitor
We could now see the actual performance effect of the change by comparing our previous metrics with our new metrics. Performance had improved by a factor of 10. That was satisfactory, so we were now done fixing the performance issue.
Step 9: Cleanup
After having let it run for some time, we removed the MetaInfoString property, the corresponding NVarchar column and the feature toggle.
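The cleanup itself is just one more ordinary migration (sketch):

using Microsoft.EntityFrameworkCore.Migrations;

public partial class DropMetaInfoNVarcharColumn : Migration
{
    protected override void Up(MigrationBuilder migrationBuilder)
    {
        // The old NVarchar column is no longer read or written
        // anywhere, so it can finally be dropped.
        migrationBuilder.DropColumn(
            name: "MetaInfo",
            table: "SomeTable");
    }

    protected override void Down(MigrationBuilder migrationBuilder)
    {
        migrationBuilder.AddColumn<string>(
            name: "MetaInfo",
            table: "SomeTable",
            nullable: true);
    }
}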
Conclusion
I know typing out this whole example verbatim might make it seem like a roundabout and cumbersome way to implement a "small" database change. But by using modularization you can often do these types of stepwise changes cleanly and, importantly, stay safe all the way through.
Without otherwise using the term TDD, the steps basically emulate the Red → Green → Refactor cycle, not via unit tests, but via deploys to production. As with TDD, the steps need to be small. That means both small changes and very frequent deployments.
Alternative approaches
An alternative approach would have been the classic branch with all steps compacted into one merge, assuring quality by setting up a test that would have a high probability of catching most, if not all, issues that could arise in production. But we could not be certain that there were no unknown-unknowns that would cause problems, so doing it in one step would increase risk to a degree that we did not feel comfortable with.
We were in the middle of a huge and important Go-live, and even though the performance issue was a problem, we still had a working application, and we did not want to jeopardize that. We also encountered some issues along the way that would likely have compounded into production incidents, which we avoided with this approach. We learned as we went. And we did not spend a lot of extra time setting up a test environment with all the dependencies, data and load needed to make it meaningful. Even if we had, we would still not feel completely confident that we had covered everything. So we chose this approach not just because it was less work, but because we could actually trust the conclusions we made along the way.
Benefits
Each of the steps above represents at least one deploy. By the time Step 9 finished, there might have been 20 or more deployments to prod. At no point did we take on unnecessary risk. And even though it did not happen this time, often the modular designs you build towards, to make clean toggling or branch by abstraction possible, turn out to improve the code. So building in small steps and making changes toggleable nudges you and your team towards more maintainable (and better) code.
Deployments can happen between all the steps above, with no problems. And that is the point. We are always on firm ground, because we don’t pile up risk. We do this by committing to main, one commit at a time. We commit simple changes, refactors or additions that are trivial to validate and trivial to review. Often, because the commits are so small or protected behind feature toggles, they seem trivial and there is not much value in the reviews.
Though we did not encounter many problems in this example, I have very rarely regretted using this feature-toggled step approach.
I have regretted trying to "optimize" by taking bigger leaps up the steps.
If you want more rationale and examples, you can find them here