Battle Scars from the Cloud Front

The Promise

It is no secret that Cloud platforms have been adopted by most organisations for running their infrastructure. Virtualization of infrastructure brings many advantages.

In the early 2000s I had to pay for the hardware and have it physically installed in a data center, and then pay for a lease to host it. This was expensive and involved.

With Cloud-based Virtual Machines we could spin up a machine at a moment’s notice, perform some work and then tear it down, paying only for the time it was up. Then along came Docker and containerization, which reduces the footprint of an instance and makes it easy to scale based on an image.

Then comes Kubernetes to help manage those containers, configure the networking, and create internal networks to interconnect your micro-services. Finally, there are Lambdas and other net functions which eliminate the need to worry about infrastructure at all. Just drop a function into AWS and connect it up to one of the services. Pay only for the CPU you use.

There has been an evolution from hosting physical machines to Cloud platforms where you no longer even see the underlying machines but rather a monolithic platform service. The service handles the management of the infrastructure for you, freeing you to focus on the code. This is the promise of Cloud, and frankly it has delivered. So what is my problem with it?

With Clouds comes Sink

As any good glider pilot can tell you, with any cloud comes sink: the smooth descending air which pulls you inexorably toward the earth.

Having now worked in a number of environments that use Cloud platforms, I think there are some pitfalls we need to discuss.

Inability to Run Locally

Before you reach for your keyboards, I know there are ways to run code locally. This point is about how Cloud development has encouraged complex configurations, resources and permissions which create barriers to being able to run your code locally. Under ‘locally’ I also count realistic sandpits on your Cloud platform; not being able to set those up is part of the same problem.

In my experience developers have been expected to complete tickets and create a Pull Request to bring code from a feature branch into a development branch on the basis of passing unit tests alone. They may not have had access to databases to test their SQL against test data. They may not have had the opportunity to send message payloads to other services to validate that they integrate properly.
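As a rough illustration of what testing SQL against test data can look like without platform access, here is a minimal sketch using an in-memory SQLite database. The table, the query and the data are all hypothetical, and SQLite's dialect differs from whatever database the real system uses, so this only catches a subset of problems; it is not a substitute for a realistic sandpit.

```python
import sqlite3

# Hypothetical query under test: unpaid totals per customer.
OVERDUE_SQL = """
    SELECT customer, SUM(amount) AS total
    FROM invoices
    WHERE paid = 0
    GROUP BY customer
    ORDER BY customer
"""


def make_test_db():
    """Build a throwaway in-memory database with a few hypothetical rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE invoices (customer TEXT, amount INTEGER, paid INTEGER)"
    )
    conn.executemany(
        "INSERT INTO invoices VALUES (?, ?, ?)",
        [("acme", 100, 0), ("acme", 50, 0), ("acme", 70, 1), ("zenith", 30, 0)],
    )
    return conn


def overdue_totals(conn):
    """Run the query under test and return its rows as a list of tuples."""
    return conn.execute(OVERDUE_SQL).fetchall()


if __name__ == "__main__":
    print(overdue_totals(make_test_db()))
```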

Is this just me? Is it that Peter can’t hack change? Well, from what I have seen, many of the developers around me have been struggling with this problem. In the bad old days you would simply run everything on your local machine. But today there are platform services, permissions, and configurations which are more than just environment. Platform configuration has become part of the application proper, meaning you can’t run it locally.

Yes, I know Kubernetes can be run on your local box, only in my experience you can’t just shift an application from your local system to the Cloud. Because configuration is now an explicit part of the app, the infrastructure is more than just the substrate you run your app on. Similarly, it is possible to run Lambdas locally, but there are limitations, and there really isn’t a like-for-like substitute for running on the platform.
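To make the limitation concrete, here is a minimal sketch of calling a Lambda-style handler in-process with a hand-built event. The handler, the event shape and the FakeContext class are all hypothetical. Note what it does not exercise: permissions, triggers, timeouts and the platform configuration, which is exactly the part that cannot be reproduced this way.

```python
import json
from dataclasses import dataclass


@dataclass
class FakeContext:
    """Hypothetical stand-in for the context object the platform passes in."""
    function_name: str = "order-handler-local"
    aws_request_id: str = "local-test-request"


def handler(event, context):
    """Hypothetical Lambda-style handler: totals the line items in a JSON body."""
    body = json.loads(event["body"])
    total = sum(item["price"] * item["qty"] for item in body["items"])
    return {"statusCode": 200, "body": json.dumps({"total": total})}


def invoke_locally(payload):
    """Call the handler directly, wrapping the payload in a hand-built event."""
    event = {"body": json.dumps(payload)}
    return handler(event, FakeContext())


if __name__ == "__main__":
    print(invoke_locally({"items": [{"price": 5, "qty": 2}, {"price": 3, "qty": 1}]}))
```

This runs the function's logic and nothing else; whether the deployed function can actually reach its queue or database is still only visible on the platform.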

No Deploy on Commit

Related to the above point, unit tests would not run when a feature branch received a commit. In my earlier experience, every single commit would result in a build and a unit test run. If there were unit test failures you would get a report.

In more recent environments the feature branch approach was set up to build only on commits to the development and master branches. Typically this would be when a PR was merged into development.

Of course, we should be building and running unit tests prior to commit. However, the lack of deployment means we don’t get to see how a Lambda actually acts until it makes it into the development branch and is deployed into a development environment.

The ‘development’ environment in this context is a single environment for everyone, not the same thing as a local environment which developers can work in without impacting others.

Shifting System Complexity to the Platform

This has some overlap with micro-services which have been all the rage in cloud environments. Software used to be more monolithic, all the code being in one binary. There would be separation of concerns, in that the Database might be separate from the Web Server, but they would each have distinct capabilities.

The difference is that micro-services have been aligned with the domain, and so we now get a solution composed not of one or two major components, but dozens of interlocking small services playing a part in a complex web of dependencies.

In one recent example there was a system that involved sixteen separate repositories, each a separate deployable unit of Lambda functions. Some functions would directly call other Lambdas from other projects in code.

Now, some might say this is clearly the wrong thing to do, and they would be 100% correct, but the deeper question is how we got there. We have essentially taken what would have been separate packages, each with its own purpose but all running on the same machine, and broken them up into separate services with complex dependencies on other services.

This has also resulted in complex deployment configurations, with the plumbing for all these service connections defined in the platform configuration rather than inside the application software.

Too many projects

Something else I have seen is that software is decomposed into micro-services which are then handed to separate teams. Each team might handle each service in their own way. This is partly a result of new services being developed over time and being bolted on.

As a result I have seen environments where projects are no longer maintained, where the developers have left, and there has been no continuity. The new developers coming in are left with a menagerie of repositories, each dissimilar, written with different technologies.

The CI/CD pipelines may not even be working, and the configurations might be so old they no longer function. Or worse, there is no real definition of what makes up the system, so critical Lambdas are not really visible until you need to change them or they break.

Poor Service Dependency Visibility

Related to the number of services is the fact it can be very difficult to actually visualize how everything is connected. The direct call of a Lambda in one project to a Lambda in another was an example of a hidden dependency. There was no contract or exposed service on the receiving service, just a Lambda that could be called. The clients of this service performed direct calls to the Lambda.
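For contrast, here is a minimal sketch of what an explicit contract on the receiving service could look like: typed request and response shapes, and a single named client that callers go through. Every name here (PricingClient, PriceRequest, the function name, the fake transport) is hypothetical, and the transport is injected so the sketch runs without any platform SDK.

```python
import json
from dataclasses import asdict, dataclass
from typing import Callable

# Hypothetical transport: takes a function name and a JSON string,
# returns the callee's JSON response as a string.
Invoker = Callable[[str, str], str]


@dataclass(frozen=True)
class PriceRequest:
    sku: str
    quantity: int


@dataclass(frozen=True)
class PriceResponse:
    sku: str
    total_cents: int


class PricingClient:
    """Single, named entry point for callers of a hypothetical pricing service."""

    FUNCTION_NAME = "pricing-service-quote"  # hypothetical function name

    def __init__(self, invoke: Invoker):
        self._invoke = invoke

    def quote(self, request: PriceRequest) -> PriceResponse:
        raw = self._invoke(self.FUNCTION_NAME, json.dumps(asdict(request)))
        data = json.loads(raw)
        # A missing key raises KeyError, so drift from the agreed shape fails loudly.
        return PriceResponse(sku=data["sku"], total_cents=int(data["total_cents"]))


def fake_invoke(function_name: str, payload: str) -> str:
    """In-process stand-in for the platform call, for local runs and tests."""
    request = json.loads(payload)
    return json.dumps(
        {"sku": request["sku"], "total_cents": 250 * request["quantity"]}
    )


if __name__ == "__main__":
    print(PricingClient(fake_invoke).quote(PriceRequest(sku="ABC", quantity=3)))
```

The dependency is still there, but it now has a name that can be searched for, and a shape that can be checked.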

In monolithic apps the dependencies can be followed directly in the IDE. You can follow call chains from class to class easily. At debug time you can even follow the execution pointer around. But when debugging issues in systems that involve many separate micro-services we now face a challenge of finding out exactly where the problem occurs.

Finding out how all the services depend on one another can be a real challenge, as there isn’t one canonical place where this can be visualized. As discussed earlier, it is even worse when you don’t have debug access to a running system. You can’t step through the code base to find an issue in a Lambda in a test system.
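One way to picture the missing canonical place is a single manifest of who calls whom, which can then be queried. The sketch below assumes such a manifest exists and is kept up to date, which is precisely what was lacking; the service names and the DEPENDENCIES table are hypothetical.

```python
from collections import defaultdict

# Hypothetical manifest: each service lists the services it calls directly.
DEPENDENCIES = {
    "checkout": ["pricing", "inventory"],
    "pricing": ["catalogue"],
    "inventory": ["catalogue"],
    "catalogue": [],
    "reporting": ["checkout"],
}


def dependents_of(target, dependencies):
    """Return every service that calls `target`, directly or indirectly."""
    # Invert the manifest: callee -> set of direct callers.
    callers = defaultdict(set)
    for service, callees in dependencies.items():
        for callee in callees:
            callers[callee].add(service)

    # Walk the inverted graph outward from the target.
    seen = set()
    stack = [target]
    while stack:
        current = stack.pop()
        for caller in callers[current]:
            if caller not in seen:
                seen.add(caller)
                stack.append(caller)
    return sorted(seen)


if __name__ == "__main__":
    print(dependents_of("catalogue", DEPENDENCIES))
```

Asking what is affected by a change to "catalogue" then becomes a lookup rather than an archaeology exercise across sixteen repositories.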

Summary

We like to think progress goes only in one direction, that the old was worse than the new. With Cloud platforms there are undeniable benefits, but not all is a box of butterflies. I’ve seen some real challenges that are at least in part a consequence of the impact of Cloud on developers’ workflow.

I want to be able to run and step debug a problem. I want an application to have a single well-structured and maintained code base. I don’t want many separate small services with ill-defined interfaces running different technologies.
