What is a distribution? Top 5 functions of a distro:

Summary

A distribution is a project that distributes software. The top 5 functions of a distribution, in my opinion, are:

Distributions reduce the complexity of the Free Software ecosystem.
Distributions handle non-technical tasks, such as legal compliance and licensing issues.
Distributions build software from source, in order to validate core Software Freedoms, such as the ability to modify software.
Distributions provide a space and tooling for collaboration.
Distributions should minimize the difference between what they collect from upstream projects and what they distribute to end-users.

A distribution is a project that distributes software.

In order to understand what distributions do and why, it’s important to understand what a distribution is. So, let’s start with a simple definition of “distribution.”

A distribution is a project that centralizes the process of building usable systems out of the body of publicly available software, which is generally in source code form.

Distributions provide software to their users, and I think a lot of people tend to believe that the software is the distribution. I think that’s wrong, or at the very least it’s a superficial view of the distribution. If the distribution is the software, and most distributions are providing builds of exactly the same source code to their users, then what is the difference between one distribution and another? Once you see that the distribution is really the people, you start to see the differences between distributions more clearly.

Distributions reduce the complexity of the Free Software ecosystem.

So distributions collect publicly available source code and build software. That sounds simple. In fact, one wonders why developers can’t build their own software.

So why do we even have distributions in the first place?

Of all of the purposes that distributions serve, this one requires the most complex explanation. In fact, I’m going to give this section its own summary:

Summary

Effectively all modern software is built with reusable, shared components. For each shared component, there will be both a minimum and a maximum version that a developer expects to be compatible with their application. The stable release model allows developers to work asynchronously. Distributions simplify the branches of components with diverse lifecycles. Building a stable distribution out of components with diverse lifecycles requires compromises.

The minimum version and the maximum

To make any sense of release cycle complexity, I think you need to understand Semantic Versioning (SemVer).

SemVer provides a simple taxonomy of changes to software. Any given change can be classified by its effect on compatibility with other software. A change can break compatibility, as it might when a feature or interface is removed. A change can add features without breaking compatibility. Or a change might not modify interfaces or affect compatibility at all, as in a bug fix.
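To make the taxonomy concrete, here is a minimal sketch of how the three kinds of change map onto a three-part version number. The `Change` enum and the `next_version` function are names I made up for illustration, and the sketch assumes a project that is already past its 1.0.0 release (SemVer allows anything to change during 0.y.z development).

```python
from enum import Enum


class Change(Enum):
    BREAKING = "breaking"  # removes or alters an existing interface
    FEATURE = "feature"    # adds an interface and keeps the existing ones
    FIX = "fix"            # no interface change, as in a bug fix


def next_version(version: tuple[int, int, int], change: Change) -> tuple[int, int, int]:
    """Return the version number that follows `version` after one change."""
    major, minor, patch = version
    if change is Change.BREAKING:
        return (major + 1, 0, 0)
    if change is Change.FEATURE:
        return (major, minor + 1, 0)
    return (major, minor, patch + 1)


print(next_version((1, 4, 2), Change.FIX))       # (1, 4, 3)
print(next_version((1, 4, 2), Change.FEATURE))   # (1, 5, 0)
print(next_version((1, 4, 2), Change.BREAKING))  # (2, 0, 0)
```

The classification is an input here, not something the code derives. Deciding whether a change is breaking is still a human judgment.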

It’s important to note that while the semantics of changes can be communicated through appropriate versioning, the change semantics and the versioning are conceptually separate. That is, the taxonomy of changes is valid whether or not a software component uses the semantic version system.

For each reusable component that an application uses, there is a set of features that are used. Each of those features first appeared in some release in the past. The first version of the shared component that provides all of the features that the application needs is the minimum version required by the application. It is unlikely that any set of interfaces will be supported forever, and the next version that breaks compatibility with any of the interfaces that an application uses is the upper boundary of the versions required by the application.
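The minimum and the upper boundary can be computed mechanically once you know when each feature appeared and when it was removed. The following is a rough sketch under that assumption. The feature names and version numbers are invented, and `compatible_range` and `is_compatible` are hypothetical helpers, not part of any packaging tool.

```python
def compatible_range(introduced, removed):
    """
    introduced: {feature: version in which it first appeared}, listing
                only the features the application actually uses
    removed:    {feature: version in which it was removed}, for the
                whole component
    Returns (minimum, upper), where `upper` is the first incompatible
    version, or None if nothing the application uses has been removed.
    """
    minimum = max(introduced.values())
    removals = [v for feature, v in removed.items() if feature in introduced]
    upper = min(removals) if removals else None
    return minimum, upper


def is_compatible(version, minimum, upper):
    """True if `version` falls inside the half-open range [minimum, upper)."""
    return version >= minimum and (upper is None or version < upper)


# Hypothetical component history, with versions as (major, minor, patch).
needed_since = {"parse": (1, 0, 0), "stream": (1, 3, 0), "batch": (1, 5, 0)}
removed_in = {"stream": (3, 0, 0), "legacy_io": (2, 0, 0)}

minimum, upper = compatible_range(needed_since, removed_in)
print(minimum, upper)                            # (1, 5, 0) (3, 0, 0)
print(is_compatible((2, 4, 1), minimum, upper))  # True
print(is_compatible((1, 4, 9), minimum, upper))  # False
```

Note that the removal of `legacy_io` in 2.0.0 doesn't narrow the range, because this application never used it. A break only matters to the applications that depend on the interface being broken.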

Breaking changes, in particular, require a process to support them. It’s good for sustainability that developers are allowed to remove features that are unsafe, or that limit the design of their software. It’s good that developers are able to expand systems beyond the limitations of the architecture of early implementations. But without supporting processes, breaking changes would require coordinated deployment of not just the new shared component, but of every system that interfaces with it. Synchronous coordination is difficult even in small groups. It is infeasible in complex systems like the Free Software ecosystem.

That brings us to the stable release model, which builds on the concept of Semantic Versions to allow developers to collaborate asynchronously.

The stable release model allows developers to work asynchronously.

The answer to that coordination problem is to publish software not as one release stream, but as multiple streams. This way, when developers start a new release stream, they can continue to publish bug and security fixes to users who have systems built around the previous release. That lets everyone build their systems asynchronously. The developers of a software component can work on a new design, while continuing to support users and developers of the old design, and while the related systems are ported to the new design.

Supporting asynchronous work is critical. This allows everyone to continue work at all times, never requiring them to stop and wait for other groups.

As developers publish multiple software streams, each user / consumer of that software follows their own track through the available streams.
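One way to picture overlapping streams is as a list of support windows. This is only a sketch, and the stream names and dates are invented. `Stream` and `supported` are hypothetical names, and real projects define "supported" in more nuanced ways (full support, security-only support, and so on).

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Stream:
    name: str
    released: date
    end_of_life: date  # last day on which fixes are published


def supported(streams, today):
    """Return the names of the streams still receiving fixes on `today`."""
    return [s.name for s in streams if s.released <= today <= s.end_of_life]


# A hypothetical cadence: each stream overlaps the one that follows it.
streams = [
    Stream("1.x", date(2020, 1, 1), date(2022, 12, 31)),
    Stream("2.x", date(2021, 7, 1), date(2024, 6, 30)),
    Stream("3.x", date(2023, 1, 1), date(2025, 12, 31)),
]

print(supported(streams, date(2022, 3, 1)))  # ['1.x', '2.x']
print(supported(streams, date(2023, 6, 1)))  # ['2.x', '3.x']
```

The overlap is the point. On any given date there is more than one supported stream, so a consumer on the older stream has time to port before its fixes stop.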

The diagram below illustrates a hypothetical stable release cadence.

Distributions simplify the branches of components with diverse lifecycles.

Let’s look at some realistic examples of shared component lifecycles, and how application developers plot a path through those release streams.

OpenSSL is a widely used project, which significantly improved its lifecycle management starting with version 3. OpenSSL provides a clear release strategy which describes how long each release series will be supported, how often new release series are expected, and what level of compatibility is expected across each release series.

This information conveys to application developers what release they should target with their development in order to get the features that they need, while ensuring that the shared component will be supported and secure for the application’s own lifecycle. Using this information, they might actually select an older release series, with fewer features, in order to ensure that their application remains secure for the lifecycle they’ve promised to their own customers.

Unfortunately, many libraries in the Free Software ecosystem do not publish this sort of information, which is one of the primary challenges to developing applications on Free Software platforms.

Even when information about lifecycles is available, it remains challenging that release cycles tend to be relatively short in the Free Software ecosystem, and they tend not to align with each other.

An application developer who wants to use OpenSSL, GTK, and GStreamer, for example, might need to test their compatibility with a new release series frequently, and at a different cadence for each shared component that they use.

In some cases, this is simpler for components like Qt, which aim to provide a complete platform for application developers. Qt’s Community Edition is a rolling release (that is, there is only one release stream for Qt 6), so application developers should expect a simple linear path through Qt 6 releases.

However, what we often see in practice is that application developers test compatibility with the shared components, and update those components, on a schedule that doesn’t align with the upstream releases. In other words, application developers work asynchronously, and that can lead to applications using components that are unsupported and potentially insecure.

Distributions build collections of components, managing each of them according to their risk criteria and release cadence in order to provide something that more closely resembles a coherent platform for developers to target.

Building a stable distribution out of components with diverse lifecycles requires compromises.

You may notice that the diagrams of the components in a distribution sometimes include a component that is no longer maintained in the upstream project, or components that aren’t a single version throughout the release.

When a component’s maintenance window is shorter than a distribution maintenance window, one of three things must happen:

The distribution ships unmaintained code, and potentially puts user security at risk.
The distribution engages in software maintenance / development and effectively forks the release they ship. (Or, the distribution works with the upstream developers to extend their maintenance window to match the distribution’s.)
The distro rolls to a new release series.

Which of those three things happens varies from component to component, according to the level of risk presented by using the component past the upstream end-of-life date, the extent of engineering resources available among the maintainers of that component in the distribution, promises the distribution has made to its users, and other factors.
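Before a distribution can choose among those options, it has to know which components will run out of upstream maintenance during its own lifecycle. Here is a minimal sketch of that check. The component names and dates are made up, and `maintenance_gaps` is a hypothetical helper, not something any distribution actually ships.

```python
from datetime import date


def maintenance_gaps(distro_eol, component_eol):
    """
    Return {component: days} for each component whose upstream
    maintenance ends before the distribution's own end of life.
    """
    gaps = {}
    for name, eol in component_eol.items():
        if eol < distro_eol:
            gaps[name] = (distro_eol - eol).days
    return gaps


# Hypothetical distribution release and component end-of-life dates.
distro_eol = date(2030, 6, 30)
component_eol = {
    "libfoo 2.x": date(2027, 6, 30),
    "libbar 5.x": date(2031, 1, 1),
    "libbaz 1.x": date(2030, 5, 31),
}

print(maintenance_gaps(distro_eol, component_eol))
# {'libfoo 2.x': 1096, 'libbaz 1.x': 30}
```

The size of the gap is only one input to the decision. A 30-day gap in a low-risk library and a three-year gap in a network-facing one call for very different answers.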

The same thing is true of shared components within applications developed and distributed independently, but distributions centralize the process of selecting shared components for common maintenance windows. In many cases, this allows for greater specialization and focus on security across the collection of software. In other cases, it can introduce bugs, when a component is updated before its compatibility problems have been discovered.

(Perhaps as a side note, I will mention that I would really like to see more projects maintain a major-version branch in their source code, and I’d like to see more applications run CI tests that build the major-version branch of their dependencies to discover bugs earlier. Doing so can reduce the rate of regressions in shared components, and also makes it easier to update applications to new minor releases.)

Distributions handle non-technical tasks, such as legal compliance and licensing issues.

There are a lot of requirements imposed by the terms of licenses, and requirements imposed by copyright law, which distributions tend to take very seriously. We frequently see these requirements misunderstood or ignored by software developers.

For example, some licenses require that users receive the source code to their software. In order to meet that requirement, distributions often have to examine “source” releases to ensure that there aren’t any pre-built binary components included (which are sometimes build tools), remove the pre-built components, and then obtain and build those tools or components from source. This task can quickly become a deep chain of additional builds required to support an application.
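As a rough illustration of the first step, a scan for pre-built files can start by looking at the leading bytes ("magic numbers") of each file in an unpacked source release. This is a heuristic sketch, not how any particular distribution does it. `find_prebuilt` is a hypothetical name, the list of formats is far from complete, and a two-byte signature like `MZ` will occasionally match an innocent file.

```python
import os

# Leading bytes of some common compiled formats.
MAGIC = {
    b"\x7fELF": "ELF executable or shared object",
    b"MZ": "Windows PE/DOS executable",
    b"\xca\xfe\xba\xbe": "Java class file or Mach-O universal binary",
    b"\x00asm": "WebAssembly module",
}


def find_prebuilt(root):
    """Yield (path, description) for files under `root` that look pre-built."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            try:
                with open(path, "rb") as handle:
                    head = handle.read(4)
            except OSError:
                continue  # unreadable file; skip it rather than abort the scan
            for magic, description in MAGIC.items():
                if head.startswith(magic):
                    yield path, description
                    break
```

Calling `list(find_prebuilt("path/to/unpacked-source"))` returns the suspicious files. Each hit still needs a person to decide whether it is a build tool, a test fixture, or something that has to be rebuilt from source.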

It’s also the case that distributions are careful to preserve the text of the license that gives them the right to redistribute software. U.S. copyright law makes copyrighted works inseparable from the “Terms and conditions for use of the work,” regardless of the terms of the individual licenses, but this is a requirement often overlooked by software developers.

Distributions build software from source, in order to validate core Software Freedoms, such as the ability to modify software.

Another core function of building software from source is validating the freedoms that make up the definition of “Free Software.” Specifically, the definition of Free Software requires that users are able to modify their software. If users cannot build the software, then the right to modify it cannot be realized. So, as an ideological matter, distributions build software from source in order to ensure that users retain the right and the ability to participate in the process of software development.

Distributions may also choose to support the right and ability of users on less common platforms to participate. Some software developers choose not to build their applications for less common platforms, for various reasons, from a lack of hardware to run builds to a lack of QA resources to ensure that the software works on those platforms. It’s understandable that developers may not want to offer a build that they haven’t tested. It’s also understandable that developers may not want to handle bug reports for a platform they’ve chosen not to support. However, reserving the right to determine what platforms are supported is inconsistent with Free Software ideals.

Distributions provide a space and tooling for collaboration.

Because distributions are a central place where software collections are built and integrated, they can serve as a place for developers to work together on build and integration issues, where solutions can be developed for the ecosystem as a whole, rather than leaving each project to solve problems on its own.

Because distributions are a central place for users to find software, they can serve as a place for users to meet each other to discuss technical issues, mutually provide technical support, and to learn about developing and deploying Free Software systems.

Because distributions are built in the open by engineers with expertise in build, integration, and release processes, they can serve as a place for interested users to build their skills by following discussions and participating in the parts of the distribution process that they find interesting.

Distributions should minimize the difference between what they collect from upstream projects and what they distribute to end-users.

Participation is the thing that makes Free Software sustainable, so it stands to reason that systems that make collaboration with upstream developers more difficult or otherwise less likely make Free Software less sustainable.

One of the things that I think is the mark of a good distribution… one that understands Free Software culture… is that it will minimize the friction between upstream developers and users who deploy systems, and promote collaboration between the two.

There are a variety of policies and processes that distributions can adopt to promote collaboration, but most of them are some variation of, “minimize the difference between what the developers publish and what users receive.”

For example, Red Hat promotes an “upstream first” philosophy. Any change that they want to ship in their distribution should first be offered to the upstream developers for review and inclusion. Providing changes to upstream developers has security benefits, and it reduces the ongoing maintenance costs associated with carrying patches. But as a philosophical matter, I think the most important thing is that it is easier for users to work with upstream developers if the upstream developers understand the software the users are running. And upstream developers will understand that software best when distribution maintainers have made as few changes as possible.

Furthermore, it is easier for users to collaborate with upstream developers when users are running software from an actively maintained release series. As we see in the diagrams above, that isn’t always the case, even in rapid release systems like Fedora. But the closer you can get to running an actively supported release series, the easier it is to collaborate.

And I think that point really demands a look at LTS distributions, because as we see in the diagrams above, many common components in a distribution simply do not have the kind of maintenance windows that would cover an LTS distribution. The obvious effect is that a great deal of the software in an LTS distribution is unmaintained by its upstream developers. One of the unfortunate consequences of this is that it becomes quite difficult for users to report issues that affect their systems to upstream developers, because so many of the upstream developers have simply moved on from the release that users are running.

That’s not to say that all LTS systems are bad, but this maintenance gap does introduce a number of factors that should be considered when selecting an LTS system. Primarily, LTS distributions (and arguably all distributions) should be seen as a fork of a collection of upstream projects, and that means that all bug reports and other support requests should be directed to the distribution, and not to upstream projects. Distributions that create space for their users to collaborate and to participate in the maintenance of the project tend to make Free Software more sustainable, while distributions that do not accept contributions from users tend to make Free Software less sustainable overall.
