It’s not the strongest of the species that survive, nor the most intelligent, but the one most responsive to change.
Charles Darwin
Part 1 | Open Infrastructures: Why they matter for long term access
Asymmetric Risk
The emergence of cloud hyperscalers has introduced significant asymmetry into the online world. Both content and processing capacity are heavily concentrated in a few organisations, which monopolise the market for resources and services and, to a large extent, act as gatekeepers to much of the online world. This concentration allows them to benefit from economies of scale and offer services at a fraction of the cost of smaller providers. Yet it also represents a significant risk: the failure of a hyperscaler can threaten the availability of a large portion of the online world.
In recent months, we have seen significant failures from Amazon, Microsoft and Cloudflare (whose content delivery network drives a significant portion of the internet), which emphasise this fragility. If anything, such large-scale failures seem to be becoming more common, possibly as botnets and AI crawlers put more pressure on internet infrastructure at large. Not to be outdone, in 2024, Google accidentally deleted the account of UniSuper, an Australian pension fund that manages $135 billion worth of funds and has 647,000 members. All of the fund’s servers, backups and data replicas were lost as a result of human error.
For memory organisations, these examples should be a salutary lesson. In risk terms, multiple copies of data with one provider might as well be a single copy (just more expensive). This is evident when examining the terms and conditions of any service agreement with a hyperscaler. While there are penalties for the loss of service by a provider, they don’t cover the potential cost of that loss to the customer. For an organisation using these services for digital preservation, without additional provision, such a loss could be an existential threat.
Open is about Control
So, how can openness help mitigate such risks when resource constraints push you towards cloud providers?
In essence, given the risks and limitations noted above, the aim is to minimise the cost of the additional provision needed to mitigate dependence on any single provider.
This can be broken down further by ensuring that:
- data can be duplicated across several providers with minimal overhead
- data can be transferred to a new provider with minimal overhead when an existing provider ceases to be effective
- essential tools and services needed for digital preservation AND access can operate across multiple providers (see the sketch after this list)
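To make the third requirement a little more concrete, the sketch below (in Python, with hypothetical class and method names) shows one way a preservation tool might be written against a provider-neutral storage interface rather than against any one supplier’s SDK: any backend able to implement the three operations can hold, and receive, a copy of the collection.

```python
from abc import ABC, abstractmethod
from typing import Iterable


class StorageProvider(ABC):
    """Hypothetical provider-neutral interface: any backend (on-premise,
    S3-compatible, or otherwise) that implements these three operations
    can hold a copy of the collection."""

    @abstractmethod
    def put(self, key: str, data: bytes) -> None: ...

    @abstractmethod
    def get(self, key: str) -> bytes: ...

    @abstractmethod
    def list_keys(self, prefix: str = "") -> Iterable[str]: ...


def replicate(source: StorageProvider, targets: list[StorageProvider]) -> None:
    """Copy every object from one provider to each of the others,
    so that no single supplier holds the only copy."""
    for key in source.list_keys():
        data = source.get(key)
        for target in targets:
            target.put(key, data)
```

The point is not this particular interface, but that tools depending only on such an abstraction, backed by open standards and open APIs, are not tied to the fortunes of any one provider.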
Incorporating these requirements into supplier engagements allows an organisation to retain control over its data. In many ways, this is what characterises a “good” service provider: one prepared to make itself “expendable”, which also re-empowers the customer in the relationship. Proprietary lock-in is just a single point of failure with limited scope for mitigation.
So what does this have to do with openness?
For the first two items, data duplication and transfer, we are concerned about interoperability between providers through the adoption of shared, open standards. Open, because other suppliers (commercial or otherwise) should be free to be interoperable and thus be candidates for adoption in the future. Again, this is about giving the customer greater choice and control in their interactions with providers. In technical terms, this involves using open standards for data and metadata file formats, and their logical arrangement within the larger digital object that they pertain to (often characterised as the Archival Information Package). However, these requirements also imply the use of open APIs, so that data can be moved efficiently between providers using automated machine-to-machine interactions that do not require significant human intervention.
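As an illustration of the kind of machine-to-machine interaction this implies, the sketch below copies every object in a package store from one provider to another, assuming both expose the widely implemented S3-style object storage API. It is a minimal sketch only: the endpoint URLs, bucket name and credentials are placeholders, and a real transfer would also verify fixity on arrival.

```python
import boto3

# Two S3-compatible providers; endpoints and credentials are hypothetical.
source = boto3.client(
    "s3",
    endpoint_url="https://objects.provider-a.example",
    aws_access_key_id="SOURCE_KEY",
    aws_secret_access_key="SOURCE_SECRET",
)
target = boto3.client(
    "s3",
    endpoint_url="https://objects.provider-b.example",
    aws_access_key_id="TARGET_KEY",
    aws_secret_access_key="TARGET_SECRET",
)

# Walk the source bucket page by page and copy each object across, so the
# second provider ends up with an independent replica of every package.
paginator = source.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket="aip-store"):
    for obj in page.get("Contents", []):
        body = source.get_object(Bucket="aip-store", Key=obj["Key"])["Body"].read()
        target.put_object(Bucket="aip-store", Key=obj["Key"], Body=body)
```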
The final point is about ensuring that essential elements of a digital preservation workflow are portable, so that collections can be consistently managed over time without extensive re-processing, or even re-ingest, when material is migrated between systems. I would suggest that fixity and file format validation and characterisation (against both specifications and organisational policies) are two areas where this is important.
Checksum algorithms are well defined and widely supported, so there is generally little issue with the portability of fixity metadata between systems. However, validation and characterisation tools, and the registries/databases that power them, are rather more diverse, especially for emergent, de facto standards where the formal specification documents existing software behaviour rather than the software being built to a prior specification (e.g. Microsoft Office formats). Here, both open software and open registries are needed so that these functions can be implemented consistently across multiple vendors and systems.
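Fixity is the easy case precisely because the building blocks are open: a SHA-256 manifest in the BagIt style (RFC 8493) can be produced with nothing beyond a standard library and read by any compliant system. The short sketch below walks a hypothetical package directory and writes such a manifest; format validation and characterisation, by contrast, depend on the underlying tools and registries (such as PRONOM) being equally open if their outputs are to travel as easily between systems.

```python
import hashlib
from pathlib import Path


def write_sha256_manifest(package_dir: str) -> None:
    """Write a BagIt-style manifest-sha256.txt covering every file under
    data/, one 'checksum  relative/path' line per file. Files are read
    whole for brevity; large files would normally be hashed in chunks."""
    root = Path(package_dir)
    lines = []
    for path in sorted((root / "data").rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            lines.append(f"{digest}  {path.relative_to(root).as_posix()}")
    (root / "manifest-sha256.txt").write_text("\n".join(lines) + "\n")


write_sha256_manifest("example-aip")  # hypothetical package directory
```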
Thus, openness, with respect to certain parts of the digital preservation workflow, can simplify the migration of material between systems. Lowering the cost and effort of migration in this way lessens the risks associated with system lock-in, giving organisations greater flexibility and control over their operations and data.