This blog post is based on a presentation at the 2025 Midwest Archives Conference and a forthcoming paper at iPRES 2025.
At the University of Kentucky Libraries’ Special Collections Research Center, I have the good fortune as the Digital Archivist to work almost exclusively with born-digital material across our many collections. Sometimes the other archivists let me touch paper, but I have to ask nicely. As with physical archives, most digital archivists are just trying to keep pace with new materials as they arrive. The work necessary to adapt to new file formats and platforms in all their esoteric glory is enough to keep anyone busy. Which is probably why, when I asked colleagues at other institutions if they could share any policies and practices for born-digital reappraisal, they looked at me as though I had asked whether they’d considered having their ashes spread in space.
That is to say, most of us don’t have the resources, and it’s perhaps too early to be asking.
Caption: The answer: yes you can. Or someone can, anyways.
Also, arguably, I didn’t know them well enough to ask either question. I am beginning to suspect that asking someone about their backlog is a poor icebreaker. Unofficially, that is probably a question for the second beer.
To be honest, I began a reappraisal process out of the same morbid sense of curiosity that might lead to a hypothetical space interment question. What, exactly, was on the other side of the darkness? And how big was it?
Step 1: Inventory Physical Media
My first encounter with archivists systematically assessing their born-digital backlog was at a 2023 iPRES presentation by Leo Konstantelos and Emma Yan from the University of Glasgow’s Archives and Special Collections. They presented their Media Prioritisation Tool, a spreadsheet that determines which disks and other media should be migrated first based on their relative fragility and vulnerability to time and circumstances.
Here’s their Media Prioritisation Paper: https://www.ideals.illinois.edu/items/128315 and their marvelous tool: https://researchdata.gla.ac.uk/1634/
A challenge inherent in such a large project is to define the order in which to tackle the material. The Glaswegian tool distills a few factors down to a prioritisation score for each piece of media: an elegant and straightforward device.
Caption: The original media prioritisation tool from the University of Glasgow, https://researchdata.gla.ac.uk/1634/; licensed under CC BY-NC-SA 4.0
My obvious reaction was to approach our Director of Manuscripts, Megan Mummey, and say, “Hey, we should complicate this.” And so we did.
Our adapted version of their Media Prioritisation Tool adds one more score: Estimated Archival Research Value (EARV). The reason: after we inventoried our collections, it was clear that nearly everything fell into the two most critical categories. To differentiate priority within those categories, we weighted the scores further based on probable research value. I’ll spare you the full breakdown, but you can see the categories and how they were added to the inventory tool here:
Template_Inventory_With_EARV.xlsx (https://docs.google.com/spreadsheets/d/1WRyExoPCIcpBj4JVmX1ak7vUamKuJCET/copy)
Feel free to make your own copy, adapt it, and tweak the scores and categories to best fit your priorities.
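For readers who think better in code than in spreadsheets, here is a minimal sketch of the general idea: a media risk score nudged by a weighted research-value score, so that items tied on risk can still be ranked. The 1-5 scales, the 0.5 weight, the field names, and the sample items are all assumptions made up for illustration; they are not the actual factors or formulas in the Glasgow tool or in our adapted template.

```python
from dataclasses import dataclass


@dataclass
class MediaItem:
    label: str
    risk: int  # hypothetical 1-5 scale, 5 = most fragile or obsolete
    earv: int  # hypothetical 1-5 scale, 5 = highest estimated research value


def priority(item: MediaItem, earv_weight: float = 0.5) -> float:
    """Add a weighted EARV score to the media risk score (assumed formula)."""
    return item.risk + earv_weight * item.earv


# Hypothetical inventory rows: two items tie on risk but differ on EARV.
items = [
    MediaItem("Collection A, 3.5-inch floppy", risk=5, earv=2),
    MediaItem("Collection B, 3.5-inch floppy", risk=5, earv=5),
    MediaItem("Collection C, CD-R", risk=4, earv=1),
]

# Highest combined score is migrated first.
ranked = sorted(items, key=priority, reverse=True)
for item in ranked:
    print(f"{priority(item):.1f}  {item.label}")
```

With these invented numbers, the two floppies no longer tie: the one with the higher EARV moves to the front of the queue.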
Step 2: Address the Complications in Step 1
One challenge in inventorying our materials was identifying terms used to describe digital material over the last 30 years. I look back on the early days of external flash memory with an alarming degree of affection. However, it traveled by many names: jump drive, flash drive, memory stick, USB drive, thumb drive. These and other fluctuating naming conventions hindered the search. Eventually, I compiled 33 search terms that captured that vocabulary, but understanding your own repository’s practices is crucial in conducting such an inventory.
In our case, most collections prior to 2015 commingled disks with documents, sometimes with A/V materials, sometimes foldered, sometimes in standalone boxes. Searching through inventories, deeds of gift, and accession and resource records was our only realistic option because there were too many physical locations where material was hidden.
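If your inventories and accession records can be exported as text, a vocabulary search like the one described above might be sketched as follows. This is an illustration under assumptions, not the process we ran: the term list contains only the five names mentioned earlier (not the full 33), and the record identifiers and descriptions are invented. The pattern tolerates spacing, hyphenation, capitalization, and plural variants, since that is where legacy description tends to drift.

```python
import re

# Five of the terms named above; a real list would be longer and repository-specific.
SEARCH_TERMS = ["jump drive", "flash drive", "memory stick", "USB drive", "thumb drive"]


def build_pattern(terms):
    """Match each term with an optional space or hyphen between words, plus a plural 's'."""
    alternatives = []
    for term in terms:
        parts = [re.escape(part) for part in term.split()]
        alternatives.append(r"[\s-]?".join(parts))
    return re.compile(r"\b(?:" + "|".join(alternatives) + r")s?\b", re.IGNORECASE)


def find_media_mentions(records, pattern):
    """Return {record_id: [matched phrases]} for records that mention any term."""
    hits = {}
    for record_id, text in records.items():
        found = pattern.findall(text)
        if found:
            hits[record_id] = found
    return hits


# Hypothetical exported descriptions keyed by a made-up identifier.
records = {
    "accession-001": "Box 12, folder 3: two thumbdrives and a Memory Stick",
    "accession-002": "Correspondence, 1998-2004",
    "accession-003": "1 USB-drive with photographs",
}

pattern = build_pattern(SEARCH_TERMS)
hits = find_media_mentions(records, pattern)
for record_id, phrases in hits.items():
    print(record_id, phrases)
```

A search like this only narrows the field. Every hit still needs a human to pull the box and confirm what is actually in it.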
Caption: True story: One disk in a legal-sized, acid-free box
Step 3: Reappraise Previously Transferred Files
In addition to files trapped in media on our shelves, we had migrated considerable data onto network servers in less consistent ways prior to 2015. Those materials needed reappraisal, as our policies have evolved considerably, and in some instances material had been migrated to servers but never described or pushed to long-term digital preservation. Reconciling these gaps is crucial.
While some of this work was necessarily manual, the storage analysis software TreeSize allowed for large-scale automated reappraisal. We already use TreeSize as a standard part of digital accessioning and generating file manifests, but one of my favorite features is its search module that, among other things, is excellent for hunting down duplicate folders and files across multiple servers at once. It helped me hunt down terabytes of duplicate materials that were clearly remnants of earlier processing or accidental accruals.
Caption: Sample TreeSize duplicate file search within a directory. Note that duplicate files with different names are still found, thanks to the software’s checksum comparisons.
In addition, TreeSize can analyze AWS S3 buckets, SharePoint sites, Google Drive accounts, and Linux servers. I have been accused of loving their software a little too earnestly, but I feel no shame.
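For anyone without access to TreeSize, the general technique of checksum-based duplicate detection can be approximated with a short script. To be clear, this is a generic sketch and not a description of how TreeSize works internally: the choice of SHA-256, the size-first shortcut, and the function names are my own assumptions. It groups files by size, hashes only the files that share a size with another file, and reports groups with identical content regardless of file name.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files are never loaded into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(*roots):
    """Return lists of paths whose contents are byte-for-byte identical."""
    # Pass 1: group by size, since files of different sizes cannot be duplicates.
    by_size = defaultdict(list)
    for root in roots:
        for path in Path(root).rglob("*"):
            if path.is_file():
                by_size[path.stat().st_size].append(path)

    # Pass 2: hash only the files that share a size with at least one other file.
    by_hash = defaultdict(list)
    for paths in by_size.values():
        if len(paths) < 2:
            continue
        for path in paths:
            by_hash[sha256_of(path)].append(path)

    return [sorted(group) for group in by_hash.values() if len(group) > 1]
```

Calling `find_duplicates` with two or more directory paths (for example, two mounted network shares) returns one list per set of identical files. Treat the output as a list of candidates for review rather than a deletion queue, since a duplicate may exist for a reason.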
Step 4: Get Smug
Just kidding, obviously. Nobody likes smug. I wrote this from the humble vantage point of someone who has inventoried our backlog with my colleagues, deeply aware that further surprises lurk on our shelves, even as fusillades of new born-digital materials whistle down from the clouds. Also, knowing your backlog and dealing with it are two discrete problems. I’m around 10% done addressing the backlog I’ve defined, and the work ahead is considerable.
But the darkness has a shape now. Contours, if not a full outline. And having an idea of what’s out there makes it a little less scary.
Andrew McDonnell is the Digital Archivist for the University of Kentucky Special Collections Research Center. Prior to arriving at UK, he worked in the archives of the University of Wisconsin, the Wisconsin Center for Film and Theatre Research, Pixar’s Living Archives, and Wayland Academy (where he once found a 100-year-old lollipop in a scrapbook and did not eat it).