Welcome sign at the University of Siegen
At this year’s REsearch infrastructure for the Study of Archived Web materials (RESAW) conference, my colleague Pierre Marshall and I organised a workshop titled “Towards an ‘Algorithmic Archive’: Developing Collaborative Approaches to Persistent Social and Algorithmic Data Services for Researchers”. The workshop was accepted as one of the RESAW2025 pre-conference workshops and took place on 4 June 2025. We had around twelve participants, including researchers at various career stages and web archivists, who contributed to lively discussions and, thanks to their experience with social media data, to the Algorithmic Archive project.
The workshop was organised in two sessions. The first focussed on gathering researchers’ perspectives on the use of social media data in research. The second invited participants to imagine a long-term archive of social media data and to think about the features they would like to see in a social media data service. Both sessions offered a valuable opportunity to gather insights for the Algorithmic Archive project, particularly regarding issues and expectations related to short- and long-term access to social media data.
Key themes and takeaways are summarised below.
Social media data (re)use and data management practices
Researchers appeared to work mostly with small datasets, especially after free access to data for research purposes came to an end with the deprecation of the Twitter Academic API in 2023. Among the researchers who shared their experience with social media data, one noted that they currently work with information about the number of followers, which is often supplemented with screenshots taken at different points in time. They explained that screenshots are essential to their research because they capture the “look and feel” of the social platforms. In this regard, one of the web archivists participating in the workshop noted that their institution uses Webrecorder[1] at least once a year.
In addition, a researcher whose work focusses on algorithms noted that social media data collected via APIs is only one of the sources they use for their study. Other sources include existing policies, new regulations (e.g. the EU Digital Services Act) and other archival sources such as information on GitHub.[2]
As for long-term preservation, the researchers participating in the workshop did not appear to have specific plans, with some indicating that they usually delete social media data some time after the end of the project. Despite some concerns about potential ethical issues, researchers expressed a general interest in reusing datasets that include social media data. Nevertheless, they emphasised that effective reuse would require detailed documentation from the dataset creator to understand how the dataset was created.
Access and user requirements
For the second session, we organised a post-it note exercise in which we asked researchers to reflect on the types of metadata they would find useful for their research and would like memory institutions to collect or provide. Researchers suggested several pieces of metadata they would like to see associated with the archived resource, including date of capture; date of publication; technical and curatorial metadata; hardware (e.g., mobile, tablet, laptop); sensitivity assessment; and the type of tool used to collect the information.
Post-it note session
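As a rough illustration, the metadata suggested in the post-it exercise could be attached to each archived resource as a simple structured record. The sketch below is only one possible shape: the class name, field names and example values are hypothetical and do not reflect an existing schema or a decision taken by the Algorithmic Archive project.

```python
import json
from dataclasses import asdict, dataclass
from datetime import date
from typing import Optional


@dataclass
class CaptureMetadata:
    """Hypothetical record covering the metadata participants suggested."""

    capture_date: date                 # when the resource was archived
    publication_date: Optional[date]   # when the content was posted, if known
    capture_tool: str                  # type of tool used to collect the data
    device_type: str                   # hardware context, e.g. mobile, tablet, laptop
    sensitivity_assessment: str        # outcome of any sensitivity review
    technical_notes: str = ""          # free-text technical metadata
    curatorial_notes: str = ""         # free-text curatorial metadata

    def to_json(self) -> str:
        """Serialise the record, writing dates as ISO 8601 strings."""
        record = asdict(self)
        for key in ("capture_date", "publication_date"):
            value = record[key]
            record[key] = value.isoformat() if value is not None else None
        return json.dumps(record, indent=2)


# Made-up example values, for illustration only.
example = CaptureMetadata(
    capture_date=date(2025, 6, 4),
    publication_date=None,
    capture_tool="Webrecorder",
    device_type="laptop",
    sensitivity_assessment="not yet assessed",
)
print(example.to_json())
```

Keeping the publication date optional reflects that it may not always be recoverable at capture time; a real service would likely map these fields onto an established metadata standard rather than an ad hoc structure like this one.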
There was general agreement among participants about the need for the collecting institution to preserve at least some instances of the context in which the data was embedded. For example, walkthroughs of social media platforms recorded using tools such as Webrecorder would be crucial for researchers and future users of the collection to get a sense of the platforms’ “look and feel” at certain points in time. Some of the participants noted the importance of understanding potential **functionality loss** when replaying archived social media material.
Nevertheless, access to platform data, particularly free access, is still one of the major obstacles for researchers who need such information for their studies. This has become even more pressing since the Twitter Academic API was deprecated in 2023 and replaced with a paid tier system. Because of the high fees required to access the necessary amount of data, many researchers have redirected their research goals, either significantly reducing the amount of data they need or focusing on other platforms.
Overall, the workshop brought together diverse perspectives from practitioners and researchers working with social media data, fostering discussions on the development of sustainable strategies for collecting material from social media platforms. This was a unique opportunity to discuss some of the Algorithmic Archive project’s findings, clarify researchers’ perspectives on concerns related to the use of social media data, and raise further questions that the project should take into consideration when developing a social media data service.
[1] Webrecorder homepage: https://webrecorder.net/
[2] More information about the GitHub Archive Program can be found here: https://archiveprogram.github.com/