In early 2025, US federal websites were undergoing rapid changes under the new presidential administration. In February 2025, we analyzed the rate of page deletions across administrations, and in summer 2025 we analyzed the content of changed CDC pages. One surprising finding was that although CDC pages displayed a “last updated” date, many of those dates were incorrect: the pages had received silent, unannounced updates, generally related to relevant executive orders.
How does trustworthiness change over time?
Silent, unannounced changes that do not match the last modified date of a webpage reduce the trustworthiness of the page. We can study three different change markers: the HTTP Last-Modified response header, meta tags that contain last updated information, and user-visible text on the page that contains update information. These markers are listed in order from most machine readable (standard syntax) to most human readable (flexible syntax). The CDC webpages do not send the HTTP Last-Modified header, but they do have the other two markers. While the server should update the Last-Modified header automatically, the other two markers can have varying levels of automation, from being tied to updates in a content management system to being edited manually by a human.
A page whose content edits are always reflected in its last updated markers is trustworthy; a page whose content edits do not match its last updated markers is less trustworthy. Because web archives let us find these changes, we can analyze the gaps between the last updated dates and the actual edits, how often those gaps occur, and whether that rate changes over time.
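To make the three markers concrete, here is a minimal Python sketch that pulls each one out of a single response. It is an illustration under assumptions, not the method used in our analysis: the function name `extract_markers`, the meta tag names in `META_NAMES`, and the visible-text pattern are all hypothetical, and real sites (including CDC) may use different tag names and wording.

```python
import re
from email.utils import parsedate_to_datetime
from html.parser import HTMLParser

# Hypothetical meta tag names; real sites use many different ones.
META_NAMES = {"last-modified", "article:modified_time", "dcterms.modified"}

# Hypothetical pattern for user-visible text such as "Last updated: October 21, 2025".
VISIBLE_RE = re.compile(
    r"last\s+(?:updated|reviewed|modified)\s*:?\s*([A-Z][a-z]+\.? \d{1,2}, \d{4})",
    re.IGNORECASE,
)


class _MarkerParser(HTMLParser):
    """Collects matching <meta> tags and the text a reader would see."""

    def __init__(self):
        super().__init__()
        self.meta = {}
        self.text = []
        self._skip = 0  # depth inside <script>/<style>, whose text is not visible

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "meta":
            name = (attributes.get("name") or attributes.get("property") or "").lower()
            if name in META_NAMES and attributes.get("content"):
                self.meta[name] = attributes["content"]
        elif tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.text.append(data)


def extract_markers(headers, html):
    """Return the HTTP, meta tag, and visible-text change markers of one response."""
    parser = _MarkerParser()
    parser.feed(html)
    parser.close()
    # Header names are case-insensitive, so compare in lowercase.
    http_value = next(
        (v for k, v in headers.items() if k.lower() == "last-modified"), None
    )
    visible = VISIBLE_RE.search(" ".join(" ".join(parser.text).split()))
    return {
        "http": parsedate_to_datetime(http_value) if http_value else None,
        "meta": parser.meta,
        "visible": visible.group(1) if visible else None,
    }
```

A page like the CDC pages described above would be expected to return `None` for `"http"` while still yielding values for `"meta"` and `"visible"`, provided the assumed tag names and text pattern match the page.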
In February 2025, we analyzed the changing rate at which webpages were deleted on US government websites. We found that different presidential administrations have markedly different webpage deletion rates, with higher deletion rates correlating with Republican presidential administrations since 2008. Similarly, Tsoukaladelis et al. analyzed silent, unannounced changes on news article webpages in 2022 and detected a correlation between the AllSides media bias score (both left and right) and the number of silent changes by publisher.
Future work in this area will (1) determine a baseline for silent changes on government websites by administration, (2) determine baselines for the news publishers identified by Tsoukaladelis et al. to examine how the silent update rate changes over time, and (3) identify a third type of webpage exhibiting this phenomenon to analyze the change over time as well.
What features contribute to trustworthiness, and how can web archives currently be used to further support or refute trustworthiness?
Figure 2 shows our change presentation continuum. Each website can be categorized based on both the properties it presents on the live web and the additional captures available in web archives. Table 1 describes each type of change presentation on the continuum and gives an example of each.
| Change presentation (most trustworthy to least) | Explanation | Example |
| --- | --- | --- |
| Past versions | All past versions of the webpage are available to view, giving the user the highest level of trust | Wikipedia: all past versions are available for every page |
| Updates summary | The most recent version of the webpage is available to view, along with a dated list of updates | https://web.archive.org/web/20040128055949/http://immortalised.net/lupdates.html List of changes with the dates the changes were made |
| Update summary | The most recent version of the webpage is available to view, along with a dated description of the most recent update | https://www.merriam-webster.com/dictionary/quixotic Last update date and a sentence describing the change |
| Update date correct | The webpage contains a date representing the most recent update, but no update summary | https://www.cdc.gov/winter-weather/safety/index.html Last update date but no information on what was updated. A web archive can be used to verify the correctness of the date |
| Update date incorrect | The webpage has been changed more recently than the update date, negatively affecting trustworthiness | |
| Copyright date | The webpage contains a copyright date, which could be used to infer a most recent change date | https://www.fairfaxcounty.gov/topics/copyright-privacy The only date on the webpage is the copyright date, which is different from the current year, implying no changes since the new year |
| No date | There is no date information anywhere on the webpage, giving the user no information about any changes that have occurred | https://info.cern.ch/hypertext/WWW/TheProject.html There is no date information on the webpage |
Table 1: Change presentation continuum examples based on live web presentation
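The levels in Table 1 can be modeled as an ordered type, which makes "more trustworthy" and "less trustworthy" directly comparable. The sketch below is only one possible encoding: the names `ChangePresentation`, `PageFeatures`, and `classify` are hypothetical, and the rule that a contradicted date overrides a summary (but not full past versions) is our assumption for illustration rather than something the continuum prescribes.

```python
from dataclasses import dataclass
from enum import IntEnum


class ChangePresentation(IntEnum):
    """Continuum levels; a higher value means more trustworthy (order of Table 1)."""

    NO_DATE = 0
    COPYRIGHT_DATE = 1
    UPDATE_DATE_INCORRECT = 2
    UPDATE_DATE_CORRECT = 3
    UPDATE_SUMMARY = 4
    UPDATES_SUMMARY = 5
    PAST_VERSIONS = 6


@dataclass
class PageFeatures:
    """Hypothetical feature flags gathered from the live web and web archives."""

    has_past_versions: bool = False
    has_updates_list: bool = False
    has_update_summary: bool = False
    has_update_date: bool = False
    update_date_contradicted: bool = False  # e.g. by web archive captures
    has_copyright_date: bool = False


def classify(features):
    """Map page features to a level on the continuum."""
    if features.has_past_versions:
        return ChangePresentation.PAST_VERSIONS
    # Assumption: evidence that the displayed date is wrong outweighs any summary.
    if features.has_update_date and features.update_date_contradicted:
        return ChangePresentation.UPDATE_DATE_INCORRECT
    if features.has_updates_list:
        return ChangePresentation.UPDATES_SUMMARY
    if features.has_update_summary:
        return ChangePresentation.UPDATE_SUMMARY
    if features.has_update_date:
        return ChangePresentation.UPDATE_DATE_CORRECT
    if features.has_copyright_date:
        return ChangePresentation.COPYRIGHT_DATE
    return ChangePresentation.NO_DATE
```

Because the levels are integers, a rating derived from the live web can be compared with one derived after consulting web archives to see whether the archive evidence raised or lowered it.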
As the two examples below show, web archives can either raise or lower a page’s trustworthiness rating: the additional captures they provide can move any site toward either end of the continuum.
**Example 1: more trustworthy:** Rakuten Viber Messenger’s Terms of Service includes both a last updated date and a summary of the most recent update. Based on these live web characteristics, it would be labeled “Update summary” on the continuum. The Wayback Machine contains a few captures of this webpage each month. After examining the captures from 2025, we can conclude that this page is updated a few times a year. Using the additional information from the captures in web archives, we could raise the trustworthiness level of this webpage from “Update summary” to “Updates summary.”
The current change presentation wording (update summary level) is “Last updated: October 21, 2025. We’ve recently updated our terms and policies. View the summary of changes here.” Using web archives, this could be expanded to include more change information, specifically an updates summary list, which is a higher level of trustworthiness.
- Updated October 21, 2025 (diff). View the summary of changes here.
- Updated May 22, 2025 (diff). View the summary of changes here.
- Updated March 24, 2025 (diff). View the summary of changes here.
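One way such a list could be derived from archived captures is sketched below in Python. It assumes the displayed “last updated” text has already been extracted from each capture; the function name `updates_summary`, the example URL, and the capture timestamps are hypothetical. Instead of a diff link, this sketch links to the earliest capture showing each date, using the Wayback Machine capture URL form `https://web.archive.org/web/<timestamp>/<url>`.

```python
from datetime import datetime


def updates_summary(url, captures):
    """Build a dated updates list from (capture timestamp, displayed date) pairs.

    Each capture is a tuple of a 14-digit Wayback Machine timestamp and the
    "last updated" text shown on the page, e.g. "October 21, 2025".
    """
    first_seen = {}
    # Visit captures oldest first so each date keeps its earliest capture.
    for timestamp, displayed in sorted(captures):
        day = datetime.strptime(displayed, "%B %d, %Y").date()
        first_seen.setdefault(day, timestamp)

    lines = []
    for day in sorted(first_seen, reverse=True):  # newest update first
        link = f"https://web.archive.org/web/{first_seen[day]}/{url}"
        label = f"{day.strftime('%B')} {day.day}, {day.year}"
        lines.append(f"- Updated {label} (first captured: {link})")
    return lines


if __name__ == "__main__":
    # Hypothetical captures of a hypothetical terms-of-service page.
    example = [
        ("20251101000000", "October 21, 2025"),
        ("20251025000000", "October 21, 2025"),
        ("20250601000000", "May 22, 2025"),
        ("20250401000000", "March 24, 2025"),
    ]
    print("\n".join(updates_summary("https://example.com/terms", example)))
```

The list is only as complete as the archive's capture frequency: an update that came and went between two captures would not appear.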
**Example 2: less trustworthy:** CDC webpages contain a last updated date, as shown in Figure 1. Based on the live web alone, the initial level for these webpages would be “Update date correct.” In our work, "Coming Back Differently: An Exploratory Case Study of Near Death Experiences of Webpages," we showed that the last updated dates on CDC webpages were inaccurate: words that did not comply with early 2025 executive orders were removed from the pages without the last updated date being changed. Therefore, once web archives are taken into account, the trustworthiness of these webpages drops to the “Update date incorrect” category.
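The kind of check behind this example can be sketched as a comparison of consecutive captures: a change is silent when the page content differs but the displayed last updated date does not. The Python below is a simplified illustration, not the pipeline from our paper. The `Capture` fields and function names are hypothetical, and it assumes the body text has already been extracted with the date marker itself removed.

```python
from dataclasses import dataclass


@dataclass
class Capture:
    timestamp: str       # 14-digit capture time, e.g. "20250301000000"
    displayed_date: str  # "last updated" text shown on the page
    body: str            # page text with the date marker removed (assumed)


def _normalize(text):
    """Collapse whitespace so formatting-only differences are ignored."""
    return " ".join(text.split())


def _changes(captures):
    """Yield (previous, current) pairs of consecutive captures whose content differs."""
    ordered = sorted(captures, key=lambda c: c.timestamp)
    for prev, cur in zip(ordered, ordered[1:]):
        if _normalize(prev.body) != _normalize(cur.body):
            yield prev, cur


def find_silent_changes(captures):
    """Return timestamp pairs where content changed but the displayed date did not."""
    return [
        (prev.timestamp, cur.timestamp)
        for prev, cur in _changes(captures)
        if prev.displayed_date == cur.displayed_date
    ]


def silent_change_rate(captures):
    """Fraction of observed content changes that were silent, or None if no changes."""
    changes = list(_changes(captures))
    if not changes:
        return None
    return len(find_silent_changes(captures)) / len(changes)
```

Computing `silent_change_rate` over captures grouped by time period is one way the rate of silent changes could be tracked over time, with the caveat that changes between captures are invisible to this comparison.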
How could web archives be further used to detect and reconstruct trustworthy edit histories?
In the continuum shown in Figure 2, the most trustworthy level is past versions, and an example website meeting this level is Wikipedia. To guide our work on using web archives to amplify the trustworthiness (or lack thereof) of a page’s change presentation, we surveyed the features of the edit histories that Wikipedia shows to users and how researchers have used those features in their work. We examined peer-reviewed publications from 2018 to 2025 that contained the phrase “wikipedia edit history.”
- Authors: Researchers used authorship information to determine which articles the same author edited, parse edit conflicts on a page, count edits, track the frequency of an author’s edits over time, and verify the trustworthiness of individual authors. They also used IP addresses as a proxy for author data for anonymous editors.
- Edit properties: Researchers used page edit counts, the “minor” flag, and the time of the edit in their work.
- Filtering: Researchers filtered the data (every edit on Wikipedia since inception) by single article, article subject, time of edit, and tag.
- View: Researchers viewed the data as a graph, tuple, time series, or as text (for natural language processing or large language model training).
- Researchers also used the edit history data to follow redirects and to identify vandalism.
- Data: About half of the researchers used a derived, cleaned data set, and the other half used either Wikipedia dumps or other raw downloads.
Clearly, Wikipedia edit history is extremely useful to researchers who are looking for examples of a variety of edit types. So why would researchers need web archives when so much Wikipedia edit history data is already available to them? The answer is that not every researcher would need web archives, but some would. It depends on what type of changes the researcher needs examples of. There are three pathways: cases where Wikipedia edit history contains information not found in web archives, such as when author information is needed; cases where either Wikipedia or a web archive could suit the needs of the researcher, in which case the cleaned and semantic data from Wikipedia would probably be more suitable; and finally, cases where the data in a web archive would be more suitable than Wikipedia. Faruqui et al., and others, have shown that the language on Wikipedia differs from the language used in other contexts, so this is a good starting point for identifying additional uses where web archives are preferred.
Conclusion
We found that websites are communicating inaccurate last updated dates, which affects their trustworthiness. We and other researchers have found that the rate of silent updates changes over time. We enumerated the levels of a change presentation continuum on the live web, and showed how web archives can be used to provide further evidence for or against a webpage’s trustworthiness. We conducted a literature review of Wikipedia edit history use cases, and used it to start informing how web archives can be used to detect and reconstruct edit histories in a way that will be useful to researchers.
References
- Frew et al. Establishing a Baseline by Administration for the Takedown of US Government Webpages using Web Archives.
- Frew et al. Coming Back Differently: An Exploratory Case Study of Near Death Experiences of Webpages.
- Tsoukaladelis et al. The Times They Are A-Changin’: Characterizing Post-Publication Changes to Online News.
- Faruqui et al. WikiAtomicEdits: A Multilingual Corpus of Wikipedia Edits for Modeling Language and Discourse.
-Lesley