What would happen to the world’s music collections if streaming services disappeared? One hacktivist group says it has a solution: scrape around 300 terabytes of music and metadata from Spotify and offer it up for free as what it calls the world’s first “fully open” music preservation archive.
The scraping appears to have been carried out by people associated with Anna’s Archive, a shadow-library site that focuses on preserving media - traditionally books and academic papers - by aggregating metadata and distributing large datasets rather than directly hosting copyrighted works. In practice, Anna’s Archive functions more like a metadata search engine, allowing users to find the content they want and connecting them with downloads, usually via torrent, from other sources to reduce legal liability.
In a Saturday blog post, the group said it couldn’t pass up an opportunity "outside of text" to scrape Spotify at scale, claiming to have archived roughly 86 million music files, which it says account for about 99.6 percent of listens on the platform.
"A while ago, we discovered a way to scrape Spotify at scale," Anna’s Archive said. "We saw a role for us here to build a music archive primarily aimed at preservation."
Anna’s Archive justified its Spotify scraping by describing it as a "humble attempt to start a ‘preservation archive’ for music" in order to protect "humanity’s musical heritage" from "destruction by natural disasters, wars, budget cuts, and other catastrophes."
In particular, the Anna’s Archive team said that it wants to get around some of the most common problems in other music preservation initiatives, namely an "over-focus on the most popular artists," an "over-focus on the highest possible quality" and a lack of authoritative torrent lists representing "all music ever produced."
Those noble claims fall apart quickly upon further reading of Anna’s Archive’s blog post, though.
While 300 TB comprising roughly 86 million music files, which the group claims represent about 99.6 percent of Spotify’s listens, is a vast amount of audio, it falls well short of the platform’s full catalog. Anna’s Archive says Spotify contains around 256 million tracks in total, meaning the audio files it archived cover only about a third of the catalog, with the remaining tracks represented only in metadata rather than preserved as music files.
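The "about a third" figure is easy to check. The sketch below is a back-of-the-envelope calculation using only the numbers quoted above; the variable names are ours, and it assumes decimal terabytes (10^12 bytes). Because the 300 TB reportedly includes metadata as well as audio, the implied average file size is a rough upper bound, not a measured value.

```python
# Figures as reported in the article; all are approximate.
ARCHIVED_FILES = 86_000_000      # audio files Anna's Archive says it scraped
CATALOG_TRACKS = 256_000_000     # total tracks the group attributes to Spotify
ARCHIVE_BYTES = 300 * 10**12     # 300 TB, assuming decimal terabytes (assumption)

# Fraction of the catalog that exists as audio rather than metadata only.
coverage = ARCHIVED_FILES / CATALOG_TRACKS

# Rough average size per archived file, in megabytes (10^6 bytes).
avg_file_mb = ARCHIVE_BYTES / ARCHIVED_FILES / 10**6

print(f"Share of catalog archived as audio: {coverage:.1%}")
print(f"Implied average file size: {avg_file_mb:.1f} MB")
```

On these figures, roughly 33.6 percent of the catalog is covered by audio files, at about 3.5 MB per file.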
By not bothering with all the musical chaff in Spotify’s catalog, the Anna’s Archive team is apparently content to let those less popular songs languish, despite its stated aim of avoiding a focus on just the most popular artists.
It’s not clear how the Archive intends to break up the 300 TB worth of music into torrent files, or if it intends to release one massive file (we reached out to the team but didn’t hear back), but the blog notes that it’s only going to be making it available via "a torrents-only archive aimed at preservation" that "can easily be mirrored by anyone with enough disk space."
In short, the stated archival goal sits uneasily with letting users simply download the individual Spotify tracks they want, as that would just be plain old piracy.
As with its broader preservation rhetoric, Anna’s Archive’s claim of benevolence in releasing the collection strictly for archival purposes is undercut later in the blog post.
"If there is enough interest, we could add downloading of individual files to Anna’s Archive," volunteer member "ez" wrote on the blog. "Please let us know if you’d like this."
If Anna’s Archive intended to return to Spotify’s servers to scrape the remaining songs, it might be too late, according to a Spotify spokesperson’s comment to *The Register*.
"Spotify has identified and disabled the nefarious user accounts that engaged in unlawful scraping," the company told us. "We’ve implemented new safeguards for these types of anti-copyright attacks and are actively monitoring for suspicious behavior."
No mention was made by either Spotify or Anna’s Archive of how the scrapers managed to bypass Spotify’s digital rights management software.
While Spotify didn’t respond to questions about Anna’s Archive’s supposed preservation motivations, the company did note that it views the theft of tens of millions of pieces of intellectual property from its servers as a simple act of piracy, regardless of whether Spotify itself is a bit of an IP pirate that doesn’t fairly pay its artists.
"Since day one, we have stood with the artist community against piracy, and we are actively working with our industry partners to protect creators and defend their rights," Spotify said.
For now, metadata covering nearly all of Spotify’s roughly 256 million tracks is available to download from Anna’s Archive. The music files themselves aren’t out yet, but the Archive claims that it’s planning to release them - in order of popularity - sometime in the future. ®