There’s outrage in the computer science community over a new feature rolled out by the ACM Digital Library that generates often inaccurate AI summaries. To make things worse, this is hidden behind a ‘premier’ paywall, so authors without access (for example, having graduated from University) can’t even see what is being said.
Why are these paper AI summaries harmful?
The summaries themselves are deeply average. Looking at one of my recent papers, the tool expands a carefully crafted two-paragraph summary into six paragraphs that say roughly the same thing. This seems like exactly the wrong place to apply LLM technology: it replaces a carefully peer-reviewed paragraph with a longer, slopful version.
The AI generated summary regresses us to the mean by turning two paragraphs into six.
The ACM stands for the accessible dissemination of knowledge. I could imagine cases where summarising abstracts would be useful: for example, translating them into languages for which no abstract exists, or producing high-quality audio renditions for assistive use. However, putting summaries behind a paywall and distracting from peer-reviewed, human-created content is really, really bad.
Is the ACM trying to make money from AI?
I dug in a bit deeper to find out more, and discovered this statement:
Currently, we offer written summaries of individual articles as well as podcast-style audio summaries of conference sessions. We will soon add chat-style interactivity to our content and search functionality. All summaries on the Digital Library are clearly labeled as AI-generated. When citing ACM content, authors should always cite the original article, not an AI-generated summary.
AI can make mistakes, and it cannot replace the experience of reading an article in full. But we do believe it will help you find, understand and use ACM content both more quickly and more deeply.
These tools were designed in consultation with a diverse group of Digital Library stakeholders and will continue to evolve as Artificial Intelligence advances. We are continuously tuning our Foundational Model to optimize readability and we conduct regular audits for hallucinations and other errors. We are very interested in your thoughts and suggestions; please leave them by clicking the "Feedback" button on the far right of this page. If you find a problem with a specific AI-generated summary, please return to that summary and click the Feedback there. – Artificial Intelligence Tools in the ACM Digital Library, undated
I have many questions here: who is this diverse group of stakeholders, what foundation model is being used, what tuning happened, what audits, and what is happening with the corrections from authors. Are we suddenly using the world’s scholars to create a fine-tuning label database without their permission? There’s a definite lack of transparency here.
Luckily, Jonathan Aldrich is on the ACM Publications Board, which must be a thankless job. He acknowledged this very graciously yesterday during the outrage:
I also owe the community an apology; I was told about this feature (though I’m not sure I was told it was going to be the default view). I should have recognized the potential issues and been loudly critical earlier, before it went live. But I will do my best to get it fixed now. – Jonathan Aldrich, Mastodon (sigsocial), 17 Dec 2025
This got me thinking about what the ACM should be doing instead. Putting these summaries up is not only a step in the wrong direction; it also carries a high opportunity cost, crowding out other activity that could leverage AI for social good.
How the ACM could do AI right
We’re at a real crossroads with scientific communication and scholarly publishing, but I firmly believe that the ACM can correct itself and make a real difference.
Less algorithmically driven communication
Looking through the ACM Digital Library footer, I see news channels using X, LinkedIn and Facebook. The only open protocol listed is email, although I did discover a Bluesky account that is unlisted on the ACM website.
None of these platforms are conducive to longform, thoughtful community conversations. Let’s look at ACM’s mission statement:
ACM is a global scientific and educational organization dedicated to advancing the art, science, engineering, and application of computing, serving both professional and public interests by fostering the open exchange of information and by promoting the highest professional and ethical standards. – ACM’s Mission, Vision, Core Values and Goals, 2025
The platforms the ACM has chosen for communicating with the scholarly community are algorithmic engagement factories. There are countless papers on the ACM Digital Library itself recording the harms and spread of misinformation.
While early ads were found to be effective in creating brand awareness and positive attitudes, recent Internet advertising has been described as nonsensical, uninformative, forgettable, ineffective, and intrusive. – The Effects of Online Advertising, Communications of the ACM, 2007
Instead, the ACM should focus on models that encourage scholarly discourse: standards-based mechanisms like an Atom/RSS feed for their news (which can be consumed widely and accessibly), and increased engagement on non-advertising-driven platforms such as Bluesky. A Nature poll at the start of the year found that 70% of respondents use that platform. I suspect the figure is lower for computer science, but the ACM setting a direction would go a long way towards giving the community one.
The W3C Atom Feed Validator doesn’t get very far with the ACM Digital Library
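To show how low the barrier to standards-based dissemination is, here is a sketch of consuming an Atom feed with nothing beyond the Python standard library. No ACM news feed exists today, so the example parses an inline document rather than a real URL; everything in the feed body is made up.

```python
# Parse a minimal Atom feed using only the standard library.
# The feed content below is a stand-in: a real feed would be
# fetched with urllib from the publisher's (hypothetical) feed URL.
import xml.etree.ElementTree as ET

ATOM_NS = "{http://www.w3.org/2005/Atom}"

feed_xml = """<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>ACM News (example)</title>
  <entry>
    <title>Hypothetical news item</title>
    <link href="https://example.org/news/1"/>
    <updated>2025-12-17T00:00:00Z</updated>
  </entry>
</feed>"""

root = ET.fromstring(feed_xml)
# Collect every entry title, namespace-qualified per the Atom spec.
titles = [e.findtext(f"{ATOM_NS}title") for e in root.iter(f"{ATOM_NS}entry")]
print(titles)
```

Any feed reader, aggregator or script can consume the same document, with no platform account and no engagement algorithm in between.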
Make papers easier to download
I’ve been working on collective knowledge principles to boost the conservation evidence project. As part of this process, I’ve downloaded tens of millions of fulltext papers to help figure out where living things are on the planet. By far the most difficult task here was getting access to even the open papers. At the recent COAR meetup, half the conversations were around the difficulty of obtaining knowledge even before curation.
Incredibly, while just browsing around the ACM DL in order to research this article, I got blocked from the entire library. This was after opening about 10 browser tabs: not an unusual amount of human traffic!
I’m still blocked an hour later, so I guess I won’t be doing any computer science research for the rest of the day. Pub, anyone?
Contrast this with the Public Library of Science (PLOS), whose allofplos repository allows me to download the entire fulltext paper repository by running a single line of Python: pipenv run python -m allofplos.update. The script not only downloads papers, but does a bunch of important bookkeeping:
The script:
- checks for and then downloads to a temporary folder individual new articles that have been published
- of those new articles, checks whether they are corrections (and whether the linked corrected article has been updated)
- checks whether there are VORs (Versions of Record) for uncorrected proofs in the main articles directory and downloads those
- checks whether the newly downloaded articles are uncorrected proofs or not

After all of these checks, it moves the new articles into the main articles folder. – AllOfPLOS README, 2016
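The steps above amount to a simple download-check-move loop. Here is a toy sketch of that shape; the directory layout and the `fetch_new_articles` helper are entirely hypothetical, and the real allofplos package does considerably more (corrections, VOR handling, proof status).

```python
# Toy sketch of an allofplos-style update loop, NOT the real
# implementation. fetch_new_articles is a hypothetical callable
# that downloads new articles into tmp_dir and returns their paths.
import shutil
from pathlib import Path

def update(main_dir: Path, tmp_dir: Path, fetch_new_articles) -> list[Path]:
    """Download new articles into tmp_dir, then move them into main_dir."""
    tmp_dir.mkdir(parents=True, exist_ok=True)
    main_dir.mkdir(parents=True, exist_ok=True)
    moved = []
    for article in fetch_new_articles(tmp_dir):  # step 1: fetch new items
        # steps 2-4 (corrections, VORs, proof status) would be checked here
        dest = main_dir / article.name
        shutil.move(str(article), dest)          # final step: move into place
        moved.append(dest)
    return moved
```

The point is less the code than the service contract: a single documented entry point that keeps a local mirror of the open corpus up to date.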
PLOS has been doing this for years, so why hasn’t the ACM done the same for its open access papers? I applaud the ACM’s recent shift to open access by default, but it is pointless if not accompanied by an investment in the dissemination of knowledge.
Build a provenance defence against fake papers
One of the most exciting things about Bluesky is that it allows for the reuse of the identity layer to build other services. Right now we are seeing that AI poisoning of the literature is upending all kinds of evidence-driven social norms, and is a huge threat to rational policymaking for all of society.
The ACM is uniquely positioned in computer science as the body that could build a reasonable reputation network that not only identifies academics, but also enforces provenance tracking of whether papers and artefacts did in fact follow a rigorous methodology. LLMs are now amazingly good at constructing fake papers, and so capturing the peer review process and building up a defence against "knowledge from thin air" will be one of the great challenges for the remainder of this decade.
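One way provenance tracking might look, as a toy sketch: chain each peer-review attestation to the paper’s content hash, so that altering either the paper or any review breaks verification. This is an illustration only, not a proposal for the actual protocol; a real system would use DID-backed cryptographic signatures (as in the AT Protocol underlying Bluesky) rather than bare hashes.

```python
# Toy provenance chain: each attestation commits to the previous
# link's hash, anchored at the paper's content hash. Illustrative
# only; real provenance needs reviewer-held signing keys.
import hashlib
import json

def content_hash(paper_text: str) -> str:
    return hashlib.sha256(paper_text.encode()).hexdigest()

def attest(prev_hash: str, reviewer: str, verdict: str) -> dict:
    """Append one peer-review attestation to the chain."""
    record = {"prev": prev_hash, "reviewer": reviewer, "verdict": verdict}
    record["hash"] = hashlib.sha256(
        json.dumps(record, sort_keys=True).encode()
    ).hexdigest()
    return record

def verify(paper_text: str, chain: list[dict]) -> bool:
    """Recompute every link; False means paper or a review was altered."""
    prev = content_hash(paper_text)
    for rec in chain:
        if rec["prev"] != prev:
            return False
        body = {k: rec[k] for k in ("prev", "reviewer", "verdict")}
        expected = hashlib.sha256(
            json.dumps(body, sort_keys=True).encode()
        ).hexdigest()
        if expected != rec["hash"]:
            return False
        prev = rec["hash"]
    return True
```

A paper "from thin air" has no such chain to present, which is exactly the defence the text above argues for.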
AI poisoning the literature in a legendary cartoon. Credit: David Parkins, Nature
Agentic AI is here to stay, so deal with it on our terms
My December adventures in agentic programming have been eye-opening in showing just how quickly I can build hyper-personalised views over a large body of knowledge. While many computer science scholars view LLMs skeptically, there is a good use for agentic summaries of papers: letting readers summarise papers for themselves, supplying the LLM with context about what they already know.
Bryan Cantrill explained this best in his principles for LLM usage at Oxide, separating out the use of LLMs for reading, writing and coding. I totally agree with him that I detest people sending me LLM-generated writing, and he teased out a good explanation of why:
LLM-generated prose undermines a social contract of sorts: absent LLMs, it is presumed that of the reader and the writer, it is the writer that has undertaken the greater intellectual exertion. (That is, it is more work to write than to read!) For the reader, this is important: should they struggle with an idea, they can reasonably assume that the writer themselves understands it — and it is the least a reader can do to labor to make sense of it. – Using LLMs at Oxide, RFD0576, Dec 2025
And that, dear reader, is why the ACM redistributing AI summaries is a bad idea. It breaks the social contract with the reader that the ACM Digital Library is a source of truths that the scholars who contributed to it actually understand. We might not agree with everything in the library, but it’s massively dilutive to have to sift through AI-generated writing to get to the original bits.
If the ACM itself deliberately introduces errors into its own library, that’s a massive self-own. If instead the ACM Library exposed a simple text-based interface that allowed my agents to do paper summaries just for me, that personalisation would make it useful. I find deep research agents surprisingly useful when exploring a new field, but primarily because I can guide their explorations with my own research intuition, not someone else’s.
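A sketch of what reader-side summarisation might look like. The plain-text endpoint and the agent call are hypothetical; only the prompt construction is shown, because that is where the personalisation lives, on the reader’s side rather than the library’s.

```python
# Reader-side personalised summarisation sketch. The text-based DL
# endpoint and the downstream agent call are hypothetical; the key
# idea is that the reader's context shapes the prompt, not the library.
def build_prompt(paper_text: str, reader_profile: str, focus: str) -> str:
    return (
        f"I already know: {reader_profile}\n"
        f"Summarise the paper below, focusing on {focus}, "
        "and skip background I already know.\n\n"
        f"--- PAPER ---\n{paper_text}"
    )

prompt = build_prompt(
    paper_text="(full text fetched from a hypothetical plain-text endpoint)",
    reader_profile="distributed systems, but not formal verification",
    focus="the proof technique",
)
# send_to_agent(prompt)  # hypothetical call to the reader's own agent
```

Two readers fetching the same paper get two different summaries, each anchored to what that reader does not yet know; that is the opposite of a single canonical AI summary served to everyone.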
My appeal to the ACM: don’t regress to the mean
My appeal to the ACM is to not try to build differentiated paid services using AI. Let the rest of the profit-driven world do that and peddle their slop; at least they are earning money while doing so (hopefully)! The ACM needs to be a force for creative disruption and discovery, and help defend and nurture the joys inherent in computer science research.
This means taking a critical view of how AI is affecting all aspects of our society, and not just rolling out bland services: instead, deploying AI technologies that enhance the human condition and allow us to be even more inquisitive with our time on earth. The recent Royal Society meeting on Science in the Age of AI put it very well:
A growing body of irreproducible studies are raising concerns regarding the robustness of AI-based discoveries. The black-box and non-transparent nature of AI systems creates challenges for verification and external scrutiny. Furthermore, its widespread but inequitable adoption raises ethical questions regarding its environmental and societal impact. Yet, ongoing advancements in making AI systems more transparent and ethically aligned hold the promise of overcoming these challenges. – Science in the Age of AI, Royal Society, 2024
This is a well-balanced view, I feel. There are huge challenges ahead of us, but also huge opportunities for new discoveries!
In the meantime, I remain blocked from the ACM Digital Library for unknown reasons, so I guess it’s time to start the Pembroke Christmas feasting a few hours early! Anyone want to head down to the pub now?