Professor Adrian Barnett. Credit: QUT.
A new machine learning tool has identified more than 250,000 cancer research papers that may have been produced by so-called "paper mills." The tool was developed by QUT researcher Professor Adrian Barnett, from the School of Public Health and Social Work and the Australian Center for Health Services and Innovation (AusHSI), and an international team of collaborators. Their study, published in The BMJ, analyzed 2.6 million cancer studies from 1999 to 2024.
The study, "Machine Learning-Based Screening of Potential Paper Mill Publications in Cancer Research: Methodological and Cross-Sectional Study," found more than 250,000 papers with writing patterns similar to articles already retracted for suspected fabrication.
"Paper mills are companies that sell fake or low-quality scientific studies. They are producing ‘research’ on an industrial scale, and our findings suggest the problem in cancer research is far larger than most people realized," Professor Barnett said.
Selling authorships and entire ready-made research papers, paper mills often use recycled text, awkward phrasing or fabricated data and images.
"Most likely, they’re relying on boilerplate templates which can be detected by large language models that analyze patterns in texts," Professor Barnett said.
He and his team trained a language model called BERT to recognize the subtle textual "fingerprints" that repeatedly appear across known paper-mill products.
When tested on verified examples, the model correctly identified suspicious papers 91% of the time.
"We’ve essentially built a scientific spam filter," Professor Barnett said.
"Just like your email system can spot unwanted messages, our tool flags papers that match the writing style and structure we see in retracted, fraudulent work."
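The spam-filter analogy can be made concrete with a toy text classifier. The sketch below is not the team's method: it swaps the fine-tuned BERT model for a much simpler naive Bayes bag-of-words classifier, and the training titles, the labels "flagged" and "clean", and the class name `NaiveBayesScreen` are all invented for illustration. It only shows the general idea of learning word patterns from labeled examples and then scoring unseen text against them.

```python
import math
from collections import Counter

LABELS = ("flagged", "clean")


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately crude tokenizer)."""
    return text.lower().split()


class NaiveBayesScreen:
    """Toy multinomial naive Bayes classifier; a stand-in for BERT, not the real tool."""

    def __init__(self):
        self.word_counts = {label: Counter() for label in LABELS}
        self.doc_counts = Counter()
        self.vocab = set()

    def fit(self, texts, labels):
        # Count how often each word appears under each label.
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.doc_counts[label] += 1
            self.vocab.update(tokens)
        return self

    def log_score(self, text, label):
        # Log prior plus Laplace-smoothed log likelihood of each token.
        score = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))
        total_words = sum(self.word_counts[label].values())
        denom = total_words + len(self.vocab)
        for token in tokenize(text):
            score += math.log((self.word_counts[label][token] + 1) / denom)
        return score

    def predict(self, text):
        # Return the label with the higher score.
        return max(LABELS, key=lambda label: self.log_score(text, label))


# Invented example titles (hypothetical, not drawn from the study's data).
train_texts = [
    "circ rna promotes proliferation and invasion of gastric cancer cells via sponging mir",
    "long noncoding rna promotes proliferation and migration of liver cancer cells via sponging mir",
    "microrna inhibits proliferation and invasion of lung cancer cells by targeting gene",
    "randomized trial of adjuvant chemotherapy in early breast cancer",
    "cohort study of screening uptake and colorectal cancer mortality",
    "survey of patient reported outcomes after prostate cancer surgery",
]
train_labels = ["flagged"] * 3 + ["clean"] * 3

screen = NaiveBayesScreen().fit(train_texts, train_labels)

if __name__ == "__main__":
    print(screen.predict("circular rna promotes invasion of bone cancer cells via sponging mir"))
    print(screen.predict("randomized trial of screening in breast cancer"))
```

As with the real tool, a "flagged" label here means only that the text resembles the templated examples. It is a prompt for human review, not a finding of fraud.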
Key findings from the large-scale analysis include:
- The share of flagged papers has increased dramatically over two decades, rising from around 1% in the early 2000s to a peak of over 16% in 2022.
- The issue affects thousands of journals across major publishers, including high-impact titles.
- The problem is most concentrated in fields such as molecular cancer biology and early-stage laboratory research.
- Some cancer types, including gastric, liver, bone and lung cancer, show especially high rates of suspicious papers.
Three scientific journals are already piloting the tool as part of their editorial screening. It will allow editors to identify potentially fabricated manuscripts before they are sent for peer review.
The team plans to expand the tool to other fields of research and improve the model as more confirmed cases of paper-mill activity become available. They stress that flagged papers are not confirmed cases of research fraud and should be checked by human specialists.
"Cancer research influences clinical trials, drug development and patient care," Professor Barnett said.
"If fabricated studies make their way into the evidence base, they can mislead real scientists and ultimately slow progress for patients. That’s why it’s vital we get ahead of this problem."
Publication details
Machine Learning-Based Screening of Potential Paper Mill Publications in Cancer Research: Methodological and Cross-Sectional Study, BMJ (2026). DOI: 10.1136/bmj-2025-087581
Journal information: British Medical Journal (BMJ)
Citation: Scientific ‘spam filter’ flags over 250,000 potentially fake cancer studies (2026, January 29) retrieved 29 January 2026 from https://medicalxpress.com/news/2026-01-scientific-spam-filter-flags-potentially.html