biorxiv.org

Quantifying data reuse in proteomics using PRIDE downloads statistics and a semi-supervised LLM-based framework

Understanding how scientific datasets are accessed and reused is essential for resource planning and impact assessment. Here we present the PRIDE Archive download tracking infrastructure and a comprehensive analysis of 159.3 million download records from the PRIDE proteomics database (2021-2025), spanning 35,528 datasets accessed from 235 locations. The infrastructure includes nf-downloadstats, a scalable Nextflow pipeline for processing download logs, and DeepLogBot, a machine-learning frame...
