CistromeMeta: A Large Language Model Powered Tool for Automated ChIP-seq Metadata Extraction
Abstract
Summary: Public repositories such as NCBI’s Gene Expression Omnibus (GEO) contain large numbers of ChIP-seq experiments, but their reuse is limited by heterogeneous free-text metadata describing target proteins, histone marks, cell lines, tissues, and disease states. We introduce CistromeMeta, a Python-based command-line tool that leverages large language models (LLMs) in a few-shot setting to automatically extract and standardize ChIP-seq metadata from GEO XML records without custom mo...