Data availability
Paired tumour and normal CRAM files, as well as raw methylation intensity (IDAT) files, for the WGS participants from the Sherlock-Lung study have been deposited in dbGaP under accession numbers phs001697.v2.p1 and phs002992.v1.p1. RNA-seq FASTQ files for the same individuals are available through dbGaP under accession number phs002346.v1.p1. The human reference genome (GRCh38) was obtained from the GATK resource repository (https://github.com/broadinstitute/gatk/blob/master/src/test/resources/large/Homo_sapiens_assembly38.fasta.gz). Publicly available LUAD multi-omics datasets can be accessed through dbGaP (accessions phs000178.v9.p8 and phs000488.v1.p1) and the European Genome-phenome Archive (EGA; accessions EGAS00001001757, EGAS00001002801 and EGAS00001003830). Detailed dataset descriptions and accession information are also provided in Supplementary Table 1.
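For readers scripting data requests, the accessions listed above can be collected into a small manifest and checked for well-formedness before use. The following is a minimal sketch only, not part of the study's pipelines: the grouping labels, the names `ACCESSIONS`, `classify` and `malformed`, and the two identifier patterns (a dbGaP study written as `phs` plus six digits with version and participant-set suffixes, and an EGA study written as `EGAS` plus eleven digits) are assumptions made for illustration.

```python
import re

# Accessions copied from the Data availability statement; the grouping
# labels are illustrative and not taken from the paper.
ACCESSIONS = {
    "Sherlock-Lung WGS CRAM and IDAT": ["phs001697.v2.p1", "phs002992.v1.p1"],
    "Sherlock-Lung RNA-seq FASTQ": ["phs002346.v1.p1"],
    "Public LUAD multi-omics (dbGaP)": ["phs000178.v9.p8", "phs000488.v1.p1"],
    "Public LUAD multi-omics (EGA)": [
        "EGAS00001001757",
        "EGAS00001002801",
        "EGAS00001003830",
    ],
}

# Assumed identifier shapes (hypothetical helpers, not an official validator).
DBGAP_STUDY = re.compile(r"phs\d{6}\.v\d+\.p\d+")
EGA_STUDY = re.compile(r"EGAS\d{11}")


def classify(accession):
    """Return the archive an accession appears to belong to, or 'unknown'."""
    if DBGAP_STUDY.fullmatch(accession):
        return "dbGaP"
    if EGA_STUDY.fullmatch(accession):
        return "EGA"
    return "unknown"


def malformed(manifest):
    """List every accession in the manifest that matches neither pattern."""
    return [
        accession
        for accessions in manifest.values()
        for accession in accessions
        if classify(accession) == "unknown"
    ]


if __name__ == "__main__":
    for label, accessions in ACCESSIONS.items():
        for accession in accessions:
            print(f"{classify(accession)}\t{accession}\t{label}")
```

A check of this kind only catches transcription errors in the identifiers; it does not confirm that a study exists or that access has been granted, which still requires an application through dbGaP or EGA.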
Code availability
The WGS bioinformatics pipelines are available on GitHub (https://github.com/xtmgah/Sherlock-Lung). The Battenberg SCNA calling algorithm is available on GitHub (https://github.com/Wedge-lab/battenberg), as is the Dirichlet process-based method for subclonal reconstruction of tumours (https://github.com/Wedge-lab/dpclust). The bioinformatic pipeline for identifying TE insertions is available on GitLab (https://gitlab.com/mobilegenomesgroup/TraFiC).