Data Descriptor
Published: 18 December 2025
Scientific Data (2025)
We are providing an unedited version of this manuscript to give early access to its findings. Before final publication, the manuscript will undergo further editing. Please note there may be errors present which affect the content, and all legal disclaimers apply.
Abstract
Monitoring and maintaining railway corridors requires accurate, high-resolution spatial data to ensure operational safety and efficiency. However, data-driven railway infrastructure assessment has been limited by the scarcity of large-scale, finely annotated 3D benchmarks. To address this gap, we first introduce SemanticRail3D, a mobile-LiDAR dataset of 438 high-resolution point clouds (approximately 2.8 billion points) in 200 m segments, annotated via a heuristic rule-based segmentation method into 12 semantic classes and grouped into instance labels, with per-point intensity information. Building on this foundation, we present SemanticRail3D-V2, featuring a machine learning (ML)-ready preprocessing pipeline that aligns and sections raw scans into uniform blocks, and a novel evaluation protocol combining metric-based anomaly detection with a probabilistic validity analysis of class distributions and spatial relationships. Railway domain experts then reviewed each block to remove low-quality scans and divide the dataset into five training shards (selected for annotation accuracy and complexity), plus separate validation and held-out test sets. The SemanticRail3D dataset, together with the V2 enhancements, offers a rigorously curated, richly annotated benchmark for semantic and instance segmentation in railway environments, supporting research in condition monitoring and asset management.
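To make the idea of sectioning a scan into uniform blocks concrete, the following is a minimal sketch, not the authors' pipeline. It assumes the scan has already been aligned so that the track runs along one coordinate axis, and it borrows the 200 m segment length from the abstract as a default. The function name `section_into_blocks` and the toy points are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Point = Tuple[float, float, float]  # (x, y, z) in metres


def section_into_blocks(
    points: Iterable[Point],
    block_length: float = 200.0,  # assumed default, taken from the 200 m segments
    axis: int = 0,                # assumed: track is aligned with the x axis
) -> Dict[int, List[Point]]:
    """Group points into consecutive fixed-length blocks along one axis."""
    if block_length <= 0:
        raise ValueError("block_length must be positive")
    pts = list(points)
    if not pts:
        return {}
    # Measure distance along the track from the first point on that axis.
    origin = min(p[axis] for p in pts)
    blocks: Dict[int, List[Point]] = defaultdict(list)
    for p in pts:
        index = int((p[axis] - origin) // block_length)
        blocks[index].append(p)
    return dict(blocks)


# Toy example: four points spread over roughly 415 m of track.
toy_points = [
    (0.0, 1.0, 0.2),
    (150.0, 1.1, 0.3),
    (200.0, 0.9, 0.2),
    (415.5, 1.0, 0.4),
]
toy_blocks = section_into_blocks(toy_points)
for block_index in sorted(toy_blocks):
    print(block_index, len(toy_blocks[block_index]))
```

Real corridors curve, so a production pipeline would more plausibly measure distance along the track centreline rather than along a fixed axis; the fixed-axis version is used here only to keep the sketch short.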
Data availability
The SemanticRail3D and SemanticRail3D-V2 datasets are publicly available on Zenodo at https://zenodo.org/records/11143767 and https://zenodo.org/records/15641832, respectively. Both datasets can be freely accessed under a Creative Commons Attribution (CC BY 4.0) license, as described in the Data Records section.
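For scripted access, the record pages can be mapped to metadata endpoints. The sketch below only builds the URLs and makes no network request. It assumes that Zenodo's public REST API serves record metadata at `/api/records/<id>`; the helper names `record_id` and `api_url` are hypothetical, and the file layout inside each record is not described here.

```python
import re

# Record pages as given in the data availability statement.
RECORD_URLS = {
    "SemanticRail3D": "https://zenodo.org/records/11143767",
    "SemanticRail3D-V2": "https://zenodo.org/records/15641832",
}


def record_id(url: str) -> str:
    """Extract the numeric record identifier from a Zenodo record URL."""
    match = re.search(r"/records/(\d+)", url)
    if match is None:
        raise ValueError(f"not a Zenodo record URL: {url}")
    return match.group(1)


def api_url(url: str) -> str:
    """Build the metadata endpoint for a record (assumed REST API layout)."""
    return f"https://zenodo.org/api/records/{record_id(url)}"


for name, url in RECORD_URLS.items():
    print(name, api_url(url))
```

The JSON returned by such an endpoint can then be inspected for the list of downloadable files before fetching anything large.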
Acknowledgements
This work has been supported by the Spanish Ministry of Science, Innovation and Universities through research project CPP2021-008374, funded by MICIU/AEI/10.13039/501100011033 and by European Union NextGenerationEU/PRTR; by grant RYC2021-033560-I, funded by MCIN/AEI/10.13039/501100011033 and by European Union NextGenerationEU/PRTR; and by grant ED431F 2024/02, funded by Xunta de Galicia, Spain (GAIN). Additionally, some computational resources were provided by the Centro de Supercomputación de Galicia (CESGA).
Author information
Authors and Affiliations
CINTECX, Universidade de Vigo, GeoTECH Group, Campus Universitario de Vigo, As Lagoas, Marcosende, 36310, Vigo, Spain
Arshia Ghasemlou, Mario Soilán & Belén Riveiro
Authors
- Arshia Ghasemlou
- Mario Soilán
- Belén Riveiro
Contributions
B.R. contributed to conceptualization, supervision, funding acquisition, and writing (review and editing). M.S. was responsible for conceptualization, data curation, supervision, funding acquisition, and writing (review and editing). A.G. was involved in conceptualization, methodology, software, data curation, and writing (original draft). All authors read and approved the final manuscript.
Corresponding author
Correspondence to Belén Riveiro.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Ghasemlou, A., Soilán, M. & Riveiro, B. SemanticRail3D - A Mobile LiDAR Benchmark for Semantic and Instance Segmentation of Railway Corridors. Sci Data (2025). https://doi.org/10.1038/s41597-025-06392-9
Received: 17 July 2025
Accepted: 27 November 2025
Published: 18 December 2025
DOI: https://doi.org/10.1038/s41597-025-06392-9