Abstract
The escalating challenges of climate change, extreme weather events, and increasing food demand impose a significant strain on global food production. To develop and apply sustainable agriculture practices, farmers and organizations require detailed, timely information about weather, crops, and yields. While efficient agricultural monitoring relies heavily on remote sensing, the existing literature suffers from a notable lack of comprehensive, large-scale crop monitoring datasets. This paper introduces CropClimateX, a novel database built by optimizing location sampling to substantially cover cultivated areas throughout the contiguous United States. The database comprises 15,500 small 12 × 12 km data cubes spanning 1,527 counties. Crucially, each data cube integrates a rich array of multi-source information, including multi-sensor imagery (Sentinel-1/2, Landsat-8, MODIS), weather and extreme events (Daymet, heat/cold waves, and drought monitor maps), and environmental features (soil and terrain characteristics). This comprehensive, integrated dataset is designed to support a wide range of agricultural monitoring tasks, providing a vital resource for advancing research in sustainable farming and crop modeling.
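The figures quoted above can be put into perspective with a few lines of arithmetic. The following is a minimal sketch, not part of the dataset tooling. The pixel sizes are assumed nominal resolutions of the source products (10 m for Sentinel-2, 30 m for Landsat-8, 500 m for the MODIS reflectance product, 1 km for Daymet); the resolution actually stored in the cubes may differ. The total footprint is an upper bound because overlaps between cubes are ignored.

```python
# Back-of-the-envelope figures derived from the numbers quoted in the abstract.
CUBE_SIDE_KM = 12        # each data cube covers 12 x 12 km
NUM_CUBES = 15_500       # number of data cubes
NUM_COUNTIES = 1_527     # number of counties covered

# ASSUMPTION: nominal pixel sizes (metres) of the source products,
# not necessarily the resolution stored in the dataset.
ASSUMED_PIXEL_SIZE_M = {
    "Sentinel-2": 10,
    "Landsat-8": 30,
    "MODIS": 500,
    "Daymet": 1000,
}

cube_area_km2 = CUBE_SIDE_KM ** 2                 # area of one cube
total_footprint_km2 = NUM_CUBES * cube_area_km2   # upper bound, overlaps ignored
cubes_per_county = NUM_CUBES / NUM_COUNTIES       # average, not a per-county count

# Number of pixels along one side of a cube at each assumed resolution.
pixels_per_side = {
    name: (CUBE_SIDE_KM * 1000) // size_m
    for name, size_m in ASSUMED_PIXEL_SIZE_M.items()
}

print(f"Area per cube: {cube_area_km2} km^2")
print(f"Total footprint (upper bound): {total_footprint_km2:,} km^2")
print(f"Average cubes per county: {cubes_per_county:.2f}")
for name, n in pixels_per_side.items():
    print(f"{name}: {n} x {n} pixels per cube")
```

Under these assumptions a single cube spans 144 km², and a Sentinel-2 layer at 10 m would hold 1200 × 1200 pixels per time step, which illustrates why the cubes are kept small.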
Data availability
The dataset is available for unlimited use under the Creative Commons 4.0 International license on Hugging Face at https://doi.org/10.57967/hf/5047. The original data sources are cited in the Methods section of this paper.
Code availability
The code to reproduce the database is available on GitHub (https://github.com/zhu-xlab/CropClimateX). The repository includes a comprehensive README file and a tutorial that guide users through setup, usage, and potential modifications. It also includes a script to efficiently download the data from Hugging Face.
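For readers who want a feel for what a download step might look like, the following is a minimal sketch and not the repository's script, which remains the recommended route. It assumes the third-party packages huggingface_hub and xarray are installed, that the cubes are stored as Zarr directory stores whose names end in ".zarr" (suggested by the zarr and xarray references, but not confirmed here), and that the reader looks up the dataset's repository identifier on the Hugging Face page behind the DOI. The function names download_subset and find_zarr_stores are hypothetical helpers introduced for illustration.

```python
from pathlib import Path
from typing import List, Optional


def download_subset(repo_id: str, local_dir: str,
                    patterns: Optional[List[str]] = None) -> Path:
    """Download (part of) a Hugging Face dataset repository.

    repo_id must be supplied by the caller; it is deliberately not hard-coded.
    patterns are optional glob patterns restricting which files are fetched.
    """
    # Imported lazily so the helper below works without the package installed.
    from huggingface_hub import snapshot_download

    path = snapshot_download(
        repo_id=repo_id,
        repo_type="dataset",
        local_dir=local_dir,
        allow_patterns=patterns,
    )
    return Path(path)


def find_zarr_stores(root: str) -> List[Path]:
    """Return all directories under root whose name ends in '.zarr'.

    ASSUMPTION: cubes are stored as Zarr directory stores with this suffix.
    """
    return sorted(p for p in Path(root).rglob("*.zarr") if p.is_dir())
```

A typical session would call download_subset with the real repository identifier and a target folder, pass the returned folder to find_zarr_stores, and open one of the results with xarray.open_zarr. Restricting the download with patterns is worthwhile because fetching every modality for all cubes is likely to be large.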
References
Intergovernmental Panel on Climate Change (IPCC). Weather and Climate Extreme Events in a Changing Climate, 1513–1766 (Cambridge University Press, 2023).
Boyer, J. S. et al. The U.S. drought of 2012 in perspective: A call to action. Global Food Security 2, 139–143, https://doi.org/10.1016/j.gfs.2013.08.002 (2013).
United Nations. The UN Sustainable Development Goals. Accessed: 2025-03-21 (2015).
Bongiovanni, R. & Lowenberg-DeBoer, J. Precision agriculture and sustainability. Precision agriculture 5, 359–387 (2004).
Mulla, D. J. Twenty five years of remote sensing in precision agriculture: Key advances and remaining knowledge gaps. Biosystems engineering 114, 358–371 (2013).
Weiss, M., Jacob, F. & Duveiller, G. Remote sensing for agricultural applications: A meta-review. Remote Sensing of Environment 236, 111402, https://doi.org/10.1016/j.rse.2019.111402 (2020).
Atzberger, C. Advances in remote sensing of agriculture: Context description, existing operational monitoring systems and major information needs. Remote Sensing 5, 949–981, https://doi.org/10.3390/rs5020949 (2013).
Dewangan, U., Talwekar, R. H. & Bera, S. Systematic Literature Review on Crop Yield Prediction using Machine & Deep Learning Algorithm. In 2022 5th International Conference on Advances in Science and Technology (ICAST), 654–661, https://doi.org/10.1109/ICAST55766.2022.10039620 (2022).
Muruganantham, P., Wibowo, S., Grandhi, S., Samrat, N. H. & Islam, N. A Systematic Literature Review on Crop Yield Prediction with Deep Learning and Remote Sensing. Remote Sensing 14, 1990, https://doi.org/10.3390/rs14091990 (2022).
Zhu, X. X. et al. Deep learning in remote sensing: A comprehensive review and list of resources. IEEE geoscience and remote sensing magazine 5, 8–36 (2017).
Yun, S. D. & Gramig, B. M. Agro-Climatic Data by County: A Spatially and Temporally Consistent U.S. Dataset for Agricultural Yields, Weather and Soils. Data 4, 66, https://doi.org/10.3390/data4020066 (2019).
Su, Y., Gabrielle, B. & Makowski, D. A global dataset for crop production under conventional tillage and no tillage systems. Scientific Data 8, 33, https://doi.org/10.1038/s41597-021-00817-x (2021).
Yeh, C. et al. SustainBench: Benchmarks for Monitoring the Sustainable Development Goals with Machine Learning. https://doi.org/10.48550/arXiv.2111.04724 (2021).
Iizumi, T. & Sakai, T. The global dataset of historical yields for major crops 1981-2016. Scientific Data 7, 97, https://doi.org/10.1038/s41597-020-0433-7 (2020).
Luo, Y. et al. Globalwheatyield4km: a global wheat yield dataset at 4-km resolution during 1982–2020 based on deep learning approaches. Earth System Science Data Discussions 2022, 1–18, https://doi.org/10.5194/essd-2022-297 (2022).
Chiu, M. T. et al. Agriculture-vision: A large aerial image database for agricultural pattern analysis. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2020).
Sani, D. et al. Sickle: A multi-sensor satellite imagery dataset annotated with multiple key cropping parameters. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 5995–6004 (2024).
Lin, F. et al. An open and large-scale dataset for multi-modal climate change-aware crop yield predictions. In Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), 5375–5386 (2024).
Beguería, S., Vicente-Serrano, S. M. & Angulo-Martínez, M. A Multiscalar Global Drought Dataset: The SPEIbase: A New Gridded Product for the Analysis of Drought Variability and Impacts. Bulletin of the American Meteorological Society 91, 1351–1356, https://doi.org/10.1175/2010BAMS2988.1 (2010).
Spinoni, J. et al. A new global database of meteorological drought events from 1951 to 2016. Journal of Hydrology: Regional Studies 22, 100593, https://doi.org/10.1016/j.ejrh.2019.100593 (2019).
Vicente-Serrano, S. M. et al. A global drought monitoring system and dataset based on ERA5 reanalysis: A focus on crop-growing regions. Geoscience Data Journal 10, 505–518, https://doi.org/10.1002/gdj3.178 (2023).
Wang, Q. et al. A multi-scale daily SPEI dataset for drought characterization at observation stations over mainland China from 1961 to 2018. Earth System Science Data 13, 331–341, https://doi.org/10.5194/essd-13-331-2021 (2021).
Pyarali, K., Peng, J., Disse, M. & Tuo, Y. Development and application of high resolution SPEI drought dataset for Central Asia. Scientific Data 9, 172, https://doi.org/10.1038/s41597-022-01279-5 (2022).
Peng, J. et al. A pan-African high-resolution drought index dataset. Earth System Science Data 12, 753–769, https://doi.org/10.5194/essd-12-753-2020 (2020).
Prabhat et al. Climatenet: an expert-labeled open dataset and deep learning architecture for enabling high-precision analyses of extreme weather. Geoscientific Model Development 14, 107–124, https://doi.org/10.5194/gmd-14-107-2021 (2021).
Racah, E. et al. ExtremeWeather: A large-scale climate dataset for semi-supervised detection, localization, and understanding of extreme weather events. In Advances in Neural Information Processing Systems, vol. 30 (Curran Associates, Inc., 2017).
Requena-Mesa, C., Benson, V., Reichstein, M., Runge, J. & Denzler, J. EarthNet2021: A large-scale dataset and challenge for Earth surface forecasting as a guided video prediction task. IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops 1132–1142, https://doi.org/10.48550/arxiv.2104.10066 (2021).
Benson, V. et al. Multi-modal learning for geospatial vegetation forecasting. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 27788–27799 (2024).
Ji, C. et al. DeepExtremeCubes: Earth system spatio-temporal data for assessing compound heatwave and drought impacts. Scientific Data 12, 149, https://doi.org/10.1038/s41597-025-04447-5 (2025).
Nedungadi, V. et al. From General to Specialized: The Need for Foundational Models in Agriculture, https://doi.org/10.48550/arXiv.2507.05390 (2025).
Karmakar, P. et al. Crop monitoring by multimodal remote sensing: A review. Remote Sensing Applications: Society and Environment 33, 101093, https://doi.org/10.1016/j.rsase.2023.101093 (2024).
Köppen, W. Das geographische System der Klimate. Handbuch der Klimatologie (1936).
USDA NASS. 2022 Census of Agriculture. Complete data available at www.nass.usda.gov/AgCensus (2022).
Rippey, B. R. The U.S. drought of 2012. Weather and Climate Extremes 10, 57–64, https://doi.org/10.1016/j.wace.2015.10.004 (2015).
USGCRP. Impacts, risks, and adaptation in the United States: Fourth national climate assessment, volume II: Report-in-brief. Tech. Rep., https://doi.org/10.7930/NCA4.2018.RiB. Reidmiller, D.R., Avery, C.W., Easterling, D.R., Kunkel, K.E., Lewis, K.L.M., Maycock, T.K., and Stewart, B.C. (eds.) (2018).
Microsoft Open Source, McFarland, M., Emanuele, R., Morris, D. & Augspurger, T. microsoft/planetarycomputer: October 2022. https://doi.org/10.5281/zenodo.7261897 (2022).
Gorelick, N. et al. Google earth engine: Planetary-scale geospatial analysis for everyone. Remote Sensing of Environment 202, 18–27, https://doi.org/10.1016/j.rse.2017.06.031 (2017).
Copernicus Data Space Ecosystem. Copernicus Data Space Ecosystem | Europe’s eyes on Earth. https://dataspace.copernicus.eu/ (2024).
Höhl, A., Höhn, P. & Zhu, X. X. Terragon: A Unified Framework for Earth Observation Data Cube Generation. Journal of Open Source Software 10, 8857, https://doi.org/10.21105/joss.08857 (2025).
United States Department of Agriculture (USDA) National Agricultural Statistics Service (NASS). Quick Stats database. Accessed 2024-01-01 (2024).
United States Department of Agriculture (USDA) National Agricultural Statistics Service (NASS). Cropland data layer. USDA NASS, USDA NASS Marketing and Information Services Office, Washington, D.C. Accessed via Google Earth Engine on 2024-01-01 (2024).
United States - Crop Calendar [Online; accessed 31. Jul. 2025] (2025).
Dewitz, J. National land cover database (NLCD) 2021 products. U.S. Geological Survey data release, https://doi.org/10.5066/P9JZ7AO3 (2023).
Huo, L., Persson, H. J. & Lindberg, E. Early detection of forest stress from European spruce bark beetle attack, and a new vegetation index: Normalized distance red & SWIR (NDRS). Remote Sensing of Environment 255, 112240, https://doi.org/10.1016/j.rse.2020.112240 (2021).
Saini, R. & Ghosh, S. K. Exploring capabilities of sentinel-2 for vegetation mapping using random forest. The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences XLII-3, 1499–1502, https://doi.org/10.5194/isprs-archives-XLII-3-1499-2018 (2018).
Yi, Z., Jia, L. & Chen, Q. Crop classification using multi-temporal sentinel-2 data in the shiyang river basin of china. Remote Sensing 12, 4052, https://doi.org/10.3390/rs12244052 (2020).
Sothe, C., de Almeida, C. M., Liesenberg, V. & Schimalski, M. B. Evaluating sentinel-2 and landsat-8 data to map sucessional forest stages in a subtropical forest in southern brazil. Remote Sensing 9, 838, https://doi.org/10.3390/rs9080838 (2017).
Imran, A. et al. Narrow band based and broadband derived vegetation indices using Sentinel-2 Imagery to estimate vegetation biomass. Global Journal of Environmental Science and Management 6, 97–108, https://doi.org/10.22034/GJESM.2020.01.08 (2020).
Liu, Y., Qian, J. & Yue, H. Comprehensive evaluation of sentinel-2 red edge and shortwave-infrared bands to estimate soil moisture. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 14, 7448–7465, https://doi.org/10.1109/JSTARS.2021.3098513 (2021).
Yu, Z. et al. Selection of Landsat 8 OLI band combinations for land use and land cover classification. In 2019 8th International Conference on Agro-Geoinformatics (Agro-Geoinformatics), 1–5, https://doi.org/10.1109/Agro-Geoinformatics.2019.8820595 (2019).
Copernicus Sentinel. Contains modified Copernicus Sentinel data [2018-2022]/Copernicus Data Space Ecosystem. Accessed via Copernicus Data Space Ecosystem services (2024).
U.S. Geological Survey. Contains modified Landsat-8 L2 data [2018-2022] courtesy of the U.S. Geological Survey. Processed by Sentinel Hub (2025).
Vermote, E. MODIS/Terra surface reflectance 8-day L3 global 500m SIN grid V061 [data set]. NASA EOSDIS Land Processes Distributed Active Archive Center (2021).
Myneni, R., Knyazikhin, Y. & Park, T. MODIS/Terra leaf area index/FPAR 8-day L4 global 500m SIN grid V061 [data set]. NASA EOSDIS Land Processes Distributed Active Archive Center (2021).
Wan, Z., Hook, S. & Hulley, G. MODIS/Terra land surface temperature/emissivity 8-day L3 global 1km SIN grid V061 [data set]. NASA EOSDIS Land Processes Distributed Active Archive Center (2021).
Copernicus Sentinel. Contains modified Copernicus Sentinel data [2018-2022]/Planetary Computer. Accessed via Planetary Computer services (2025).
Montero, D. et al. A standardized catalogue of spectral indices to advance the use of remote sensing in Earth system research. Scientific Data 10, 197, https://doi.org/10.1038/s41597-023-02096-0 (2023).
Zeng, Y. et al. Optical vegetation indices for monitoring terrestrial ecosystems globally. Nature Reviews Earth & Environment 3, 477–493, https://doi.org/10.1038/s43017-022-00298-5 (2022).
Thornton, M., Shrestha, R., Wei, Y., Thornton, P. & Kao, S.-C. Daymet: Daily surface weather data on a 1-km grid for North America, version 4 R1. https://doi.org/10.3334/ORNLDAAC/2129 (2022).
Lavaysse, C. et al. Towards a monitoring system of temperature extremes in Europe. Natural Hazards and Earth System Sciences 18, 91–104, https://doi.org/10.5194/nhess-18-91-2018 (2018).
National Drought Mitigation Center and U.S. Department of Agriculture and National Oceanic and Atmospheric Administration. United States Drought Monitor. https://droughtmonitor.unl.edu/ Accessed: 2023-01-01 (2023).
Poggio, L. et al. Soilgrids 2.0: producing soil information for the globe with quantified spatial uncertainty. Soil 7, 217–240 (2021).
Tale, K. S. & Ingole, S. A review on role of physico-chemical properties in soil quality. Chemical Science Review and Letters 4, 57–66 (2015).
Weil, R. R., Brady, N. C. & Weil, R. R. The nature and properties of soils, vol. 1104 (Pearson London, UK, 2017).
van Klompenburg, T., Kassahun, A. & Catal, C. Crop yield prediction using machine learning: A systematic literature review. Computers and Electronics in Agriculture 177, 105709, https://doi.org/10.1016/j.compag.2020.105709 (2020).
Khaki, S., Wang, L. & Archontoulis, S. V. A CNN-RNN framework for crop yield prediction. Frontiers in Plant Science 10, https://doi.org/10.3389/fpls.2019.01750 (2020).
Srivastava, A. K. et al. Winter wheat yield prediction using convolutional neural networks from environmental and phenological data. Scientific reports 12, 3215 (2022).
Fan, J., McConkey, B., Wang, H. & Janzen, H. Root distribution by depth for temperate agricultural crops. Field Crops Research 189, 68–74, https://doi.org/10.1016/j.fcr.2016.02.013 (2016).
Müllers, Y. et al. Shallow roots of different crops have greater water uptake rates per unit length than deep roots in well-watered soil. Plant and Soil 481, 475–493 (2022).
U.S. Geological Survey. 3D Elevation Program 10-meter resolution digital elevation model. Published 20220830. Accessed October 01, 2024 (2022).
Mahecha, M. D. et al. Earth system data cubes unravel global multivariate dynamics. Earth System Dynamics 11, 201–234, https://doi.org/10.5194/esd-11-201-2020 (2020).
Garey, M. R. & Johnson, D. S. Computers and Intractability: A Guide to the Theory of NP-completeness. A Series of Books in the Mathematical Sciences, 27. print edn (Freeman, New York [u.a], 2009).
Wei, R. & Murray, A. T. Evaluating Polygon Overlay to Support Spatial Optimization Coverage Modeling. Geographical Analysis 46, 209–229, https://doi.org/10.1111/gean.12036 (2014).
Murray, A. T., O’Kelly, M. E. & Church, R. L. Regional service coverage modeling. Computers & Operations Research 35, 339–355, https://doi.org/10.1016/j.cor.2006.03.004 (2008).
Yuan, J. et al. Global Optimization of UAV Area Coverage Path Planning Based on Good Point Set and Genetic Algorithm. Aerospace 9, 86, https://doi.org/10.3390/aerospace9020086 (2022).
Mansouri, S. S., Georgoulas, G., Gustafsson, T. & Nikolakopoulos, G. On the covering of a polygonal region with fixed size rectangles with an application towards aerial inspection. In 2017 25th Mediterranean Conference on Control and Automation (MED), 1219–1224, https://doi.org/10.1109/MED.2017.7984284 (2017).
Fortin, F.-A., De Rainville, F.-M., Gardner, M.-A., Parizeau, M. & Gagné, C. DEAP: Evolutionary algorithms made easy. Journal of Machine Learning Research 13, 2171–2175 (2012).
Höhl, A., Ofori-Ampofo, S., Fernández-Torres, M.-Á., Kuzu, R. S. & Zhu, X. X. CropClimateX: A large-scale, multitask, multisensory dataset for crop monitoring under climate extremes. Hugging Face https://doi.org/10.57967/hf/5047 (2025).
Miles, A. et al. zarr-developers/zarr-python: v2.4.0. https://doi.org/10.5281/zenodo.3773450 (2020).
Hoyer, S. & Hamman, J. xarray: N-D labeled arrays and datasets in Python. Journal of Open Research Software 5, 10–10, https://doi.org/10.5334/jors.148 (2017).
Beck, H. E. et al. High-resolution (1 km) Köppen-Geiger maps for 1901–2099 based on constrained CMIP6 projections. Scientific Data 10, 724, https://doi.org/10.1038/s41597-023-02549-6 (2023).
Tomer, M. D., James, D. E. & Sandoval-Green, C. M. J. Agricultural Conservation Planning Framework: 3. Land Use and Field Boundary Database Development and Structure. Journal of Environmental Quality 46, 676–686, https://doi.org/10.2134/jeq2016.09.0363 (2017).
Duden, C., Nacke, C. & Offermann, F. German yield and area data for 11 crops from 1979 to 2021 at a harmonized spatial resolution of 397 districts. Scientific Data 11, 95, https://doi.org/10.1038/s41597-024-02951-8 (2024).
Paudel, D. et al. Cy-bench: A comprehensive benchmark dataset for sub-national crop yield forecasting. Earth System Science Data Discussions 2025, 1–28, https://doi.org/10.5194/essd-2025-83 (2025).
Minixhofer, C. D., Swan, M., McMeekin, C. & Andreadis, P. Droughted: A dataset and methodology for drought forecasting spanning multiple climate zones. In ICML 2021 Workshop on Tackling Climate Change with Machine Learning (2021).
Acknowledgements
The work of A. Höhl was funded by the project ML4Earth by the German Federal Ministry for Economic Affairs and Energy under grant number 50EE2201C. The work of S. Ofori-Ampofo is supported through the iMONITOR project funded by Industrieanlagen-Betriebsgesellschaft (IABG) and administered through a Munich Aerospace e.V. scholarship. M.Á. Fernández-Torres acknowledges the support of the project ThinkingEarth, funded under Grant Agreement number 101130544 by the Horizon Europe program topic HORIZON-EUSPA-2022-SPACE-02-55, which promotes the large-scale Copernicus data uptake with AI and HPC, as well as the support of the Regional Government of Madrid through project TEC-2024/COM-322. The work of R. S. Kuzu is supported through the HYPER-AMPLIFAI project funded by the Helmholtz Association of German Research Centres (HGF) with contract number ZT-I-PF-4-056. The work of X. Zhu is also supported by the Munich Center for Machine Learning. Data retrieval from the Sentinel Hub API is supported by the ESA Network of Resources Initiative. During the preparation of this work, the authors used AI, specifically Grammarly, to enhance the manuscript’s readability. The authors are responsible for the content of this publication.
Funding
Open Access funding enabled and organized by Projekt DEAL.
Author information
Authors and Affiliations
Data Science in Earth Observation, Technical University of Munich (TUM), 80333, Munich, Germany
Adrian Höhl, Stella Ofori-Ampofo & Xiao Xiang Zhu
Munich Center for Machine Learning, 80333, Munich, Germany
Adrian Höhl & Xiao Xiang Zhu
Department of Signal Theory and Communications, Universidad Carlos III de Madrid (UC3M), 28911, Madrid, Spain
Miguel-Ángel Fernández-Torres
Remote Sensing Institute, German Aerospace Center (DLR), 82234, Wessling, Germany
Rıdvan Salih Kuzu
Authors
- Adrian Höhl
- Stella Ofori-Ampofo
- Miguel-Ángel Fernández-Torres
- Rıdvan Salih Kuzu
- Xiao Xiang Zhu
Contributions
A.H.: Conceptualization, Methodology, Software, Validation, Formal analysis, Investigation, Data Curation, Writing - Original draft, Visualization. S.O.-A.: Conceptualization, Software, Validation, Formal analysis, Data Curation, Writing - Original draft, Visualization. M.-Á.F.-T.: Supervision, Conceptualization, Methodology, Writing - Reviewing and Editing. R.S.K.: Supervision, Writing - Reviewing and Editing. X.X.Z.: Supervision, Conceptualization, Writing - Reviewing and Editing, Funding acquisition, Project administration.
Corresponding authors
Correspondence to Adrian Höhl or Xiao Xiang Zhu.
Ethics declarations
Competing interests
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Höhl, A., Ofori-Ampofo, S., Fernández-Torres, MÁ. et al. A large-scale, multitask, multisensory dataset for climate-aware crop monitoring in the US from 2018–2022. Sci Data (2026). https://doi.org/10.1038/s41597-026-06611-x
Received: 29 August 2025
Accepted: 09 January 2026
Published: 20 January 2026
DOI: https://doi.org/10.1038/s41597-026-06611-x