Background & Summary
The expansion of human activities has exerted pressure on almost all environments globally, and numerous attempts have been made to quantify this impact. One of the first global measures of total human impact on the worldwide environment, the Human Footprint Index (HFI), was released in 2002. The authors concluded that 83 percent of the global land surface was influenced by human activities, with few natural regions remaining1. This analysis was conducted at 1 km2, which has remained the mapping standard for human impact on the environment (HIE). These earlier studies established that Brazil holds the most significant proportion of low-impacted lands, mainly within the Amazon Basin, but with rapidly expanding human impacts2.
Brazil is the fifth largest country on earth and is widely recognized as one of the most biodiverse countries globally, hosting an estimated 15 percent of all known living species3. This level of biodiversity is attributed to its vast and varied ecosystems, including at least six major terrestrial biomes (the Amazon Rainforest, Atlantic Forest, Cerrado, Caatinga, Pantanal, and Pampa) as well as an extensive coastline with globally significant dune and mangrove habitats. The country’s elevated levels of endemism further underscore its unique ecological significance. Brazil holds the top position among the world’s 17 megadiverse countries, which collectively harbor approximately 70 percent of the planet’s animal and plant species4. It has the richest mammalian diversity globally, with 712 identified mammals, including between 102 and 139 primate species5,6, many of which are endemic. Approximately 18 percent of all primate species exist solely in Brazil7. Over one-third of these primate species are threatened with extinction, and at least three are among the twenty-five most threatened of all species globally6. In Brazil’s heavily impacted Cerrado, Caatinga, and Atlantic Forest biomes, half of the primate species are threatened with extinction6.
The importance of mapping HFI at extremely high resolution is apparent when examining species whose spatial interactions occur at a limited scale, including high-cognitive mammals such as nonhuman primates. Primate species have a variety of home range sizes and may change their daily travel distance and home range according to the size of the area they live in. For example, a culturally significant population of bearded capuchin (Sapajus libidinosus) located in the mangroves of Maranhão State8 has an observed home range of only 37 ha8, less than 40 percent of a typical single HFI pixel. Therefore, HFI at a resolution 1,000 times higher than current databases is required to assess the human impact on these primates and a host of other fauna whose scale of operation cannot be measured in multiples of square kilometers.
This mismatch between HFI mapping scales and the scale of primate species’ interaction with their environment is common. For instance, Howler Monkeys in Brazil (Alouatta palliata) have been observed to vary their home range between 5.9 ha and 89.5 ha9. The muriqui-do-norte (Brachyteles hypoxanthus), the largest Brazilian primate species, travels only 1,075 to 1,132 meters daily, or one-tenth of a typical HFI mapping unit, with home ranges varying from only 128 ha to 445 ha10. A small-bodied primate, the common marmoset (Callithrix jacchus) travels an average daily distance of 1048 m ± 446 m (min.: 70 m and max.: 1805 m) with an estimated home range of only 7.31 ha11 or 7 percent of a typical HFI pixel. Again, these areas are exceedingly small fractions of a typical HFI single pixel and far from the many hundreds, or even thousands, of input pixels required to represent discrete changes within the operational range of many species.
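The scale mismatch described above can be checked with simple area arithmetic. The sketch below uses home range values quoted in the text; the function and variable names are illustrative and are not taken from the published code.

```python
# Illustrative arithmetic only; names here are not from the published repository.
M2_PER_HA = 10_000  # square meters in one hectare


def pixels_in_area(area_ha: float, pixel_size_m: float) -> float:
    """Return how many square pixels of the given edge length fit in an area."""
    return area_ha * M2_PER_HA / (pixel_size_m ** 2)


# Home ranges (ha) quoted in the text above.
home_ranges_ha = {
    "bearded capuchin": 37.0,
    "common marmoset": 7.31,
    "muriqui-do-norte (lower bound)": 128.0,
}

for species, area_ha in home_ranges_ha.items():
    coarse = pixels_in_area(area_ha, 1_000)  # 1 km HFI pixels
    fine = pixels_in_area(area_ha, 10)       # 10 m pixels
    print(f"{species}: {coarse:.2f} coarse pixels, {fine:,.0f} fine pixels")
```

For the 37 ha capuchin home range, this gives 0.37 of a single 1 km2 pixel but 3,700 pixels at 10 m.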
Using high-resolution human footprint data increases the possibility of detecting landscape barriers that isolate wildlife populations. Major roads and railways contribute to habitat fragmentation12, increasing primate mortality13,14, facilitating illegal hunting15, and increasing human activities along such corridors16. The increase in linear infrastructures decreases a primate population’s survival chances17. Additionally, primates rely on plant species as food sources18,19,20, on the structure of the canopy for movement21, and on intact large trees as sleeping sites22,23. Therefore, it is critical to detect interruptions in these crucial drivers of primate survival, which coarse-resolution imagery and data at current mapping scales cannot resolve.
This paper creates an aggregated anthropogenic impact spatial database with a 10 m resolution for Brazil across all biomes (Fig. 1). The spatial database evaluates how human influence on the environment impacts nonhuman primate species and other medium to large mammals and acts as a universal high-resolution HFI map for Brazil. The dataset provided is 10,000 times higher resolution than the current human footprint index datasets available, analyzing over 86.5 billion individual 10 m resolution pixels. To ensure data integrity, input datasets are sampled and validated by sub-10 cm orthorectified high-resolution UAV imagery or 30 cm orthorectified satellite imagery. In addition to the database, we provide the code and methodology to create such datasets for almost any country.
Fig. 1
Brazilian Biomes.
Methods
The HFI schema closely follows the 2000 to 2018 global record of the annual terrestrial Human Footprint dataset24. It builds on earlier databases by creating a dataset with 10,000 times higher resolution and by using an improved method to capture human activity in the landscape. The HFI produced is most suitably applied to studying medium to large mammals such as nonhuman primates, while accounting for their scale of spatial operations. We utilize much of the same input human impact criteria as earlier HFI impact studies, which are primarily based on land cover classes such as cropland, pasture, and urban land cover; distance from human infrastructure such as major roads, railways, and waterways; and a means of accounting for the human population such as nighttime light emissivity24.
The 10 m scale of the dataset posed problems in establishing a census-like measure of the human population. Census data was rejected because it is compiled inconsistently within and across nations, the enumeration units are typically at least one square kilometer, and no census data exists at a 10 m scale. For example, simply dividing the area of the USA by the number of census blocks results in an average census block being slightly larger than 1 km2. Importing any dataset, such as census data or nighttime lights, that was not created at a scale of 10 m or finer would require some form of interpolation, dasymetric mapping, or pycnophylactic weighting, introducing low-resolution extrapolated or interpolated data into the HFI measure and diluting the usability of a 10 m HFI database. Additionally, the most appropriate layer to conduct the dasymetric mapping or pycnophylactic weighting would be the 10 m land cover dataset already utilized.
In many HFI analyses, remotely sensed nighttime lights overcome the census problem identified in the above paragraph. Unfortunately, nighttime lights exist only at coarse scales of 1 km2 or 4 km2, as opposed to 100 m2; they would require substantial sub-pixel reapportioning, which would defeat the purpose of a 10 m HFI database. Nighttime lights also have a limited dynamic range, which causes radiance saturation in brightly lit urban areas, thereby failing to differentiate between varying levels of high-intensity light emissions25. The saturation effect flattens the brightness values, making it difficult to distinguish between moderately and highly lit zones. Additionally, nighttime lights poorly represent human activity in areas without high emissivity rates, such as areas with limited electricity services or highly developed daytime-only use areas like parks, beaches, or industrial estates.
We overcome the limitations of nighttime lights and census-like data in HFI at 10 m resolution by utilizing a new global database that better accounts for human activity on the landscape. A suite of technology companies has used deep learning algorithms, AI, and dynamic segmentation from high-resolution 30 cm Maxar imagery to extract global building footprints26. Currently, these footprints are available for most of the Americas, Europe, Africa, Australia, Oceania, and parts of Asia, except China, and a global expansion is underway. There are approximately 136 million building footprints across Brazil, and they are used as a proxy for human population, interaction, and landscape impact.
Data sources
Land cover
Land cover classes were derived from Sentinel-2 satellite imagery. The Sentinel-2 data is available from MapBiomas Brazil27. The data is part of the 10 m Beta collection from 2016 to 2023. We used the 2023 version of the data. The data are classified into 21 granular land cover classes that, in the higher order, include forests, other natural vegetation, mosaic uses, agriculture, non-vegetated areas, urban areas, pasture, and water. The complete land cover classification system with all suborders is available in the data repository. Preprocessing involved reprojecting the data from the unprojected coordinate system EPSG:4326 to the projected coordinate system EPSG:5880. The code used in this process is available in the repository28. The metadata can be viewed, and the data can be obtained from the MapBiomas platform29.
Building density
Building footprints were obtained from Microsoft Corporation26. The data were delivered in vector polygon and point formats. We utilized the point dataset for computational efficiency as the processing involved a density measure. The dataset was constructed post-2020 from 30 cm Maxar imagery. Preprocessing involved reprojecting the data from the unprojected coordinate system EPSG:4326 to the projected coordinate system EPSG:5880. The metadata can be viewed, and the data can be obtained from GitHub26. The dataset contained 136,238,215 building footprints for Brazil.
Major roads, railroads, and navigable waterways
We used the major official highways dataset obtained from the Brazilian government via the Sistema Nacional de Aviação30. The dataset is in vector line format for the Brazilian national territory. The metadata can be viewed, and the data can be obtained from a Brazilian government platform30. Brazil has approximately 129,294 km of designated major highways.
The Agencia Nacional de Aguas (ANA) provided the railroads via the National Infrastructure and Transportation Department. The dataset has been compiled since 1999 and is maintained and updated31. The dataset was provided in a vector line format. The metadata can be viewed, and the data can be obtained from the ANA metadata portal31. Brazil has limited railroads, with only approximately 33,463 km of railroad.
ANA provided the navigable waterways via the Ministério dos Transportes - Banco de Informações e Mapas de Transportes32. This data was published in 2010, and we assume that the navigable rivers have not changed substantially since 2010 and that the data is regularly updated, as with all other data from ANA. The dataset was provided in a vector line format. The data were projected in EPSG:4674 / SIRGAS 2000. The metadata can be viewed, and the data can be obtained from the ANA metadata32. Brazil has approximately 110,000 km of navigable waterways.
Preprocessing the vector line datasets above involved reprojecting them to EPSG:5880 and correcting any topological inconsistencies. This code is available in the repository28.
Data analysis
Most of the data was received in the unprojected reference system EPSG:4326. This geographic reference system is unsuitable for this analysis in Brazil for two primary reasons. Firstly, EPSG:4326 uses decimal degrees, an angular unit, rather than a linear unit. Therefore, the ground area of each pixel decreases with increasing latitude, moving away from the equator. This feature is particularly problematic for Brazil, which is one of the most latitudinally extensive countries in the world, stretching from 5°16′20″ N to 33°45′03″ S, or over 4,300 km. Secondly, much of the analysis relies on the distance from features, and EPSG:4326 does not attempt to preserve distance relationships between locations.
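The first problem can be illustrated with a spherical approximation of the Earth. The sketch below is not part of the published workflow; the mean Earth radius and the cell size in degrees are assumptions chosen so that a pixel is roughly 10 m wide at the equator.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius; spherical approximation (assumption)


def pixel_ground_size(lat_deg: float, pixel_deg: float) -> tuple[float, float]:
    """Approximate east-west and north-south ground size (m) of a pixel
    that spans pixel_deg degrees in both directions at latitude lat_deg."""
    step_rad = math.radians(pixel_deg)
    north_south = EARTH_RADIUS_M * step_rad
    east_west = north_south * math.cos(math.radians(lat_deg))
    return east_west, north_south


PIXEL_DEG = 0.00009  # hypothetical cell size, roughly 10 m at the equator

# Latitudes approximate Brazil's northern and southern limits given in the text.
for label, lat in [("equator", 0.0), ("northern tip", 5.27), ("southern tip", -33.75)]:
    ew, ns = pixel_ground_size(lat, PIXEL_DEG)
    print(f"{label:>12}: {ew:.2f} m x {ns:.2f} m = {ew * ns:.1f} m2")
```

Under this approximation, a pixel of fixed angular size is roughly 17 percent narrower at Brazil's southern limit than at the equator, so neither pixel areas nor pixel-based distances are comparable across the country.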
For the above reasons, the Brazilian projection EPSG:5880 was utilized for this analysis for all data layers. This projection, often known as SIRGAS 2000 / Brazil Polyconic, was designed to account for Brazil’s unique geospatial extent. This reference system has numerous benefits, including having a linear unit of meters and an established, accurate geodetic conversion to EPSG:4326. Although not equal area or equidistant, this reference system is a compromise that preserves both well across the study area. The world file is available in the data repository, as are the translation file to and from EPSG:4326 and the code that completed the translation.
A base-level 10 m resolution raster grid aligned with the Sentinel-2 land cover grid obtained from MapBiomas was created to ensure consistent pixel alignment and geospatial extents. This grid was used as the alignment grid, sometimes called a snap raster in commercial software, and as the default extent, ensuring all analyzed layers have perfect pixel alignment and the same number of pixels and identical spatial extents. The grid contains 86,564,101,279 pixels, each 10 m by 10 m, or 100 m2. The alignment/extent raster grid is made available in the data repository.
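The idea of an alignment (snap) grid can be sketched as follows. This is a minimal illustration, assuming a grid anchored at a hypothetical origin; the function name and the coordinates are not from the published code.

```python
import math


def snap_extent(bounds, origin=(0.0, 0.0), cell=10.0):
    """Expand (xmin, ymin, xmax, ymax) outward to the nearest cell edges of a
    grid anchored at origin, and return the snapped bounds with its shape."""
    xmin, ymin, xmax, ymax = bounds
    ox, oy = origin
    sx_min = ox + math.floor((xmin - ox) / cell) * cell
    sy_min = oy + math.floor((ymin - oy) / cell) * cell
    sx_max = ox + math.ceil((xmax - ox) / cell) * cell
    sy_max = oy + math.ceil((ymax - oy) / cell) * cell
    cols = round((sx_max - sx_min) / cell)
    rows = round((sy_max - sy_min) / cell)
    return (sx_min, sy_min, sx_max, sy_max), (rows, cols)


# Hypothetical layer bounds in projected meters (not real coordinates).
snapped, shape = snap_extent((103.2, 47.9, 158.1, 91.0))
print(snapped, shape)  # (100.0, 40.0, 160.0, 100.0) (6, 6)
```

Snapping every layer to the same origin and cell size before rasterizing is what allows the layers to be combined pixel by pixel without resampling.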
Land cover
The land cover data was reclassified directly into human impact scores based on the land cover class. The value of 0 was given to each 10 m pixel considered least impacted by human activity, the value of 4 was given to each pixel that had a moderate level of human impact, the value of 7 was given to those pixels with high levels of human impact, and a value of 10 to those with the highest level of human impact (Table 1). These values were applied to be consistent with other global models24. The code is available for download in the repository28.
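A reclassification of this kind amounts to a lookup from land cover class to impact score. The sketch below is illustrative only: the four score levels (0, 4, 7, 10) come from the text, but the class names and their assignments are hypothetical placeholders, since the actual mapping is defined in Table 1.

```python
# Hypothetical class-to-score table; the paper's actual assignments are in Table 1.
IMPACT_SCORE = {
    "forest": 0,
    "water": 0,
    "pasture": 4,
    "agriculture": 7,
    "urban": 10,
}


def reclassify(grid, table, nodata=None):
    """Map every class label in a 2D grid to its impact score."""
    return [[table.get(label, nodata) for label in row] for row in grid]


# Tiny made-up land cover grid for demonstration.
land_cover = [
    ["forest", "forest", "pasture"],
    ["agriculture", "urban", "water"],
]
scores = reclassify(land_cover, IMPACT_SCORE)
print(scores)  # [[0, 0, 4], [7, 10, 0]]
```

Keeping the mapping in a single table makes it straightforward for a data user to substitute their own class scores.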
Building density
We calculated the building density for every 10 m cell at the 1 km2 level (Table 2). We buffered each 10 m cell in the dataset by 1 km, and the cell’s value was populated with the building count over this 1 km2 area; this was repeated for all 86.5 billion pixels. The minimum building density was 0, and the maximum was 9,520. The data was then binned into four geometric intervals, with 10 representing the highest levels of building density, 7 representing a high level of building density, 4 representing a lower level of building density, and 0 representing few, if any, buildings. The building density code is available to download from the repository28.
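A moving-window building count of this kind can be sketched with a summed-area table. This is a minimal illustration under stated assumptions, not the authors' implementation: the square window shape, the function names, and the break values used for binning are all hypothetical (the actual intervals are given in Table 2).

```python
def window_counts(counts, radius):
    """Sum building counts in a square window of (2 * radius + 1) cells
    around each cell, clipped at the grid edge, using a summed-area table."""
    rows, cols = len(counts), len(counts[0])
    sat = [[0] * (cols + 1) for _ in range(rows + 1)]  # zero-padded table
    for r in range(rows):
        for c in range(cols):
            sat[r + 1][c + 1] = (counts[r][c] + sat[r][c + 1]
                                 + sat[r + 1][c] - sat[r][c])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            r0, r1 = max(0, r - radius), min(rows, r + radius + 1)
            c0, c1 = max(0, c - radius), min(cols, c + radius + 1)
            out[r][c] = sat[r1][c1] - sat[r0][c1] - sat[r1][c0] + sat[r0][c0]
    return out


def density_score(count, breaks=(10, 100, 1000)):
    """Bin a building count into 0, 4, 7, or 10. Breaks are hypothetical."""
    if count < breaks[0]:
        return 0
    if count < breaks[1]:
        return 4
    if count < breaks[2]:
        return 7
    return 10


# Buildings per cell in a tiny made-up grid.
buildings = [[1, 0, 0], [0, 2, 0], [0, 0, 3]]
density = window_counts(buildings, radius=1)
print(density[1][1])  # 6
```

With 10 m cells, a radius of 50 cells would give a window of about 1 km by 1 km. The summed-area table keeps the cost per cell constant regardless of window size, which matters when the grid holds billions of cells.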
Major roads, railroads, and navigable waterways
Each input vector line layer was rasterized into the 10 m predefined raster grid. We then populated each of the 86.5 billion pixels with the distance in meters from the nearest grid cell in which the feature was present. A layer-specific formula then converted these distances into HFI scores; the formula for each conversion is given in Table 3. The processing code for major roads, railroads, and navigable waterways is available for download in the associated repository28.
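The distance step can be illustrated with a brute-force Euclidean distance calculation on a small rasterized mask. This is a sketch for clarity only: a grid of 86.5 billion pixels would require an efficient distance transform rather than the nested loops shown here, and the function name is illustrative. The conversion from distance to HFI score is not shown because it depends on the formulae in Table 3.

```python
import math


def distance_to_features(mask, cell_size=10.0):
    """Return, for every cell, the Euclidean distance in meters to the
    nearest cell where mask is truthy (brute force; small grids only)."""
    feature_cells = [
        (r, c)
        for r, row in enumerate(mask)
        for c, value in enumerate(row)
        if value
    ]
    if not feature_cells:
        raise ValueError("mask contains no feature cells")
    return [
        [
            cell_size * min(math.hypot(r - fr, c - fc) for fr, fc in feature_cells)
            for c in range(len(row))
        ]
        for r, row in enumerate(mask)
    ]


# A rasterized line (for example, a road) crossing the middle row of a tiny grid.
road_mask = [
    [0, 0, 0],
    [1, 1, 1],
    [0, 0, 0],
]
distances = distance_to_features(road_mask)
print(distances[0])  # [10.0, 10.0, 10.0]
```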
Data aggregation
To aggregate the data into a final HFI score, we qualitatively considered the role of each input measure on the human impact on primates and other medium to large mammals from the literature, as well as the importance of near-pristine or non-intervened environments. This qualitative final aggregation is common in HIE evaluations, indeed in all multi-criteria analyses in which the weight cannot be obtained from the data itself. The weighting of each quantitative criterion is fundamentally left to the data user and their use case. The data and code are organized to facilitate a rapid recalculation of an HFI score with desktop computational capabilities, allowing for weightings tailored to the specific species or phenomena of interest to the future data user. This adaptability of the HFI score based on the species of observation or other criteria is an advance and is not available in any earlier HIE analyses.
Equal weighting of each variable, used in many earlier analyses, was rejected because it is itself a qualitative decision with undesirable outcomes. For example, under equal weighting, a navigable inland waterway in the forested interior would receive the same impact as a major highway in downtown São Paulo, although the two should not be equivalent. Similarly, proximity to a mining railroad in a rural area would receive the same impact as a region with extremely high population densities and the associated infrastructure.
Human activities in natural environments threaten 65% of all primate species globally, accounting for 74% of the global population present in primate range regions33. Land cover and population density are likely the most significant factors in primate conservation. They impact the availability34, quality35, and connectivity of habitats36. In areas of highly developed land cover, such as urban areas, the impact on animal populations is higher than in areas with less intense land cover, such as continuous forests. Thus, we weighted land cover and building density, our proxy for population density, at 40 percent each.
Major roads contribute to primate roadkill and obstruct primate movement. They traverse zones with dense vegetation37 where animals are present. Roads are already captured in the urban classification. Adding major roads as a separate input captures highways that traverse non-urban regions, such as forests and other habitats, and magnifies the impact within urban areas that contain major roads. Therefore, major roads are weighted at 10 percent, with the understanding that they are already integrated into the 40 percent land cover assignment. Railroads, on the other hand, affect forested areas by removing trees and creating gaps in the canopy, and they are reported to primarily impact arboreal primates, with fewer effects on semiterrestrial primates in different parts of the world38. Because the presence of railroads and navigable waterways in Brazil is limited compared to other infrastructure, we weighted them both at 5%.
The final criterion allows the data user to adjust weighting based on their species or region of interest.
The final weighted output dataset is available in the repository as HFI_Weighted_vx.tif, where higher x values indicate later data releases. The final summed HFI scores can range from 0, indicating the lowest human impact, to 10; the highest value present in the dataset is 9.4.
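The weighting scheme described in this section can be written as a weighted sum of the five layer scores. The sketch below uses the percentages stated in the text; the layer keys, the function name, and the alternative weighting are illustrative assumptions rather than the published code.

```python
# Weights (percent) as stated in the text; layer keys are illustrative names.
DEFAULT_WEIGHTS = {
    "land_cover": 40,
    "building_density": 40,
    "major_roads": 10,
    "railroads": 5,
    "waterways": 5,
}


def weighted_hfi(scores, weights=DEFAULT_WEIGHTS):
    """Combine per-layer scores (each 0 to 10) into one weighted HFI score."""
    if sum(weights.values()) != 100:
        raise ValueError("weights must sum to 100 percent")
    if set(scores) != set(weights):
        raise ValueError("scores and weights must cover the same layers")
    return sum(weights[name] * scores[name] for name in weights) / 100


# A forest pixel next to a major highway, with no other human influence.
roadside_forest = {
    "land_cover": 0,
    "building_density": 0,
    "major_roads": 10,
    "railroads": 0,
    "waterways": 0,
}
print(weighted_hfi(roadside_forest))  # 1.0

# Hypothetical user-defined weighting for a species highly sensitive to roads.
road_sensitive = {
    "land_cover": 30,
    "building_density": 30,
    "major_roads": 30,
    "railroads": 5,
    "waterways": 5,
}
print(weighted_hfi(roadside_forest, road_sensitive))  # 3.0
```

Because the weights sum to 100 percent and each layer is scored from 0 to 10, the combined score also stays within 0 to 10.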
Hardware and Software
All geospatial analysis was conducted in Python 339 in the ArcGIS Pro 3.340 environment using 48 Intel Xeon w7-3455 Processors (67.5 M Cache, 2.50 GHz) and 96 GB of RAM in a Windows 64-bit parallel processing environment. The final raster datasets were compiled and compressed in GDAL41.
The fixed-wing eBEE X series lightweight mapping UAV42 with RTK positioning was used for data validation flights. Missions were planned in Emotion V3.1343. The first UAV payload was the MicaSense Red Edge MX 5-band multispectral sensor44, closely mimicking the bands available in Sentinel-2 and allowing for land cover classification verification and vector dataset verification. The second UAV payload was the visible spectrum SODA camera45, allowing only vector datasets to be verified. Pix4D46 processed the UAV data and created the orthorectified imagery. We utilized parallel processing across 6144 CUDA cores in the geospatial analysis in the Windows 64-bit system defined above.
Worldview 3 imagery from 2017 at a resolution of 30 cm was used for additional validation in and around Carlos Botelho State Park, a known primate habitat, because the UAV flights captured primarily less-impacted habitats in rural and forested environments. A commercial vendor provided the Worldview imagery, which came preprocessed and orthorectified with eight 30 cm multispectral bands and covered 2,500 ha centered on latitude −24.070475° and longitude −47.971581°.
Data Records
HIBR-10: Human Impacts on the Brazilian Environment at the 10 m scale is publicly available in a geospatial repository at https://doi.org/10.5281/zenodo.15306587 (ref. 28). The vector data are available in open shapefile format, and the raster data are available in open GeoTIFF format. The repository includes integrated geospatial metadata for each dataset. The Worldview 3 imagery data used for validation around São Paulo is a commercial product that the authors cannot release. The validation UAV imagery is available from the GIS-ready Raster Datasets of Rio Preguicas Mangrove Stand, Maranhão State, Brazil repository at https://doi.org/10.13016/M2NV99D7247.
The final weighted output dataset is available in the repository as HFI_Weighted_vx.tif, where higher x values indicate later data releases. The values are rescaled from a range of 0 to 10 to a range of 0 to 100 by multiplying by 10, so that five becomes fifty, and so on. This allows the data to be stored as integers, which reduces the file size from almost 1 TB to under 6 GB.
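For a data user reading the file, the rescaling can be reversed by dividing by 10. The sketch below shows both directions; the function names are illustrative, and rounding to the nearest integer is an assumption, since the text states only that values are multiplied by 10 to allow integer storage.

```python
def to_stored_value(score: float) -> int:
    """Scale a 0 to 10 HFI score to the 0 to 100 integer stored on disk.
    Rounding to the nearest integer is an assumption."""
    if not 0 <= score <= 10:
        raise ValueError("score must lie between 0 and 10")
    return round(score * 10)


def from_stored_value(value: int) -> float:
    """Recover the 0 to 10 HFI score from a stored 0 to 100 integer."""
    return value / 10


print(to_stored_value(5), to_stored_value(9.4))  # 50 94
print(from_stored_value(94))                     # 9.4
```

Integers from 0 to 100 fit in a single unsigned byte per pixel, which is one reason integer storage is so much more compact than floating-point storage.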
The railroad, waterway, and major road inputs are provided in the Input_Vector.zip file. The input MapBiomas Sentinel-2 10 m land cover data is available through the Linktosentinel.txt file, with the legend provided in Sentinel_Legend.pdf. The building footprints are available through the Linktobuildingfootprints.txt file, and all code required to reprocess the data is provided in the python_code.zip file.
Technical Validation
We use a dual approach to the technical validation of the input datasets. First, we report the errors and uncertainty of each dataset and instrument provided by the original data generators or independent authors. Second, we use UAVs and Worldview 3 imagery to validate all the input datasets. There are three validation locations, listed here from south to north. The southern site is Carlos Botelho State Park, São Paulo (Fig. 2), a known primate hotspot containing a well-preserved Atlantic Forest. This site uses Worldview-3 30 cm multispectral imagery for validation. Near the center of Brazil, the second validation site is around the Gilbués community (Fig. 2) in the State of Piauí, which contains a mosaic land cover of agriculture, patchwork forests, and rural communities. It is located in the Cerrado Biome, is validated using UAVs, and is a well-documented primate site. The final validation site is around the Atins community in the north, within the State of Maranhão (Fig. 2), which contains a mangrove, estuarine, and dune-dominated landscape with isolated touristic communities. It is located in the mangrove and coastal dune biome, is validated using UAVs, and is a known primate location. The validation sites were chosen for their diverse environments, urban/rural settings, and land use/land cover types, for their known importance as primate habitat in the scholarly literature, and as sites of opportunity where we could gain the required permission to fly the UAVs.
Fig. 2
Validation Sites.
Land Cover Reported Accuracy
The accuracy of MapBiomas Collection 9, containing the annual land cover and land use maps from 1985 to 2022, is reported as follows: Level 1, 93.1 percent; Level 2, 89.8 percent; and Level 3, 89.8 percent. The data used is Level 1. The MapBiomas assessment used approximately 75,000 independent validation samples and was conducted yearly48. The accuracy of the Sentinel-2 land cover classification that feeds into MapBiomas was measured in the Brazilian Pantanal biome, employing the same Random Forest algorithm as the input land cover dataset, over an area of 44,998 km2 and reached an overall accuracy of 95.9 percent49.
Buildings Reported Accuracy
The building false positive rate reported across 5,000 samples in South America was 1.7 percent, with building precision reported at 94 percent26.
Major Roads, Railroads, and Navigable Waterways Reported Accuracy
No author accuracy assessment was provided for these layers.
Cross-Validation
Similar remote sensing studies have noted that adequate accuracy assessments of remotely sensed data often require using a higher-order truth, that is, reference data that are more accurate or collected at a finer spatial resolution than the data being evaluated50,51,52. Such higher-order cross-validation ensures that the validation technique assesses accuracy rather than just agreement with another imperfect dataset, often derived from the same or similar instruments and processes. Reference data such as field data, high-resolution imagery, or expert-derived survey maps form a gold standard against which automated classifications or coarser-resolution data are assessed50,51. Assessments in this manner also account for the accumulated errors and uncertainty introduced during the remote sensing data creation process. For this project, we have higher-order data for three locations in Brazil (Fig. 2) and use these data to report percent agreement, with the standard error (SE) and confidence intervals53, for each input dataset aside from railroads. Additionally, we report the error and uncertainty noted by any third-party data originator or other data validators. Differences may be actual errors, temporal misalignment, or user classification errors.
Land Cover Cross Validation
Data validation for the land cover dataset is less straightforward than for the input vector databases. Although the MapBiomas Sentinel-2 derived imagery and the validation instruments have similar temporal, radiometric, and spatial properties, differences do occur. Additionally, we cannot access the algorithms used to create the 10 m land cover MapBiomas database, so a like-for-like comparison is difficult. Our approach uses the higher-level MapBiomas classes of forest, herbaceous, shrubby vegetation, farming, non-vegetated lands, and water, as these higher-order categories scored similarly in the HFI reclassification (Table 1). We then create 30 random points across each of the three validation sites and automatically extract the higher-order land cover value from MapBiomas. We then use the same classes to visually assess the land cover from the higher-resolution imagery and measure the levels of agreement. We use percent agreement with standard error and confidence intervals53.
The percentage of land cover agreement was 95.6 percent (n = 90), indicating high consistency between the land cover classification of both MapBiomas and the high-resolution multispectral imagery (Table 4). The SE was ±2.17 percent, and the 95 percent confidence interval ranged from 89.1 percent to 98.3 percent.
Major roads, waterways, and railroads cross validation
Data validation for major roads and navigable waters is straightforward. In the areas where these vector features overlap our UAV or high-resolution satellite imagery, we randomly created 30 points along each input vector feature within the three validation regions where such features were present. We then noted if each random point was within 10 m of the feature’s footprint in the high-resolution imagery. For example, we asked if each random point was within 10 m of the outer curbs for roads or within 10 m of the water for rivers. None of the validation sites had railroads present, as there are few active railroads in Brazil.
The percentage of agreement on the major road location was 100 percent (n = 30), indicating high consistency between the major roads database and the high-resolution multispectral imagery (Table 5). The SE was 0 percent, and the 95 percent confidence interval ranged from 88.6 percent to 100 percent. Only one site had major roads, resulting in a smaller sample size and broader confidence intervals.
The percentage of agreement on the navigable river location was 60 percent (n = 30), indicating only moderate agreement between the navigable rivers database and the high-resolution multispectral imagery (Table 6). The SE was ±8.94 percent, and the 95 percent confidence interval ranged from 42.3 percent to 75.4 percent. Only one validation site had navigable rivers, resulting in a smaller sample size and broader confidence intervals. Although agreement was only moderate, much of the validated region was coastal wetland, and the center line of the navigable waterway passed very close to the river, running through adjacent flooded wetlands and bounding dunes and across islands within the river channel. For these reasons, it is likely a representative input layer for human influence along navigable waterways; additionally, it carries a weight of only 5 percent in the analysis and is the least impactful of our HFI layers.
Building cross validation
For buildings, we selected 30 random building polygons from the input building dataset in each validation region. Again, we noted whether a building was present within 10 m of that location in the high-resolution cross-validation imagery. The percentage of building agreement was 86.7 percent (n = 90), indicating strong consistency between the building delineations of the input commercial datasets and the high-resolution multispectral imagery (Table 7). The SE was ±3.58 percent, and the 95 percent confidence interval ranged from 78.1 percent to 92.2 percent.
Temporal variation
To assess temporal variation in the composite dataset related to urban expansion, major road expansion, and building footprint expansion, we analyzed the area converted from any land cover to urban in the 10 m Sentinel-2 derived data between 2022 and 2023. We find that 12,257,989 10 m pixels, or approximately 0.01% of the dataset, underwent this transition annually. This aligns with the 2024 internal accuracy assessment of the non-BETA version, which reports overall stability over the 18-year mapping period, varying across biomes from 84.6% to 97.7%54.
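As a rough illustration of this transition count, the sketch below tallies pixels that change from any non-urban class to urban between two co-registered land cover grids and converts a pixel count to area. The class codes and the toy grids are hypothetical, and the function names are our own; only the pixel count of 12,257,989 and the 10 m pixel size come from the analysis above.

```python
URBAN = 24  # hypothetical class code for urban; use the code from the dataset legend

def count_transitions_to(before, after, target):
    """Count pixels that were not `target` in `before` but are `target` in `after`."""
    converted = 0
    total = 0
    for row_before, row_after in zip(before, after):
        for b, a in zip(row_before, row_after):
            total += 1
            if a == target and b != target:
                converted += 1
    return converted, total

def pixels_to_km2(n_pixels, pixel_size_m=10.0):
    """Area in square kilometers covered by n square pixels of the given size."""
    return n_pixels * pixel_size_m ** 2 / 1_000_000

# Toy 3 x 3 grids standing in for the 2022 and 2023 land cover rasters.
lc_2022 = [[3, 3, 15], [15, 24, 3], [3, 15, 15]]
lc_2023 = [[3, 24, 15], [24, 24, 3], [3, 15, 15]]
converted, total = count_transitions_to(lc_2022, lc_2023, URBAN)
print(converted, total)                     # 2 9
print(round(pixels_to_km2(12_257_989), 1))  # 1225.8
```

At 100 m2 per pixel, the reported 12,257,989 converted pixels correspond to roughly 1,226 km2 of new urban cover.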
Data visualization
The first visualization (Fig. 3) represents the HFI score across six different biomes (Fig. 1) in Brazil, each with distinct levels of HFI but shown at the same spatial scale. The top-left and bottom-right panels depict the relatively pristine Amazon Rainforest and Pantanal biomes, which have the lowest levels of HFI. Still, the data clearly illustrates the increasing agricultural expansion and forest clearing occurring along highways deep into the interior. The top-middle and top-right panels depict the moderately impacted Cerrado and Caatinga biomes, which have mid-range levels of HFI. The data clearly illustrates the high-density infrastructure across a large swath of these biomes. The lower-left and lower-middle panels represent the dense urban environments of coastal Brazil within the Mata Atlantica and Pampa biomes, with limited low HFI regions remaining. This visualization additionally represents the data as a mean score across the major biomes of Brazil, demonstrating that these data can be aggregated in polygonal databases based on criteria such as municipality, protected status, or species habitat at almost any scale.
Fig. 3
Human Impact Score across Differing Biomes.
Figure 4 represents the final 10 m product at a much-reduced resolution to visualize the entirety of Brazil. We aggregate millions of HFI pixels into each mapping pixel to produce this heavily reduced resolution visualization of the final data.
Fig. 4
Brazilian HFI.
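The kind of aggregation used for this reduced-resolution view can be sketched as a block mean that skips No Data cells. This is an illustration under stated assumptions: the mean is assumed as the aggregation statistic (the text does not specify one), the toy grid values are hypothetical, `aggregate_mean` is our own name, and the No Data value of 255 is taken from the Usage Notes.

```python
NODATA = 255  # No Data value of the HFI raster, as given in the Usage Notes

def aggregate_mean(grid, factor, nodata=NODATA):
    """Aggregate a 2-D grid into factor x factor blocks using the mean of valid cells.

    Blocks containing only No Data cells are assigned the No Data value.
    """
    rows, cols = len(grid), len(grid[0])
    out = []
    for r0 in range(0, rows, factor):
        out_row = []
        for c0 in range(0, cols, factor):
            values = [
                grid[r][c]
                for r in range(r0, min(r0 + factor, rows))
                for c in range(c0, min(c0 + factor, cols))
                if grid[r][c] != nodata
            ]
            out_row.append(sum(values) / len(values) if values else nodata)
        out.append(out_row)
    return out

# Toy 4 x 4 HFI grid (hypothetical scores) reduced to 2 x 2.
hfi = [
    [0, 10, 50, 60],
    [20, 30, 70, 80],
    [255, 255, 5, 255],
    [255, 255, 15, 25],
]
coarse = aggregate_mean(hfi, 2)
print(coarse)  # [[15.0, 65.0], [255, 15.0]]
```

In practice, a raster library would perform this step far more efficiently on the full 10 m product; the pure-Python loop is only meant to make the handling of No Data cells explicit.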
Usage Notes
All horizontal measurement units are meters or kilometers, and all area units are square meters or square kilometers. The No Data value is 255 and should be automatically read by most GIS software. The geospatial data is available in open formats and ready for immediate utilization in FOSS-GIS applications such as QGIS, GRASS, and GDAL/OGR, as well as commercial s