Last week, Jake Stid, a Postdoctoral Research Associate at Michigan State University, announced Ground-Mounted Solar Energy in the United States (GM-SEUS). This is a 15K-array, 2.9M-panel dataset of utility and commercial-grade solar farms across 49 states in the US (lower 48 plus the District of Columbia). This dataset was constructed by a team of researchers including alumni from NOAA, NASA and the USGS.
Below is a heatmap of the assets catalogued in this dataset.
GM-SEUS is broken up into two datasets, one for arrays and another panels. Below you can see a solar farm with the array outlined in red and the panels covered purple.
In this post, Iβ¦
Last week, Jake Stid, a Postdoctoral Research Associate at Michigan State University, announced Ground-Mounted Solar Energy in the United States (GM-SEUS). This is a 15K-array, 2.9M-panel dataset of utility and commercial-grade solar farms across 49 states in the US (lower 48 plus the District of Columbia). This dataset was constructed by a team of researchers including alumni from NOAA, NASA and the USGS.
Below is a heatmap of the assets catalogued in this dataset.
GM-SEUS is broken up into two datasets, one for arrays and another panels. Below you can see a solar farm with the array outlined in red and the panels covered purple.
In this post, Iβll explore GM-SEUSβs Solar Farm dataset.
My Workstation
Iβm using a 5.7 GHz AMD Ryzen 9 9950X CPU. It has 16 cores and 32 threads and 1.2 MB of L1, 16 MB of L2 and 64 MB of L3 cache. It has a liquid cooler attached and is housed in a spacious, full-sized Cooler Master HAF 700 computer case.
The system has 96 GB of DDR5 RAM clocked at 4,800 MT/s and a 5th-generation, Crucial T700 4 TB NVMe M.2 SSD which can read at speeds up to 12,400 MB/s. There is a heatsink on the SSD to help keep its temperature down. This is my systemβs C drive.
The system is powered by a 1,200-watt, fully modular Corsair Power Supply and is sat on an ASRock X870E Nova 90 Motherboard.
Iβm running Ubuntu 24 LTS via Microsoftβs Ubuntu for Windows on Windows 11 Pro. In case youβre wondering why I donβt run a Linux-based desktop as my primary work environment, Iβm still using an Nvidia GTX 1080 GPU which has better driver support on Windows and ArcGIS Pro only supports Windows natively.
Installing Prerequisites
Iβll use GDAL 3.9.3 and a few other tools to help analyse the data in this post.
$ sudo add-apt-repository ppa:ubuntugis/ubuntugis-unstable
$ sudo apt update
$ sudo apt install \
gdal-bin \
jq
Iβll use DuckDB v1.4.1, along with its H3, JSON, Lindel, Parquet and Spatial extensions, in this post.
$ cd ~
$ wget -c https://github.com/duckdb/duckdb/releases/download/v1.4.1/duckdb_cli-linux-amd64.zip
$ unzip -j duckdb_cli-linux-amd64.zip
$ chmod +x duckdb
$ ~/duckdb
INSTALL h3 FROM community;
INSTALL lindel FROM community;
INSTALL json;
INSTALL parquet;
INSTALL spatial;
Iβll set up DuckDB to load every installed extension each time it launches.
$ vi ~/.duckdbrc
.timer on
.width 180
LOAD h3;
LOAD lindel;
LOAD json;
LOAD parquet;
LOAD spatial;
The maps in this post were mostly rendered with QGIS version 3.44. QGIS is a desktop application that runs on Windows, macOS and Linux. The application has grown in popularity in recent years and has ~15M application launches from users all around the world each month.
I used QGISβ Tile+ plugin to add basemaps from Esri to the maps in this post.
Analysis-Ready Data
Iβll download a dataset containing the US CENSUS State codes. This will let me map the state ID in the arrays dataset to their state name.
$ wget https://gist.github.com/a8dx/2340f9527af64f8ef8439366de981168/raw/81d876daea10eab5c2675811c39bcd18a79a9212/US_State_Bounding_Boxes.csv
Iβll download the ZIP file of deliverables for GM-SEUS.
$ wget -O GMSEUS_v1_0.zip \
'https://zenodo.org/records/14827819/files/GMSEUS_v1_0.zip?download=1'
$ unzip GMSEUS_v1_0.zip
Iβll extract the projection used. This proj4 string will be used to below to re-project the data into EPSG:4326.
$ gdalsrsinfo \
-o proj4 \
GMSEUS_v1_0/GPKG/GMSEUS_Arrays_Final.gpkg
+proj=aea +lat_0=37.5 +lon_0=-96 +lat_1=29.5 +lat_2=45.5 +x_0=0 +y_0=0 +datum=NAD83 +units=m +no_defs
Iβll use DuckDB to clean up the values and produce both a geometry field and a bounding box for each feature in this dataset. This will make working with this dataset remotely, such as from AWS S3, much easier.
$ ~/duckdb
This following produced a ZStandard-compressed, spatially-sorted Parquet file of the arrays dataset. I dropped the Z dimension as it was unused. The unknown values have been turned into NULLs. The original GPKG file was 108 MB and the resulting Parquet file is 37 MB.
COPY (
WITH a AS (
SELECT * EXCLUDE(geom),
ST_FORCE2D(
ST_FLIPCOORDINATES(
ST_TRANSFORM(
geom,
'+proj=aea +lat_0=37.5 +lon_0=-96 +lat_1=29.5 +lat_2=45.5 +x_0=0 +y_0=0 +datum=NAD83 +units=m +no_defs',
'EPSG:4326'))) geometry
FROM ST_READ('GMSEUS_v1_0/GPKG/GMSEUS_Arrays_Final.gpkg')
)
SELECT a.* EXCLUDE (geometry,
tilt,
tiltEst,
instYr,
instYrLT,
effInit,
avgAzimuth,
avgLength,
avgSpace,
avgWidth,),
{'xmin': ST_XMIN(ST_EXTENT(geometry)),
'ymin': ST_YMIN(ST_EXTENT(geometry)),
'xmax': ST_XMAX(ST_EXTENT(geometry)),
'ymax': ST_YMAX(ST_EXTENT(geometry))} AS bbox,
ST_ASWKB(geometry) geometry,
CASE WHEN instYr::INT = -9999 THEN NULL ELSE instYr::INT END AS instYr,
CASE WHEN instYrLT::INT = -9999 THEN NULL ELSE instYrLT::INT END AS instYrLT,
CASE WHEN numRow::INT = -9999 THEN NULL ELSE numRow::INT END AS numRow,
CASE WHEN tilt::INT = -9999 THEN NULL ELSE tilt::INT END AS tilt,
CASE WHEN tiltEst::INT = -9999 THEN NULL ELSE tiltEst::INT END AS tiltEst,
CASE WHEN effInit::INT = -9999 THEN NULL ELSE effInit END AS effInit,
CASE WHEN avgAzimuth::INT = -9999 THEN NULL ELSE avgAzimuth END AS avgAzimuth,
CASE WHEN avgLength::INT = -9999 THEN NULL ELSE avgLength END AS avgLength,
CASE WHEN avgSpace::INT = -9999 THEN NULL ELSE avgSpace END AS avgSpace,
CASE WHEN avgWidth::INT = -9999 THEN NULL ELSE avgWidth END AS avgWidth,
b.NAME state_name
FROM a
JOIN 'US_State_Bounding_Boxes.csv' b ON a.STATEFP = b.STATEFP
ORDER BY HILBERT_ENCODE([ST_Y(ST_CENTROID(geometry)),
ST_X(ST_CENTROID(geometry))]::double[2])
) TO 'arrays.parquet' (
FORMAT 'PARQUET',
CODEC 'ZSTD',
COMPRESSION_LEVEL 22,
ROW_GROUP_SIZE 15000);
The original GPKG file for the panels dataset was 1.1 GB and the resulting Parquet file is 334 MB.
COPY (
WITH a AS (
SELECT * EXCLUDE(geom),
ST_FORCE2D(
ST_FLIPCOORDINATES(
ST_TRANSFORM(
geom,
'+proj=aea +lat_0=37.5 +lon_0=-96 +lat_1=29.5 +lat_2=45.5 +x_0=0 +y_0=0 +datum=NAD83 +units=m +no_defs',
'EPSG:4326'))) geometry
FROM ST_READ('GMSEUS_v1_0/GPKG/GMSEUS_Panels_Final.gpkg')
)
SELECT * EXCLUDE (geometry,
rowSpace),
{'xmin': ST_XMIN(ST_EXTENT(geometry)),
'ymin': ST_YMIN(ST_EXTENT(geometry)),
'xmax': ST_XMAX(ST_EXTENT(geometry)),
'ymax': ST_YMAX(ST_EXTENT(geometry))} AS bbox,
ST_ASWKB(geometry) geometry,
CASE WHEN rowSpace::INT = -9999 THEN NULL ELSE rowSpace END AS rowSpace
FROM a
ORDER BY HILBERT_ENCODE([ST_Y(ST_CENTROID(geometry)),
ST_X(ST_CENTROID(geometry))]::double[2])
) TO 'panels.parquet' (
FORMAT 'PARQUET',
CODEC 'ZSTD',
COMPRESSION_LEVEL 22,
ROW_GROUP_SIZE 15000);
Solar Arrays
The arrays Parquet file has 15,017 rows. Below is an example record.
$ echo "SELECT * EXCLUDE(bbox,
geometry),
bbox::JSON bbox
FROM 'arrays.parquet'
LIMIT 1" \
| ~/duckdb -json \
| jq -S .
[
{
"COUNTYFP": "019",
"GCR1": 0.6996,
"GCR2": 0.614,
"STATEFP": "45",
"Source": "OSM",
"arrayID": 2807,
"avgAzimuth": 170.63,
"avgLength": 47.76166666666666,
"avgSpace": 3.003333333333333,
"avgWidth": 4.776666666666666,
"bbox": {
"xmax": -79.97229830431786,
"xmin": -79.97325770533094,
"ymax": 32.87833627192598,
"ymin": 32.87808294640646
},
"capMW": 0.246,
"capMWest": 0.246,
"effInit": 0.197963503102977,
"instYr": 2021,
"instYrLT": 2021,
"latitude": 32.87818725544087,
"longitude": -79.97276617375104,
"modType": "c-si",
"mount": "fixed_axis",
"nativeID": "9324",
"newBound": 1,
"numRow": 6.0,
"numRow_1": 6,
"state_name": "South Carolina",
"tilt": 30,
"tiltEst": 30,
"totArea": 1779.0,
"totRowArea": 1244.93,
"version": "v1.0"
}
]
Below are the field names, data types, percentages of NULLs per column, number of unique values and minimum and maximum values for each column.
$ ~/duckdb
SELECT column_name,
column_type,
null_percentage,
approx_unique,
min,
max
FROM (SUMMARIZE
FROM READ_PARQUET('arrays.parquet'))
WHERE column_name != 'geometry'
AND column_name != 'bbox'
ORDER BY 1;
βββββββββββββββ¬ββββββββββββββ¬ββββββββββββββββββ¬ββββββββββββββββ¬ββββββββββββββββββββββ¬βββββββββββββββββββββ
β column_name β column_type β null_percentage β approx_unique β min β max β
β varchar β varchar β decimal(9,2) β int64 β varchar β varchar β
βββββββββββββββΌββββββββββββββΌββββββββββββββββββΌββββββββββββββββΌββββββββββββββββββββββΌβββββββββββββββββββββ€
β COUNTYFP β VARCHAR β 0.00 β 235 β 001 β 810 β
β GCR1 β DOUBLE β 0.00 β 5057 β 0.1047 β 1.0 β
β GCR2 β DOUBLE β 0.00 β 5013 β 0.1245 β 0.988 β
β STATEFP β VARCHAR β 0.00 β 49 β 01 β 56 β
β Source β VARCHAR β 0.00 β 6 β CCVPV β USPVDB β
β arrayID β BIGINT β 0.00 β 13155 β 1 β 15017 β
β avgAzimuth β DOUBLE β 32.84 β 4295 β 25.0 β 269.27 β
β avgLength β DOUBLE β 39.79 β 8358 β 4.02 β 449.5004 β
β avgSpace β DOUBLE β 39.79 β 8623 β 0.024 β 20.0 β
β avgWidth β DOUBLE β 39.79 β 9185 β 0.67625 β 29.80222222222222 β
β capMW β DOUBLE β 0.00 β 5280 β 0.001250225184651 β 1051.703 β
β capMWest β DOUBLE β 0.00 β 7863 β 0.004 β 3170.1 β
β effInit β DOUBLE β 0.49 β 39 β 0.132210289727273 β 0.205484167047619 β
β instYr β INTEGER β 0.00 β 24 β 1985 β 2024 β
β instYrLT β INTEGER β 0.24 β 17 β 2009 β 2023 β
β latitude β DOUBLE β 0.00 β 16986 β 25.53796582594631 β 48.99547137225406 β
β longitude β DOUBLE β 0.00 β 15656 β -124.10440474967092 β -67.15066374183608 β
β modType β VARCHAR β 0.00 β 3 β c-si β thin-film β
β mount β VARCHAR β 0.00 β 10 β dual_axis β unknown β
β nativeID β VARCHAR β 0.00 β 15141 β 1 β York Solar β
β newBound β BIGINT β 0.00 β 2 β 0 β 1 β
β numRow β DOUBLE β 0.00 β 1461 β 0.0 β 56782.0 β
β numRow_1 β INTEGER β 0.00 β 1117 β 0 β 56782 β
β state_name β VARCHAR β 0.00 β 57 β Alabama β Wyoming β
β tilt β INTEGER β 55.46 β 47 β 0 β 83 β
β tiltEst β INTEGER β 55.46 β 30 β 10 β 43 β
β totArea β DOUBLE β 0.00 β 13182 β 54.0 β 13735113.0 β
β totRowArea β DOUBLE β 0.00 β 15396 β 44.97 β 7223924.662 β
β version β VARCHAR β 0.00 β 1 β v1.0 β v1.0 β
βββββββββββββββ΄ββββββββββββββ΄ββββββββββββββββββ΄ββββββββββββββββ΄ββββββββββββββββββββββ΄βββββββββββββββββββββ€
β 29 rows 6 columns β
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
Iβll generate a heatmap of the asset locations in this dataset.
CREATE OR REPLACE TABLE h3_4_stats AS
SELECT H3_LATLNG_TO_CELL(
bbox.ymin,
bbox.xmin, 4) AS h3_4,
COUNT(*) num_buildings
FROM READ_PARQUET('arrays.parquet')
WHERE bbox.xmin BETWEEN -178.5 AND 178.5
GROUP BY 1;
COPY (
SELECT ST_ASWKB(H3_CELL_TO_BOUNDARY_WKT(h3_4)::geometry) geometry,
num_buildings
FROM h3_4_stats
) TO 'h3_4_stats.gpkg'
WITH (FORMAT GDAL,
DRIVER 'GPKG',
LAYER_CREATION_OPTIONS 'WRITE_BBOX=YES');
Normally I would produce a Parquet file as even with 10s of thousands of records itβll generate in seconds versus a minute or so with GPKG. But ArcGIS Pro 3.5 didnβt want to open the Parquet file I generated. QGIS 3.44 was fine with it but I wanted to use Esriβs Nova basemap for the rendering below.
ArcGIS Pro 3.6 should be released sometime in the next few weeks so Iβll re-examine this issue when itβs out.
Below is the relationship between the sources of data and the installation year.
PIVOT 'arrays.parquet'
ON Source
USING COUNT(*)
GROUP BY instYr
ORDER BY instYr;
ββββββββββ¬ββββββββ¬ββββββββ¬ββββββββββββββββ¬ββββββββ¬ββββββββ¬βββββββββ
β instYr β CCVPV β CWSD β GMSEUSgeorect β OSM β SAM β USPVDB β
β int32 β int64 β int64 β int64 β int64 β int64 β int64 β
ββββββββββΌββββββββΌββββββββΌββββββββββββββββΌββββββββΌββββββββΌβββββββββ€
β 1985 β 0 β 0 β 0 β 0 β 0 β 1 β
β 1986 β 0 β 0 β 0 β 1 β 0 β 0 β
β 2002 β 0 β 0 β 0 β 0 β 0 β 1 β
β 2005 β 0 β 0 β 0 β 26 β 0 β 0 β
β 2006 β 0 β 0 β 0 β 2 β 0 β 1 β
β 2007 β 0 β 0 β 0 β 44 β 0 β 5 β
β 2008 β 0 β 0 β 0 β 58 β 1 β 11 β
β 2009 β 5 β 0 β 0 β 10 β 5 β 19 β
β 2010 β 20 β 0 β 0 β 71 β 20 β 37 β
β 2011 β 24 β 0 β 2 β 193 β 30 β 102 β
β 2012 β 59 β 0 β 2 β 267 β 88 β 157 β
β 2013 β 83 β 0 β 3 β 259 β 82 β 209 β
β 2014 β 102 β 0 β 1 β 335 β 119 β 291 β
β 2015 β 107 β 3 β 0 β 532 β 125 β 320 β
β 2016 β 145 β 1 β 2 β 564 β 170 β 412 β
β 2017 β 135 β 0 β 1 β 661 β 167 β 476 β
β 2018 β 66 β 34 β 4 β 644 β 210 β 414 β
β 2019 β 28 β 39 β 6 β 467 β 178 β 453 β
β 2020 β 10 β 75 β 1 β 437 β 186 β 496 β
β 2021 β 5 β 33 β 6 β 406 β 241 β 446 β
β 2022 β 1 β 173 β 3 β 231 β 354 β 166 β
β 2023 β 0 β 0 β 3 β 176 β 722 β 134 β
β 2024 β 0 β 0 β 0 β 31 β 1571 β 0 β
ββββββββββ΄ββββββββ΄ββββββββ΄ββββββββββββββββ΄ββββββββ΄ββββββββ΄βββββββββ€
β 23 rows 7 columns β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
Below is the relationship between the mount and mod type.
PIVOT 'arrays.parquet'
ON modType
USING COUNT(*)
GROUP BY mount
ORDER BY mount;
βββββββββββββββ¬ββββββββ¬ββββββββ¬ββββββββββββ
β mount β c-si β csp β thin-film β
β varchar β int64 β int64 β int64 β
βββββββββββββββΌββββββββΌββββββββΌββββββββββββ€
β dual_axis β 301 β 18 β 1 β
β fixed_axis β 6057 β 32 β 208 β
β mixed β 2 β 0 β 0 β
β mixed_df β 189 β 7 β 0 β
β mixed_dfs β 94 β 0 β 0 β
β mixed_ds β 38 β 1 β 0 β
β mixed_fs β 60 β 0 β 1 β
β single_axis β 2876 β 11 β 231 β
β unknown β 4885 β 5 β 0 β
βββββββββββββββ΄ββββββββ΄ββββββββ΄ββββββββββββ
Solar Panels
The panels Parquet file has 2,917,782 rows. Below is an example record.
$ echo "SELECT * EXCLUDE(bbox,
geometry),
bbox::JSON bbox
FROM 'panels.parquet'
LIMIT 1" \
| ~/duckdb -json \
| jq -S .
[
{
"Source": "gmseus",
"arrayID": 2807.0,
"bbox": {
"xmax": -79.97312295800064,
"xmin": -79.97325770533483,
"ymax": 32.87833627193374,
"ymin": 32.87830393275682
},
"panelID": 2620732,
"rowArea": 29.1,
"rowAzimuth": 174.62,
"rowLength": 12.77,
"rowMount": "fixed_axis",
"row
Copyright Β© 2014 - 2025 Mark Litwintschik. This site's template is based off a template by Giulio Fidente.