American Solar Farms

Last week, Jake Stid, a Postdoctoral Research Associate at Michigan State University, announced Ground-Mounted Solar Energy in the United States (GM-SEUS). This is a 15K-array, 2.9M-panel dataset of utility and commercial-grade solar farms across 49 states in the US (lower 48 plus the District of Columbia). This dataset was constructed by a team of researchers including alumni from NOAA, NASA and the USGS.

Below is a heatmap of the assets catalogued in this dataset.

GM-SEUS is broken up into two datasets, one for arrays and another panels. Below you can see a solar farm with the array outlined in red and the panels covered purple.

In this post, I…

Below is a heatmap of the assets catalogued in this dataset.

GM-SEUS is broken up into two datasets, one for arrays and another panels. Below you can see a solar farm with the array outlined in red and the panels covered purple.

In this post, I’ll explore GM-SEUS’s Solar Farm dataset.

My Workstation

I’m using a 5.7 GHz AMD Ryzen 9 9950X CPU. It has 16 cores and 32 threads and 1.2 MB of L1, 16 MB of L2 and 64 MB of L3 cache. It has a liquid cooler attached and is housed in a spacious, full-sized Cooler Master HAF 700 computer case.

The system has 96 GB of DDR5 RAM clocked at 4,800 MT/s and a 5th-generation, Crucial T700 4 TB NVMe M.2 SSD which can read at speeds up to 12,400 MB/s. There is a heatsink on the SSD to help keep its temperature down. This is my system’s C drive.

The system is powered by a 1,200-watt, fully modular Corsair Power Supply and is sat on an ASRock X870E Nova 90 Motherboard.

I’m running Ubuntu 24 LTS via Microsoft’s Ubuntu for Windows on Windows 11 Pro. In case you’re wondering why I don’t run a Linux-based desktop as my primary work environment, I’m still using an Nvidia GTX 1080 GPU which has better driver support on Windows and ArcGIS Pro only supports Windows natively.

Installing Prerequisites

I’ll use GDAL 3.9.3 and a few other tools to help analyse the data in this post.

$ sudo add-apt-repository ppa:ubuntugis/ubuntugis-unstable
$ sudo apt update
$ sudo apt install \
gdal-bin \
jq

I’ll use DuckDB v1.4.1, along with its H3, JSON, Lindel, Parquet and Spatial extensions, in this post.

$ cd ~
$ wget -c https://github.com/duckdb/duckdb/releases/download/v1.4.1/duckdb_cli-linux-amd64.zip
$ unzip -j duckdb_cli-linux-amd64.zip
$ chmod +x duckdb
$ ~/duckdb

INSTALL h3 FROM community;
INSTALL lindel FROM community;
INSTALL json;
INSTALL parquet;
INSTALL spatial;

I’ll set up DuckDB to load every installed extension each time it launches.

$ vi ~/.duckdbrc

.timer on
.width 180
LOAD h3;
LOAD lindel;
LOAD json;
LOAD parquet;
LOAD spatial;

The maps in this post were mostly rendered with QGIS version 3.44. QGIS is a desktop application that runs on Windows, macOS and Linux. The application has grown in popularity in recent years and has ~15M application launches from users all around the world each month.

I used QGIS’ Tile+ plugin to add basemaps from Esri to the maps in this post.

Analysis-Ready Data

I’ll download a dataset containing the US CENSUS State codes. This will let me map the state ID in the arrays dataset to their state name.

$ wget https://gist.github.com/a8dx/2340f9527af64f8ef8439366de981168/raw/81d876daea10eab5c2675811c39bcd18a79a9212/US_State_Bounding_Boxes.csv

I’ll download the ZIP file of deliverables for GM-SEUS.

$ wget -O GMSEUS_v1_0.zip \
'https://zenodo.org/records/14827819/files/GMSEUS_v1_0.zip?download=1'
$ unzip GMSEUS_v1_0.zip

I’ll extract the projection used. This proj4 string will be used to below to re-project the data into EPSG:4326.

$ gdalsrsinfo \
-o proj4 \
GMSEUS_v1_0/GPKG/GMSEUS_Arrays_Final.gpkg

+proj=aea +lat_0=37.5 +lon_0=-96 +lat_1=29.5 +lat_2=45.5 +x_0=0 +y_0=0 +datum=NAD83 +units=m +no_defs

I’ll use DuckDB to clean up the values and produce both a geometry field and a bounding box for each feature in this dataset. This will make working with this dataset remotely, such as from AWS S3, much easier.

$ ~/duckdb

This following produced a ZStandard-compressed, spatially-sorted Parquet file of the arrays dataset. I dropped the Z dimension as it was unused. The unknown values have been turned into NULLs. The original GPKG file was 108 MB and the resulting Parquet file is 37 MB.

COPY (
WITH a AS (
SELECT * EXCLUDE(geom),
ST_FORCE2D(
ST_FLIPCOORDINATES(
ST_TRANSFORM(
geom,
'+proj=aea +lat_0=37.5 +lon_0=-96 +lat_1=29.5 +lat_2=45.5 +x_0=0 +y_0=0 +datum=NAD83 +units=m +no_defs',
'EPSG:4326'))) geometry
FROM ST_READ('GMSEUS_v1_0/GPKG/GMSEUS_Arrays_Final.gpkg')
)
SELECT   a.* EXCLUDE (geometry,
tilt,
tiltEst,
instYr,
instYrLT,
effInit,
avgAzimuth,
avgLength,
avgSpace,
avgWidth,),
{'xmin': ST_XMIN(ST_EXTENT(geometry)),
'ymin': ST_YMIN(ST_EXTENT(geometry)),
'xmax': ST_XMAX(ST_EXTENT(geometry)),
'ymax': ST_YMAX(ST_EXTENT(geometry))} AS bbox,
ST_ASWKB(geometry) geometry,
CASE WHEN instYr::INT     = -9999 THEN NULL ELSE instYr::INT   END AS instYr,
CASE WHEN instYrLT::INT   = -9999 THEN NULL ELSE instYrLT::INT END AS instYrLT,
CASE WHEN numRow::INT     = -9999 THEN NULL ELSE numRow::INT   END AS numRow,
CASE WHEN tilt::INT       = -9999 THEN NULL ELSE tilt::INT     END AS tilt,
CASE WHEN tiltEst::INT    = -9999 THEN NULL ELSE tiltEst::INT  END AS tiltEst,
CASE WHEN effInit::INT    = -9999 THEN NULL ELSE effInit       END AS effInit,
CASE WHEN avgAzimuth::INT = -9999 THEN NULL ELSE avgAzimuth    END AS avgAzimuth,
CASE WHEN avgLength::INT  = -9999 THEN NULL ELSE avgLength     END AS avgLength,
CASE WHEN avgSpace::INT   = -9999 THEN NULL ELSE avgSpace      END AS avgSpace,
CASE WHEN avgWidth::INT   = -9999 THEN NULL ELSE avgWidth      END AS avgWidth,
b.NAME state_name
FROM     a
JOIN     'US_State_Bounding_Boxes.csv' b ON a.STATEFP = b.STATEFP
ORDER BY HILBERT_ENCODE([ST_Y(ST_CENTROID(geometry)),
ST_X(ST_CENTROID(geometry))]::double[2])
) TO 'arrays.parquet' (
FORMAT            'PARQUET',
CODEC             'ZSTD',
COMPRESSION_LEVEL 22,
ROW_GROUP_SIZE    15000);

The original GPKG file for the panels dataset was 1.1 GB and the resulting Parquet file is 334 MB.

COPY (
WITH a AS (
SELECT * EXCLUDE(geom),
ST_FORCE2D(
ST_FLIPCOORDINATES(
ST_TRANSFORM(
geom,
'+proj=aea +lat_0=37.5 +lon_0=-96 +lat_1=29.5 +lat_2=45.5 +x_0=0 +y_0=0 +datum=NAD83 +units=m +no_defs',
'EPSG:4326'))) geometry
FROM ST_READ('GMSEUS_v1_0/GPKG/GMSEUS_Panels_Final.gpkg')
)
SELECT   * EXCLUDE (geometry,
rowSpace),
{'xmin': ST_XMIN(ST_EXTENT(geometry)),
'ymin': ST_YMIN(ST_EXTENT(geometry)),
'xmax': ST_XMAX(ST_EXTENT(geometry)),
'ymax': ST_YMAX(ST_EXTENT(geometry))} AS bbox,
ST_ASWKB(geometry) geometry,
CASE WHEN rowSpace::INT = -9999 THEN NULL ELSE rowSpace END AS rowSpace
FROM     a
ORDER BY HILBERT_ENCODE([ST_Y(ST_CENTROID(geometry)),
ST_X(ST_CENTROID(geometry))]::double[2])
) TO 'panels.parquet' (
FORMAT            'PARQUET',
CODEC             'ZSTD',
COMPRESSION_LEVEL 22,
ROW_GROUP_SIZE    15000);

Solar Arrays

The arrays Parquet file has 15,017 rows. Below is an example record.

$ echo "SELECT * EXCLUDE(bbox,
geometry),
bbox::JSON bbox
FROM 'arrays.parquet'
LIMIT 1" \
| ~/duckdb -json \
| jq -S .

[
{
"COUNTYFP": "019",
"GCR1": 0.6996,
"GCR2": 0.614,
"STATEFP": "45",
"Source": "OSM",
"arrayID": 2807,
"avgAzimuth": 170.63,
"avgLength": 47.76166666666666,
"avgSpace": 3.003333333333333,
"avgWidth": 4.776666666666666,
"bbox": {
"xmax": -79.97229830431786,
"xmin": -79.97325770533094,
"ymax": 32.87833627192598,
"ymin": 32.87808294640646
},
"capMW": 0.246,
"capMWest": 0.246,
"effInit": 0.197963503102977,
"instYr": 2021,
"instYrLT": 2021,
"latitude": 32.87818725544087,
"longitude": -79.97276617375104,
"modType": "c-si",
"mount": "fixed_axis",
"nativeID": "9324",
"newBound": 1,
"numRow": 6.0,
"numRow_1": 6,
"state_name": "South Carolina",
"tilt": 30,
"tiltEst": 30,
"totArea": 1779.0,
"totRowArea": 1244.93,
"version": "v1.0"
}
]

Below are the field names, data types, percentages of NULLs per column, number of unique values and minimum and maximum values for each column.

$ ~/duckdb

SELECT   column_name,
column_type,
null_percentage,
approx_unique,
min,
max
FROM     (SUMMARIZE
FROM READ_PARQUET('arrays.parquet'))
WHERE    column_name != 'geometry'
AND      column_name != 'bbox'
ORDER BY 1;

┌─────────────┬─────────────┬─────────────────┬───────────────┬─────────────────────┬────────────────────┐
│ column_name │ column_type │ null_percentage │ approx_unique │         min         │        max         │
│   varchar   │   varchar   │  decimal(9,2)   │     int64     │       varchar       │      varchar       │
├─────────────┼─────────────┼─────────────────┼───────────────┼─────────────────────┼────────────────────┤
│ COUNTYFP    │ VARCHAR     │            0.00 │           235 │ 001                 │ 810                │
│ GCR1        │ DOUBLE      │            0.00 │          5057 │ 0.1047              │ 1.0                │
│ GCR2        │ DOUBLE      │            0.00 │          5013 │ 0.1245              │ 0.988              │
│ STATEFP     │ VARCHAR     │            0.00 │            49 │ 01                  │ 56                 │
│ Source      │ VARCHAR     │            0.00 │             6 │ CCVPV               │ USPVDB             │
│ arrayID     │ BIGINT      │            0.00 │         13155 │ 1                   │ 15017              │
│ avgAzimuth  │ DOUBLE      │           32.84 │          4295 │ 25.0                │ 269.27             │
│ avgLength   │ DOUBLE      │           39.79 │          8358 │ 4.02                │ 449.5004           │
│ avgSpace    │ DOUBLE      │           39.79 │          8623 │ 0.024               │ 20.0               │
│ avgWidth    │ DOUBLE      │           39.79 │          9185 │ 0.67625             │ 29.80222222222222  │
│ capMW       │ DOUBLE      │            0.00 │          5280 │ 0.001250225184651   │ 1051.703           │
│ capMWest    │ DOUBLE      │            0.00 │          7863 │ 0.004               │ 3170.1             │
│ effInit     │ DOUBLE      │            0.49 │            39 │ 0.132210289727273   │ 0.205484167047619  │
│ instYr      │ INTEGER     │            0.00 │            24 │ 1985                │ 2024               │
│ instYrLT    │ INTEGER     │            0.24 │            17 │ 2009                │ 2023               │
│ latitude    │ DOUBLE      │            0.00 │         16986 │ 25.53796582594631   │ 48.99547137225406  │
│ longitude   │ DOUBLE      │            0.00 │         15656 │ -124.10440474967092 │ -67.15066374183608 │
│ modType     │ VARCHAR     │            0.00 │             3 │ c-si                │ thin-film          │
│ mount       │ VARCHAR     │            0.00 │            10 │ dual_axis           │ unknown            │
│ nativeID    │ VARCHAR     │            0.00 │         15141 │ 1                   │ York Solar         │
│ newBound    │ BIGINT      │            0.00 │             2 │ 0                   │ 1                  │
│ numRow      │ DOUBLE      │            0.00 │          1461 │ 0.0                 │ 56782.0            │
│ numRow_1    │ INTEGER     │            0.00 │          1117 │ 0                   │ 56782              │
│ state_name  │ VARCHAR     │            0.00 │            57 │ Alabama             │ Wyoming            │
│ tilt        │ INTEGER     │           55.46 │            47 │ 0                   │ 83                 │
│ tiltEst     │ INTEGER     │           55.46 │            30 │ 10                  │ 43                 │
│ totArea     │ DOUBLE      │            0.00 │         13182 │ 54.0                │ 13735113.0         │
│ totRowArea  │ DOUBLE      │            0.00 │         15396 │ 44.97               │ 7223924.662        │
│ version     │ VARCHAR     │            0.00 │             1 │ v1.0                │ v1.0               │
├─────────────┴─────────────┴─────────────────┴───────────────┴─────────────────────┴────────────────────┤
│ 29 rows                                                                                      6 columns │
└────────────────────────────────────────────────────────────────────────────────────────────────────────┘

I’ll generate a heatmap of the asset locations in this dataset.

CREATE OR REPLACE TABLE h3_4_stats AS
SELECT   H3_LATLNG_TO_CELL(
bbox.ymin,
bbox.xmin, 4) AS h3_4,
COUNT(*) num_buildings
FROM     READ_PARQUET('arrays.parquet')
WHERE    bbox.xmin BETWEEN -178.5 AND 178.5
GROUP BY 1;

COPY (
SELECT ST_ASWKB(H3_CELL_TO_BOUNDARY_WKT(h3_4)::geometry) geometry,
num_buildings
FROM   h3_4_stats
) TO 'h3_4_stats.gpkg'
WITH (FORMAT GDAL,
DRIVER 'GPKG',
LAYER_CREATION_OPTIONS 'WRITE_BBOX=YES');

Normally I would produce a Parquet file as even with 10s of thousands of records it’ll generate in seconds versus a minute or so with GPKG. But ArcGIS Pro 3.5 didn’t want to open the Parquet file I generated. QGIS 3.44 was fine with it but I wanted to use Esri’s Nova basemap for the rendering below.

ArcGIS Pro 3.6 should be released sometime in the next few weeks so I’ll re-examine this issue when it’s out.

Below is the relationship between the sources of data and the installation year.

PIVOT    'arrays.parquet'
ON       Source
USING    COUNT(*)
GROUP BY instYr
ORDER BY instYr;

┌────────┬───────┬───────┬───────────────┬───────┬───────┬────────┐
│ instYr │ CCVPV │ CWSD  │ GMSEUSgeorect │  OSM  │  SAM  │ USPVDB │
│ int32  │ int64 │ int64 │     int64     │ int64 │ int64 │ int64  │
├────────┼───────┼───────┼───────────────┼───────┼───────┼────────┤
│   1985 │     0 │     0 │             0 │     0 │     0 │      1 │
│   1986 │     0 │     0 │             0 │     1 │     0 │      0 │
│   2002 │     0 │     0 │             0 │     0 │     0 │      1 │
│   2005 │     0 │     0 │             0 │    26 │     0 │      0 │
│   2006 │     0 │     0 │             0 │     2 │     0 │      1 │
│   2007 │     0 │     0 │             0 │    44 │     0 │      5 │
│   2008 │     0 │     0 │             0 │    58 │     1 │     11 │
│   2009 │     5 │     0 │             0 │    10 │     5 │     19 │
│   2010 │    20 │     0 │             0 │    71 │    20 │     37 │
│   2011 │    24 │     0 │             2 │   193 │    30 │    102 │
│   2012 │    59 │     0 │             2 │   267 │    88 │    157 │
│   2013 │    83 │     0 │             3 │   259 │    82 │    209 │
│   2014 │   102 │     0 │             1 │   335 │   119 │    291 │
│   2015 │   107 │     3 │             0 │   532 │   125 │    320 │
│   2016 │   145 │     1 │             2 │   564 │   170 │    412 │
│   2017 │   135 │     0 │             1 │   661 │   167 │    476 │
│   2018 │    66 │    34 │             4 │   644 │   210 │    414 │
│   2019 │    28 │    39 │             6 │   467 │   178 │    453 │
│   2020 │    10 │    75 │             1 │   437 │   186 │    496 │
│   2021 │     5 │    33 │             6 │   406 │   241 │    446 │
│   2022 │     1 │   173 │             3 │   231 │   354 │    166 │
│   2023 │     0 │     0 │             3 │   176 │   722 │    134 │
│   2024 │     0 │     0 │             0 │    31 │  1571 │      0 │
├────────┴───────┴───────┴───────────────┴───────┴───────┴────────┤
│ 23 rows                                               7 columns │
└─────────────────────────────────────────────────────────────────┘

Below is the relationship between the mount and mod type.

PIVOT    'arrays.parquet'
ON       modType
USING    COUNT(*)
GROUP BY mount
ORDER BY mount;

┌─────────────┬───────┬───────┬───────────┐
│    mount    │ c-si  │  csp  │ thin-film │
│   varchar   │ int64 │ int64 │   int64   │
├─────────────┼───────┼───────┼───────────┤
│ dual_axis   │   301 │    18 │         1 │
│ fixed_axis  │  6057 │    32 │       208 │
│ mixed       │     2 │     0 │         0 │
│ mixed_df    │   189 │     7 │         0 │
│ mixed_dfs   │    94 │     0 │         0 │
│ mixed_ds    │    38 │     1 │         0 │
│ mixed_fs    │    60 │     0 │         1 │
│ single_axis │  2876 │    11 │       231 │
│ unknown     │  4885 │     5 │         0 │
└─────────────┴───────┴───────┴───────────┘

Solar Panels

The panels Parquet file has 2,917,782 rows. Below is an example record.

$ echo "SELECT * EXCLUDE(bbox,
geometry),
bbox::JSON bbox
FROM 'panels.parquet'
LIMIT 1" \
| ~/duckdb -json \
| jq -S .

[
{
"Source": "gmseus",
"arrayID": 2807.0,
"bbox": {
"xmax": -79.97312295800064,
"xmin": -79.97325770533483,
"ymax": 32.87833627193374,
"ymin": 32.87830393275682
},
"panelID": 2620732,
"rowArea": 29.1,
"rowAzimuth": 174.62,
"rowLength": 12.77,
"rowMount": "fixed_axis",
"row

Copyright © 2014 - 2025 Mark Litwintschik. This site's template is based off a template by Giulio Fidente.

My Workstation

Installing Prerequisites

Analysis-Ready Data

Solar Arrays

Solar Panels

Similar Posts