In May, the US Geological Survey (USGS) published version 8.1 of the U.S. Wind Turbine Database (USWTB). The first version was published back in 2018 and it documents the location 76K+ wind turbines, their capabilities and associated project data across the US and its territories.
Below is a heatmap of their locations.
In this post, Iβll walk through converting this dataset into Parquet and examining some of its features.
My Workstation
Iβm using a 5.7 GHz AMD Ryzen 9 9950X CPU. It has 16 cores and 32 threads and 1.2 MB of L1, 16 MB of L2 and 64 MB of L3 cache. It has a liquid cooler attached and is housed in a spacious, full-sized Cooler Master HAF 700 computer case.
The system has 96 GB of DDR5 RAM clocked at 4,800 MT/s and a 5tβ¦
In May, the US Geological Survey (USGS) published version 8.1 of the U.S. Wind Turbine Database (USWTB). The first version was published back in 2018 and it documents the location 76K+ wind turbines, their capabilities and associated project data across the US and its territories.
Below is a heatmap of their locations.
In this post, Iβll walk through converting this dataset into Parquet and examining some of its features.
My Workstation
Iβm using a 5.7 GHz AMD Ryzen 9 9950X CPU. It has 16 cores and 32 threads and 1.2 MB of L1, 16 MB of L2 and 64 MB of L3 cache. It has a liquid cooler attached and is housed in a spacious, full-sized Cooler Master HAF 700 computer case.
The system has 96 GB of DDR5 RAM clocked at 4,800 MT/s and a 5th-generation, Crucial T700 4 TB NVMe M.2 SSD which can read at speeds up to 12,400 MB/s. There is a heatsink on the SSD to help keep its temperature down. This is my systemβs C drive.
The system is powered by a 1,200-watt, fully modular Corsair Power Supply and is sat on an ASRock X870E Nova 90 Motherboard.
Iβm running Ubuntu 24 LTS via Microsoftβs Ubuntu for Windows on Windows 11 Pro. In case youβre wondering why I donβt run a Linux-based desktop as my primary work environment, Iβm still using an Nvidia GTX 1080 GPU which has better driver support on Windows and ArcGIS Pro only supports Windows natively.
Installing Prerequisites
Iβll use GDAL 3.9.3 and a few other tools to help analyse the data in this post.
$ sudo add-apt-repository ppa:ubuntugis/ubuntugis-unstable
$ sudo apt update
$ sudo apt install \
gdal-bin \
jq
Iβll use DuckDB, along with its H3, JSON, Lindel, Parquet and Spatial extensions in this post.
$ cd ~
$ wget -c https://github.com/duckdb/duckdb/releases/download/v1.4.1/duckdb_cli-linux-amd64.zip
$ unzip -j duckdb_cli-linux-amd64.zip
$ chmod +x duckdb
$ ~/duckdb
INSTALL h3 FROM community;
INSTALL lindel FROM community;
INSTALL json;
INSTALL parquet;
INSTALL spatial;
Iβll set up DuckDB to load every installed extension each time it launches.
$ vi ~/.duckdbrc
.timer on
.width 180
LOAD h3;
LOAD lindel;
LOAD json;
LOAD parquet;
LOAD spatial;
Most of the maps in this post were rendered with QGIS version 3.44. QGIS is a desktop application that runs on Windows, macOS and Linux. The application has grown in popularity in recent years and has ~15M application launches from users all around the world each month.
I used QGISβ Tile+ plugin to add basemaps from NASAβs Blue Marble and Bing for spatial context.
I also used the following GeoJSON files for counties and states.
Downloading the Dataset
Iβll download the dataset and convert it into Parquet.
The original field names contained a lot of acronyms and truncated names. Iβve renamed and restructured these for easier human consumption. Iβve taken all the values representing missing values and converted them into NULLs. Iβve also converted any magic numbers into English strings.
$ mkdir -p ~/usgs_wind
$ cd ~/usgs_wind
$ wget https://energy.usgs.gov/uswtdb/assets/data/uswtdbSHP.zip
$ unzip uswtdbSHP.zip
$ ~/duckdb
COPY (
SELECT
case_id: case_id::INT,
eia_id: IF(eia_id::TEXT='-9999', NULL, eia_id::INT),
faa: {
asn: IF(faa_asn='missing', NULL, faa_asn),
ors: IF(faa_ors='missing', NULL, faa_ors),
},
project: {
avg_turbine_capacity: IF(p_cap::TEXT='-9999.0', NULL, p_cap::DOUBLE),
name: p_name,
num_turbines: p_tnum::INT,
online_year: IF(p_year::TEXT='-9999', NULL, p_year::INT),
},
geometry: geom,
bbox: {'xmin': ST_XMIN(ST_EXTENT(geom)),
'ymin': ST_YMIN(ST_EXTENT(geom)),
'xmax': ST_XMAX(ST_EXTENT(geom)),
'ymax': ST_YMAX(ST_EXTENT(geom))},
location: {
country: t_county,
fips: IF(t_fips::VARCHAR='NA', NULL, t_fips::VARCHAR),
state: t_state,
},
turbine: {
capacity: IF(t_cap::TEXT='-9999', NULL, t_cap::DOUBLE),
attribute_confidence:
CASE WHEN t_conf_atr = 1 THEN 'Low'
WHEN t_conf_atr = 2 THEN 'Medium'
WHEN t_conf_atr = 3 THEN 'High'
END,
location_confidence:
CASE WHEN t_conf_loc = 1 THEN 'Low'
WHEN t_conf_loc = 2 THEN 'Medium'
WHEN t_conf_loc = 3 THEN 'High'
END,
hub_height: IF(t_hh::TEXT='-9999.0', NULL, t_hh::DOUBLE),
imagery: {
captured_at: t_img_date,
provider: t_img_src,
},
manufacturer: IF(t_manu='missing', NULL, t_manu),
model: IF(t_model='missing', NULL, t_model),
is_offshore: t_offshore=1,
rated_capacity_mw: IF(t_rd::TEXT='-9999.0', NULL, t_rd::DOUBLE),
retrofitted_year: IF(t_retro_yr::TEXT='-9999', NULL, t_retro_yr::INT),
is_retrofitted: t_retrofit=1,
rsa: IF(t_rsa::TEXT='-9999.0', NULL, t_rsa::DOUBLE),
ttlh: IF(t_ttlh::TEXT='-9999.0', NULL, t_ttlh::DOUBLE),
},
usgs_pr_id: IF(usgs_pr_id::TEXT='-9999', NULL, usgs_pr_id::INT),
FROM ST_READ('uswtdb_V8_1_20250522.shx')
ORDER BY HILBERT_ENCODE([ST_Y(ST_CENTROID(geom)),
ST_X(ST_CENTROID(geom))]::double[2])
) TO 'uswtdb_V8_1_20250522.parquet' (
FORMAT 'PARQUET',
CODEC 'ZSTD',
COMPRESSION_LEVEL 22,
ROW_GROUP_SIZE 15000);
Data Fluency
The following is an example record from this dataset.
$ echo "SELECT * EXCLUDE(bbox, faa, project, location, turbine),
bbox::JSON AS bbox,
faa::JSON AS faa,
project::JSON AS project,
location::JSON AS location,
turbine::JSON AS turbine
FROM 'uswtdb_V8_1_20250522.parquet'
LIMIT 1" \
| ~/duckdb -json \
| jq -S .
[
{
"bbox": {
"xmax": 144.72265600000003,
"xmin": 144.72265600000003,
"ymax": 13.389381000000071,
"ymin": 13.389381000000071
},
"case_id": 3063607,
"eia_id": null,
"faa": {
"asn": "2013-WTW-2712-OE",
"ors": null
},
"geometry": "POINT (144.72265600000003 13.389381000000071)",
"location": {
"country": "Guam",
"fips": "66010",
"state": "GU"
},
"project": {
"avg_turbine_capacity": 0.275,
"name": "Guam Power Authority Wind Turbine",
"num_turbines": 1,
"online_year": 2016
},
"turbine": {
"attribute_confidence": "High",
"capacity": 275.0,
"hub_height": 55.0,
"imagery": {
"captured_at": "2017-08-10",
"provider": "Maxar"
},
"is_offshore": false,
"is_retrofitted": false,
"location_confidence": "High",
"manufacturer": "Vergnet",
"model": "GEV MP-C",
"rated_capacity_mw": 32.0,
"retrofitted_year": nullOR REPLACE TABLE top_manufacturers AS
SELECT num_turbines: COUNT(*),
manufacturer: turbine.manufacturer
FROM 'uswtdb_V8_1_20250522.parquet'
WHERE turbine.manufacturer IS NOT NULL
GROUP BY 2
ORDER BY 1 DESC
LIMIT 10;
CREATE OR REPLACE TABLE h3_3s AS
WITH b AS (
WITH a AS (
SELECT H3_LATLNG_TO_CELL(bbox.ymin,
bbox.xmin,
3) h3_3,
manufacturer: turbine.manufacturer,
COUNT(*) num_recs
FROM 'uswtdb_V8_1_20250522.parquet'
WHERE turbine.manufacturer IN (
SELECT DISTINCT manufacturer
FROM top_manufacturers
)
GROUP BY 1, 2
)
SELECT *,
ROW_NUMBER() OVER (PARTITION BY h3_3
ORDER BY num_recs DESC) AS rn
FROM a
)
FROM b
WHERE rn = 1
ORDER BY num_recs DESC;
COPY (
SELECT geom: H3_CELL_TO_BOUNDARY_WKT(h3_3)::GEOMETRY,
manufacturer
FROM h3_3s
WHERE ST_XMIN(geom::GEOMETRY) BETWEEN -179 AND 179
AND ST_XMAX(geom::GEOMETRY) BETWEEN -179 AND 179
) TO 'turbine.top_manufacturer.h3_3.parquet' (
FORMAT 'PARQUET',
CODEC 'ZSTD',
COMPRESSION_LEVEL 22,
ROW_GROUP_SIZE 15000);
Only 67 turbines in this dataset are listed as being offshore.
SELECT COUNT(*),
turbine.is_offshore
FROM 'uswtdb_V8_1_20250522.parquet'
GROUP BY 2;
ββββββββββββββββ¬ββββββββββββββ
β count_star() β is_offshore β
β int64 β boolean β
ββββββββββββββββΌββββββββββββββ€
β 75984 β false β
β 67 β true β
ββββββββββββββββ΄ββββββββββββββ
Below are the top 20 states by their total GW of installed capacity.
SELECT location.state,
gw: ROUND(SUM(turbine.rated_capacity_mw) / 1000)::INT
FROM 'uswtdb_V8_1_20250522.parquet'
GROUP BY 1
ORDER BY 2 DESC
LIMIT 20;
βββββββββββ¬ββββββββ
β state β gw β
β varchar β int32 β
βββββββββββΌββββββββ€
β TX β 1998 β
β IA β 653 β
β OK β 591 β
β KS β 439 β
β IL β 370 β
β CO β 284 β
β CA β 258 β
β MN β 246 β
β ND β 215 β
β NM β 208 β
β OR β 193 β
β MI β 186 β
β IN β 169 β
β NE β 162 β
β SD β 162 β
β WY β 159 β
β WA β 157 β
β NY β 124 β
β MO β 123 β
β MT β 92 β
βββββββββββ΄ββββββββ€
β 20 rows β
βββββββββββββββββββ
Of the 76K turbines in this dataset, 67K have a high confidence rating for both their location and attributes.
WITH a AS (
SELECT attribute: turbine.attribute_confidence,
location: turbine.location_confidence,
num_turbines: COUNT(*)
FROM 'uswtdb_V8_1_20250522.parquet'
GROUP BY 1, 2
ORDER BY 3 DESC
)
PIVOT a
ON attribute,
USING SUM(num_turbines)
GROUP BY location
ORDER BY location;
ββββββββββββ¬βββββββββ¬βββββββββ¬βββββββββ
β location β High β Low β Medium β
β varchar β int128 β int128 β int128 β
ββββββββββββΌβββββββββΌβββββββββΌβββββββββ€
β High β 67060 β 2924 β 4472 β
β Low β 137 β 1256 β 141 β
β Medium β 1 β 60 β NULL β
ββββββββββββ΄βββββββββ΄βββββββββ΄βββββββββ
Project Footprints
Each turbine record contains denormalised project information. Iβll group the projects by FIPS and name to generate their geographical footprints.
Below Iβll generate the footprints for each project. Iβve added in the FIPS code to avoid name overlaps creating country-wide polygons.
$ ~/duckdb
COPY (
SELECT fips: location.fips,
project_name: project.name,
geometry: {
'min_x': MIN(ST_X(geometry)),
'min_y': MIN(ST_Y(geometry)),
'max_x': MAX(ST_X(geometry)),
'max_y': MAX(ST_Y(geometry))}::BOX_2D::GEOMETRY
FROM READ_PARQUET('uswtdb_V8_1_20250522.parquet')
GROUP BY 1, 2
) TO 'projects.parquet' (
FORMAT 'PARQUET',
CODEC 'ZSTD',
COMPRESSION_LEVEL 22,
ROW_GROUP_SIZE 15000);
Below are a few projects in Texas.
Thank you for taking the time to read this post. I offer both consulting and hands-on development services to clients in North America and Europe. If youβd like to discuss how my offerings can help your business please contact me via LinkedIn.