In September, Business Insider published a video on the locations, ownership, and power and water consumption of America's data centers. They collected diesel generator permits from every state in the US and mapped each applicant's company name back to its parent company. This revealed 1,240 sites along with a large amount of metadata for each location.
They published an interactive map with the underlying GeoJSON dataset embedded as JavaScript.
In this post, I'll explore Business Insider's dataset.
My Workstation
I'm using a 5.7 GHz AMD Ryzen 9 9950X CPU. It has 16 cores, 32 threads, 1.2 MB of L1, 16 MB of L2 and 64 MB of L3 cache. It has a liquid cooler attached and is housed in a spacious, full-sized Cooler Master HAF 700 computer case.
The system has 96 GB of DDR5 RAM clocked at 4,800 MT/s and a PCIe 5.0 Crucial T700 4 TB NVMe M.2 SSD which can read at speeds up to 12,400 MB/s. There is a heatsink on the SSD to help keep its temperature down. This is my system's C drive.
The system is powered by a 1,200-watt, fully modular Corsair power supply and sits on an ASRock X870E Nova 90 motherboard.
I'm running Ubuntu 24 LTS via Microsoft's Ubuntu for Windows on Windows 11 Pro. In case you're wondering why I don't run a Linux-based desktop as my primary work environment, I'm still using an Nvidia GTX 1080 GPU, which has better driver support on Windows, and ArcGIS Pro only supports Windows natively.
Installing Prerequisites
I'll use Python 3.12.3 and a few other tools to help analyse the data in this post.
$ sudo add-apt-repository ppa:deadsnakes/ppa
$ sudo apt update
$ sudo apt install \
jq \
python3-pip \
python3.12-venv
I'll set up a Python virtual environment and install a few packages, including esprima, a JavaScript parser for Python.
$ python3 -m venv ~/.dcs
$ source ~/.dcs/bin/activate
$ python -m pip install \
duckdb \
esprima \
levenshtein
I'll use DuckDB v1.4.1, along with its H3, JSON, Lindel, Parquet and Spatial extensions, in this post.
$ cd ~
$ wget -c https://github.com/duckdb/duckdb/releases/download/v1.4.1/duckdb_cli-linux-amd64.zip
$ unzip -j duckdb_cli-linux-amd64.zip
$ chmod +x duckdb
$ ~/duckdb
INSTALL h3 FROM community;
INSTALL lindel FROM community;
INSTALL json;
INSTALL parquet;
INSTALL spatial;
I'll set up DuckDB to load every installed extension each time it launches.
$ vi ~/.duckdbrc
.timer on
.width 180
LOAD h3;
LOAD lindel;
LOAD json;
LOAD parquet;
LOAD spatial;
The maps in this post were rendered using QGIS version 3.44. QGIS is a desktop application that runs on Windows, macOS and Linux. The application has grown in popularity in recent years and has ~15M application launches from users all around the world each month.
I used QGIS' Tile+ plugin to add basemaps from Esri and Bing to the maps in this post.
Analysis-Ready Data
I'll download the JavaScript used for Business Insider's interactive map.
$ mkdir -p ~/business_insider
$ cd ~/business_insider
$ wget https://tbimedia.s3.us-east-1.amazonaws.com/bistudios/_00/dev_edit/graphics/2025/09/2025-09-datacenters-map-table/index.js
I'll use Python to convert the dataset into line-delimited JSON.
$ python3
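import json

import esprima

# Read the minified bundle and keep only the assignment holding the map's
# records. The split markers are minified identifiers from the page's build.
with open('index.js', 'r') as f:
    source = f.read()

resp = esprima.parseScript(
    source.split('features:Q0},')[-1]
          .split(',I9=[];')[0])

def get_key(prop):
    # Object keys are either string literals ("Address": ...) or bare
    # identifiers (address: ...); normalise both to lower case.
    if prop.key.value:
        return prop.key.value.lower()
    if prop.key.name:
        return prop.key.name.lower()
    return 'unknown'

# One dictionary per record in the array on the right-hand side.
dataset = [
    {get_key(prop): prop.value.value
     for prop in element.properties}
    for element in resp.body[0].expression.right.elements]

# Write one JSON object per line.
with open('data_centres.json', 'w') as f:
    for rec in dataset:
        f.write(json.dumps(rec, sort_keys=True) + '\n')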
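The two split() calls depend on minified identifiers (Q0 and I9) in Business Insider's bundle, which will likely change if the page is rebuilt, and split() silently returns the wrong slice when a marker is missing. Below is a minimal sketch of a stricter helper that fails loudly instead; extract_between is a hypothetical name, not part of the workflow above.

```python
def extract_between(text: str, start: str, end: str) -> str:
    """Return the text after the last `start` marker and before the next `end`.

    Behaves like text.split(start)[-1].split(end)[0], but raises ValueError
    instead of silently returning the wrong slice when a marker is missing.
    """
    i = text.rfind(start)
    if i == -1:
        raise ValueError(f"start marker {start!r} not found")
    rest = text[i + len(start):]
    j = rest.find(end)
    if j == -1:
        raise ValueError(f"end marker {end!r} not found")
    return rest[:j]


if __name__ == "__main__":
    # A toy stand-in for the minified bundle.
    sample = "var a=1,b={features:Q0},X=[{x:1}],I9=[];"
    print(extract_between(sample, "features:Q0},", ",I9=[];"))  # X=[{x:1}]
```

Swapping it in would mean passing extract_between(source, 'features:Q0},', ',I9=[];') to esprima.parseScript(); if the markers ever disappear, the script stops with a clear error rather than handing the parser the whole bundle.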
I'll use DuckDB to convert the JSON into a Parquet file. This will make analysis quicker than working with either the JavaScript or the JSON versions of the dataset.
$ ~/duckdb
COPY (
SELECT * EXCLUDE(long, lat),
ST_POINT(long::FLOAT, lat::FLOAT) geometry
FROM 'data_centres.json'
ORDER BY HILBERT_ENCODE([ST_Y(ST_CENTROID(geometry)),
ST_X(ST_CENTROID(geometry))]::double[2])
) TO 'data_centres.parquet' (
FORMAT 'PARQUET',
CODEC 'ZSTD',
COMPRESSION_LEVEL 22,
ROW_GROUP_SIZE 15000);
The above produced a 234 KB, 1,240-row, 98-column Parquet file.
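The ORDER BY above sorts rows by their position along a Hilbert curve, so sites that are close to one another on the map tend to end up close to one another in the file. In files with many row groups, that can let readers skip row groups when filtering by area. Below is a minimal Python sketch of the idea using the classic iterative xy2d algorithm over a quantised longitude/latitude grid. It is an illustration only: lindel's HILBERT_ENCODE may encode its inputs differently, so these keys won't match DuckDB's, and hilbert_index and hilbert_key are hypothetical names.

```python
def hilbert_index(n: int, x: int, y: int) -> int:
    """Distance of cell (x, y) along a Hilbert curve filling an n x n grid.

    n must be a power of two and 0 <= x, y < n.
    """
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve has the right orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d


def hilbert_key(lon: float, lat: float, order: int = 16) -> int:
    """Quantise a lon/lat pair onto a 2**order grid, then take its Hilbert index."""
    n = 1 << order
    x = int((lon + 180.0) / 360.0 * (n - 1))
    y = int((lat + 90.0) / 180.0 * (n - 1))
    return hilbert_index(n, x, y)


# Made-up points, roughly in Arizona and Northern Virginia.
points = [(-111.604, 33.347), (-77.49, 39.04), (-111.9, 33.4), (-77.4, 39.0)]
for lon, lat in sorted(points, key=lambda p: hilbert_key(*p)):
    print(lon, lat)
```

Consecutive Hilbert indices are always neighbouring grid cells; the reverse only tends to hold, which is why this ordering improves locality rather than guaranteeing it.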
Data Fluency
Below is an example record from this dataset. It's of a possible hyperscaler site owned by Apple in Arizona.
$ echo "FROM 'data_centres.parquet'
WHERE address LIKE '%Mesa AZ 85212'
AND brand = 'Apple'
LIMIT 1" \
| ~/duckdb -json \
| jq -S .
[
{
"# facilities note": "",
"# of tier 2": "51",
"# of tier 3": "",
"# of tier 4": "",
"address": "3740 S Signal Butte Rd, Mesa AZ 85212",
"annual water consumption (gallons)": "-",
"aquifer name": "-",
"avert region": "Southwest",
"brand": "Apple",
"brand source": "https://www.sec.gov/Archives/edgar/data/1394954/000144530514004770/a8-kapplesettlementagreeme.htm",
"case status desc": "-",
"city": "Mesa",
"co tpy": "-",
"co2e tpy": "-",
"company": "Platypus Development LLC",
"county": "Maricopa",
"daily water consumption (gallons)": "-",
"data center construct year": "",
"estimate power consumption in kw/hr (calculated 30%)": "55864",
"estimate power consumption in kw/hr (calculated 50%)": "93106",
"estimate power consumption in kw/hr (calculated 60%)": "111,727",
"estimated at 2n": "46553",
"estimated at 2n (in twh)": "0.408",
"estimated at n +1(in twh)": "0.652",
"estimated at n+1": "74484.8",
"estimated data center electricity use in megawatt-hours at 50% capacity": "93.11",
"estimated data center electricity use in terrawatt-hours a year at 50% capacity": "0.82",
"first permit issue year": "2020",
"generator type": "Caterpillar",
"geometry": "POINT (-111.60407257080078 33.34719467163086)",
"large facilites": "Possible hyperscaler",
"latest permit issue year": "2022",
"link to records": "Apple - Platypus Development LLC.docx",
"major basin name": "North America, Colorado",
"minor basin name": "Middle Gila",
"new permit or existing update?": "Existing update",
"nox tpy": "90",
"number of buildings": "",
"number of epa enforcements": "-",
"page": "5",
"pm tpy": "-",
"pm10 tpy": "-",
"pm2.5 tpy": "-",
"population within 1-mile radius - 10,000": "fewer than 10,000",
"population within 1-mile radius - 5,000": "More than 5,000",
"primary law": "-",
"private equity or asset manager?": "",
"private equity or asset manager? (source)": "",
"rate capacity original": "249715 HP",
"reporter": "Narimes",
"reporter notes": "eight emergency generators rated at 3604 HP, two rated at 2206 HP, one rated at 762 HP, one rated at 1141 HP, fifty seven rated at 5646 HP",
"size capacity over 100 mw": "-",
"size category at 30%": "Possible hyperscaler",
"size category at 50% capacity": "Possible hyperscaler",
"size category at 50% capacity 'large' vs 'small'": "large-scale",
"size category at 60%": "Possible hyperscaler",
"sox tpy": "-",
"state": "AZ",
"state environmental justice concern": "no",
"state percentile for ej index for diesel particulate matter": "32",
"state percentile for ej index for drinking water non-compliance": "68",
"state percentile for ej index for hazardous waste proximity": "39",
"state percentile for ej index for lead paint indicator": "48",
"state percentile for ej index for nitrogen dioxide (no2)": "27",
"state percentile for ej index for ozone": "52",
"state percentile for ej index for particulate matter": "35",
"state percentile for ej index for rmp proximity": "44",
"state percentile for ej index for superfund proximity": "68",
"state percentile for ej index for toxic releases to air": "55",
"state percentile for ej index for traffic proximity and volume": "33",
"state percentile for ej index for underground storage tanks (ust) indicator": "36",
"state percentile for ej index for wastewater discharge indicator": "32",
"statename": "Arizona",
"total generator rate capacity kw": "186212",
"total penalties assessed": "-",
"total population within 1 mile of site": "9,026",
"unique id": "37",
"unknown": "60518.9",
"us environmental justice concern": "no",
"us percentile for ej index for diesel particulate matter": "45",
"us percentile for ej index for drinking water non-compliance": "79",
"us percentile for ej index for hazardous waste proximity": "33",
"us percentile for ej index for lead paint indicator": "15",
"us percentile for ej index for nitrogen dioxide (no2)": "47",
"us percentile for ej index for ozone": "62",
"us percentile for ej index for particulate matter": "25",
"us percentile for ej index for rmp proximity": "41",
"us percentile for ej index for superfund proximity": "66",
"us percentile for ej index for toxic releases to air": "51",
"us percentile for ej index for traffic proximity and volume": "42",
"us percentile for ej index for underground storage tanks (ust) indicator": "38",
"us percentile for ej index for wastewater discharge indicator": "50",
"vocs tpy": "-",
"water notes": "-",
"water record link": "-",
"water requested?": "denied",
"water stress": "Extremely High (>80%)",
"zip": "85212"
}
]
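Several of the estimate fields in this record appear to be derived from its generator capacity. The sketch below reproduces the Apple record's values under my own assumptions: 1 HP is roughly 0.7457 kW, the 30%, 50% and 60% columns are simple multiples of the total generator capacity, and the TWh figure assumes that load runs for all 8,760 hours of a year. These formulas are inferred from this one record rather than documented by Business Insider. The same pattern suggests "estimated at 2n" (46553) is 25% and "estimated at n+1" (74484.8) is 40% of 186,212 kW, and that the 93.11 in the "megawatt-hours at 50% capacity" field is 93,106 kW expressed in MW.

```python
HP_TO_KW = 0.7457       # assumption: 1 mechanical horsepower is ~0.7457 kW
HOURS_PER_YEAR = 8_760  # 365 days x 24 hours


def hp_to_kw(hp: float) -> float:
    return hp * HP_TO_KW


def load_at(total_kw: float, pct: int) -> int:
    """Assumed: estimated load = total generator capacity x utilisation %."""
    return round(total_kw * pct / 100)


def twh_per_year(kw: float) -> float:
    """Energy from a constant load over a year, converted from kWh to TWh."""
    return kw * HOURS_PER_YEAR / 1e9


total_kw = round(hp_to_kw(249_715))   # "rate capacity original": 249715 HP
print(total_kw)                        # 186212
print({pct: load_at(total_kw, pct) for pct in (30, 50, 60)})
# {30: 55864, 50: 93106, 60: 111727}
print(round(twh_per_year(load_at(total_kw, 50)), 2))  # 0.82
```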
Below are the field names, data types, percentages of NULLs, approximate numbers of unique values and minimum and maximum values (truncated to 30 characters) for each column.
$ ~/duckdb
.maxrows 500
SELECT column_name,
column_type,
null_percentage,
approx_unique,
min[:30],
max[:30]
FROM (SUMMARIZE
FROM 'data_centres.parquet')
ORDER BY 1;
┌─────────────────────────────────────────────────────────────────────────────────┬─────────────┬─────────────────┬───────────────┬────────────────────────────────┬────────────────────────────────┐
│                                   column_name                                   │ column_type │ null_percentage │ approx_unique │            min[:30]            │            max[:30]            │
│                                     varchar                                     │   varchar   │  decimal(9,2)   │     int64     │            varchar             │            varchar             │
├─────────────────────────────────────────────────────────────────────────────────┼─────────────┼─────────────────┼───────────────┼────────────────────────────────┼────────────────────────────────┤
│ # facilities note                                                               │ VARCHAR     │            0.00 │             9 │                                │ 1 of 8 facilities              │
│ # of tier 2                                                                     │ VARCHAR     │            0.00 │            12 │                                │ 8                              │
│ # of tier 3                                                                     │ VARCHAR     │            0.00 │             2 │                                │ 1                              │
│ # of tier 4                                                                     │ VARCHAR     │            0.00 │             2 │                                │ 414                            │
│ address                                                                         │ VARCHAR     │            0.40 │          1360 │ 0 Williams Road, Palmetto, Geo │ intersection of Morse Rd and B │
│ annual water consumption (gallons)                                              │ VARCHAR     │            0.00 │            67 │                                │ aggregate                      │
│ aquifer name                                                                    │ VARCHAR     │            0.00 │             6 │ -                              │ Northern Great Plains / Interi │
│ avert region                                                                    │ VARCHAR     │            0.00 │            14 │ California                     │ Texas                          │
│ brand                                                                           │ VARCHAR     │            0.00 │           345 │ 11:11 Systems                  │ unWired Broadband              │
│ brand source                                                                    │ VARCHAR     │            0.24 │           620 │                                │ zColo = Databank               │
│ case status desc                                                                │ VARCHAR     │            0.00 │             5 │                                │ Resolved                       │
│ city                                                                            │ VARCHAR     │            0.00 │           376 │                                │ Wood Dale                      │
│ co tpy                                                                          │ VARCHAR     │            0.00 │           604 │                                │ 99.9                           │
│ co2e tpy                                                                        │ VARCHAR     │            0.00 │            38 │                                │ 9772.31                        │
│ company                                                                         │ VARCHAR     │            0.08 │           742 │ 1000 Coit Road, L.P.           │ zColo, LLC                     │
│ county                                                                          │ VARCHAR     │            0.00 │           198 │ Accomack                       │ Yolo                           │
│ daily water consumption (gallons)                                               │ VARCHAR     │            0.00 │            10 │                                │ aggregate water                │
│ data center construct year                                                      │ VARCHAR     │            0.00 │             5 │                                │ in progress                    │
│ estimate power consumption in kw/hr (calculated 30%)                            │ VARCHAR     │            0.00 │           630 │ 100                            │ redacted                       │
│ estimate power consumption in kw/hr (calculated 50%)                            │ VARCHAR     │            0.00 │           801 │ 100                            │ redacted                       │
│ estimate power consumption in kw/hr (calculated 60%)                            │ VARCHAR     │            0.00 │           778 │ 1,048                          │ redacted                       │
│ estimated at 2n                                                                 │ VARCHAR     │            0.00 │           930 │ 100                            │ redacted                       │
│ estimated at 2n (in twh)                                                        │ VARCHAR     │            0.00 │           335 │ 0                              │ redacted                       │
│ estimated at n +1(in twh)                                                       │ VARCHAR     │            0.00 │           388 │ 0                              │ redacted                       │
│ estimated at n+1                                                                │ VARCHAR     │            0.00 │           881 │ 100                            │ redacted                       │
│ estimated data center electricity use in megawatt-hours at 50% capacity         │ VARCHAR     │            0.00 │           820 │ 0.02                           │ redacted                       │
│ estimated data center electricity use in terrawatt-hours a year at 50% capacity │ VARCHAR     │            0.00 │           130 │ 0                              │ redacted                       │
│ first permit issue year                                                         │ VARCHAR     │            0.00 │            38 │ 1976                           │ 2024                           │
│ generator type                                                                  │ VARCHAR     │            0.08 │           134 │                                │ Waukesha; Doosan               │
│ geometry                                                                        │ GEOMETRY    │            0.00 │          1165 │ POINT (-112.328125 33.57888793 │ POINT (-121.95311737060547 37. │
│ large facilites                                                                 │ VARCHAR     │            0.00 │             5 │ Possible hyperscaler           │ redacted                       │
│ latest permit issue year                                                        │ VARCHAR     │            0.00 │            34 │ 1977                           │ 2025                           │
│ link to records                                                                 │ VARCHAR     │            0.00 │           876 │                                │ wsdc0101.pdf wsdc0102.pdf wsdc │
│ major basin name                                                                │ VARCHAR     │            0.00 │            11 │                                │ United States, North Atlantic  │
│ minor basin name                                                                │ VARCHAR     │            0.00 │           184 │                                │ Wheeler Lake                   │
│ new permit or existing update?                                                  │ VARCHAR     │            0.00 │             2 │ Existing update                │ New                            │
│ nox tpy                                                                         │ VARCHAR     │            0.00 │           712 │ -                              │ 99.97                          │
│ number of buildings                                                             │ VARCHAR     │            0.00 │            10 │                                │ 8                              │
│ number of epa enforcements                                                      │ VARCHAR     │            0.00 │             5 │                                │ 3                              │
│ page                                                                            │ VARCHAR     │            0.00 │           209 │                                │ rows 9481-9484                 │
│ pm tpy                                                                          │ VARCHAR     │            0.00 │           280 │                                │ 9.9                            │
│ pm10 tpy                                                                        │ VARCHAR     │            0.00 │           345 │                                │ 9.97                           │
│ pm2.5 tpy                                                                       │ VARCHAR     │            0.00 │           344 │                                │ 9.9                            │
│ population within 1-mile radius - 10,000                                        │ VARCHAR     │            0.00 │             3 │ -                              │ fewer than 10,000              │
│ population within 1-mile radius - 5,000                                         │ VARCHAR     │            0.00 │             3 │                                │ fewer than 5,000               │
│ primary law                                                                     │ VARCHAR     │            0.00 │             4 │                                │ CWA                            │
│ private equity or asset manager?                                                │ VARCHAR     │            0.00 │            20 │                                │ pg 1 PI32426_PCP210001.pdf     │
│ private equity or asset manager? (source)                                       │ VARCHAR     │            0.00 │            29 │                                │ https://www.streamdatacenters. │
│ rate capacity original                                                          │ VARCHAR     │            0.00 │           453 │                                │ [combined]                     │
│ reporter                                                                        │ VARCHAR     │            0.00 │            12 │                                │ Yuheng/Rosemarie               │
│ reporter notes                                                                  │ VARCHAR     │           16.69 │          1142 │                                │ two operating scenarios: 1) fo │
│ size capacity over 100 mw                                                       │ VARCHAR     │            0.00 │             3 │                                │ over 100 MW                    │
│ size category at 30%                                                            │ VARCHAR     │            0.00 │             5 │ Possible hyperscaler           │ redacted                       │
│ size category at 50% capacity                                                   │ VARCHAR     │            0.00 │             5 │ Possible hyperscaler           │ redacted                       │
│ size category at 50% capacity 'large' vs 'small'                                │ VARCHAR     │            0.00 │             5 │ large-scale                    │ small-scale                    │
│ size category at 60%                                                            │ VARCHAR     │            0.00 │             5 │ Possible hyperscaler           │ redacted                       │
│ sox tpy                                                                         │ VARCHAR     │            0.00 │           243 │                                │ 9.907                          │
│ state                                                                           │ VARCHAR     │            0.00 │            50 │ AL                             │ WY                             │
│ state environmental justice concern                                             │ VARCHAR     │            0.00 │             4 │                                │ yes                            │
│ state percentile for ej index for diesel particulate matter                     │ VARCHAR     │            0.00 │            85 │                                │ 99                             │
│ state percentile for ej index for drinking water non-compliance                 │ VARCHAR     │            0.00 │            41 │                                │ 99                             │
│ state percentile for ej index for hazardous waste proximity                     │ VARCHAR     │            0.00 │            84 │                                │ 99                             │
│ state percentile for ej index for lead paint indicator                          │ VARCHAR     │            0.00 │            92 │                                │ 98                             │
│ state percentile for ej index for nitrogen dioxide (no2)                        │ VARCHAR     │            0.00 │            88 │                                │ 98                             │
│ state percentile for ej index for ozone                                         │ VARCHAR     │            0.00 │            92 │                                │ 99                             │
│ state percentile for ej index for particulate matter                            │ VARCHAR     │            0.00 │            90 │                                │ 99                             │
│ state percentile for ej index for rmp proximity                                 │ VARCHAR     │            0.00 │            84 │                                │ 99                             │
│ state percentile for ej index for superfund proximity                           │ VARCHAR     │            0.00 │            59 │                                │ 99                             │
│ state percentile for ej index for toxic releases to air                         │ VARCHAR     │            0.00 │            92 │                                │ 98                             │
│ state percentile for ej index for traffic proximity and volume                  │ VARCHAR     │            0.00 │            92 │                                │ 99                             │
│ state percentile for ej index for underground storage tanks (ust) indicator     │ VARCHAR     │            0.00 │            81 │                                │ 98                             │
│ state percentile for ej index for wastewater discharge indicator                │ VARCHAR     │            0.00 │            92 │                                │ 99                             │
│ statename                                                                       │ VARCHAR     │            0.00 │            49 │ Alabama                        │ Wyoming                        │
│ total generator rate capacity kw                                                │ VARCHAR     │            0.00 │           825 │ 100                            │ redacted                       │
│ total penalties assessed                                                        │ VARCHAR     │            0.00 │            38 │                                │ -                              │
│ total population within 1 mile of site                                          │ VARCHAR     │            0.00 │          1166 │ -                              │ 994                            │
│ unique id                                                                       │ VARCHAR     │            0.00 │          1103 │ 10                             │ 999                            │
│ unknown                                                                         │ VARCHAR     │            0.00 │           923 │ 1006.53                        │ redacted                       │
│ us environmental justice concern                                                │ VARCHAR     │            0.00 │             4 │                                │ yes                            │
│ us percentile for ej index for diesel particulate matter                        │ VARCHAR     │            0.00 │            92 │                                │ 99                             │
│ us percentile for ej index for drinking water non-compliance                    │ VARCHAR     │            0.00 │            27 │                                │ 99                             │
│ us percentile for ej index for hazardous waste proximity                        │ VARCHAR     │            0.00 │            78 │                                │ 99                             │
│ us percentile for ej index for lead paint indicator                             │ VARCHAR     │            0.00 │            78 │                                │ 97                             │
│ us percentile for ej index for nitrogen dioxide (no2)                           │ VARCHAR     │            0.00 │            92 │                                │ 99                             │
│ us percentile for ej index for ozone                                            │ VARCHAR     │            0.00 │            92 │                                │ 99                             │
│ us percentile for ej index for particulate matter                               │ VARCHAR     │            0.00 │            88 │ -                              │ 99                             │
│ us percentile for ej index for rmp proximity                                    │ VARCHAR     │            0.00 │            69 │                                │ 99                             │
│ us percentile for ej index for superfund proximity                              │ VARCHAR     │            0.00 │            39 │                                │ 98                             │
│ us percentile for ej index for toxic releases to air                            │ VARCHAR     │            0.00 │            92 │                                │ 99                             │
│ us percentile for ej index for traffic proximity and volume                     │ VARCHAR     │            0.00 │            91 │                                │ 99                             │
│ us percentile for ej index for underground storage tanks (ust) indicator        │ VARCHAR     │            0.00 │            69 │                                │ 99                             │
│ us percentile for ej index for wastewater discharge indicator                   │ VARCHAR     │            0.00 │            91 │                                │ 98                             │
│ vocs tpy                                                                        │ VARCHAR     │            0.00 │           382 │                                │ 9.94                           │
│ water notes                                                                     │ VARCHAR     │            0.08 │            51 │                                │ usage is from Jan 2023-Dec 202 │
│ water record link                                                               │ VARCHAR     │            0.00 │            47 │                                │ https://www.denverpost.com/202 │
│ water requested?                                                                │ VARCHAR     │            0.00 │             7 │                                │ yes                            │
│ water stress                                                                    │ VARCHAR     │            0.00 │             6 │ Arid and Low Water Use         │ Medium - High (20-40%)         │
│ zip                                                                             │ VARCHAR     │            0.00 │           552 │ -                              │ 99019                          │
├─────────────────────────────────────────────────────────────────────────────────┴─────────────┴─────────────────┴───────────────┴────────────────────────────────┴────────────────────────────────┤
│ 98 rows                                                                                                                                                                                 6 columns │
└───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
A lot could be done to group these fields into dictionaries and normalise their values. If this post proves popular, I might revisit this.
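Because every column other than geometry is VARCHAR, the min and max values above are string comparisons: "994" sorts after "9,026", so 994 is reported as the maximum population within a mile of a site. Below is a minimal sketch of that normalisation, assuming "", "-", "redacted" and "no value" are the only placeholders (I haven't checked every column). parse_number and group_ej are hypothetical helpers; group_ej nests the 26 EJ percentile columns into "state" and "us" dictionaries.

```python
MISSING = {"", "-", "redacted", "no value"}  # assumed placeholder values


def parse_number(value):
    """Turn strings such as "111,727" or "0.408" into floats.

    Placeholders and non-numeric text such as "249715 HP" become None
    rather than raising.
    """
    if value is None:
        return None
    text = str(value).strip()
    if text.lower() in MISSING:
        return None
    try:
        return float(text.replace(",", ""))
    except ValueError:
        return None


def group_ej(record):
    """Nest the 'state/us percentile for ej index for ...' columns."""
    grouped = {"state": {}, "us": {}}
    for key, value in record.items():
        for scope in grouped:
            prefix = f"{scope} percentile for ej index for "
            if key.startswith(prefix):
                grouped[scope][key[len(prefix):]] = parse_number(value)
    return grouped


populations = ["9,026", "994", "-"]
print(max(populations))                                        # 994 (text order)
print(max(populations, key=lambda v: parse_number(v) or 0))    # 9,026
```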
Diesel Generator Permits
Virginia has published a list of 188 diesel generator permits it has issued to data centers in the state.
In some cases, the permit states the applicant's well-known trading name. The following is a quote from a permit issued to AWS.
Attached is a permit to construct and operate emergency diesel engine generator sets (gen-sets) at Amazon Data Services' data centers (IAD-51, IAD-56, IAD-88, IAD-89, IAD-192, and IAD214), in accordance with the provisions of the Commonwealth of Virginia State Air Pollution Control Board (Board's) Regulations for the Control and Abatement of Air Pollution (Regulations). This permit document combines the terms and conditions from, and supersedes your permit document dated October 12, 2023.
Below is a manifest of the diesel generators they're using at these sites.
Not all states make their permits discoverable via Google searches. If anyone is looking to continue this research, the BBC listed the top data center locations inside and outside of the US along with their market share. This can help narrow the search space.
14% Northern Virginia
6% Oregon
4% Iowa
3% Dallas, Texas
2% Arizona
2% Nebraska
2% Illinois
6% Beijing
4% Dublin
2% Shanghai
Data Center Owners
Below are the most and least common brands represented in this dataset.
$ ~/duckdb
SELECT COUNT(*),
brand
FROM 'data_centres.parquet'
GROUP BY 2
ORDER BY 1 DESC;
┌──────────────┬──────────────────────────────────────────────────────────────┐
│ count_star() │                            brand                             │
│    int64     │                           varchar                            │
├──────────────┼──────────────────────────────────────────────────────────────┤
│          177 │ Amazon                                                       │
│           70 │ Digital Realty                                               │
│           53 │ Equinix                                                      │
│           47 │ Google                                                       │
│           44 │ Microsoft                                                    │
│           34 │ QTS                                                          │
│           33 │ DataBank                                                     │
│           33 │ Centersquare                                                 │
│           32 │ Meta                                                         │
│           31 │ CyrusOne                                                     │
│           31 │ Lumen Technologies                                           │
│           23 │ Verizon                                                      │
│           19 │ TierPoint                                                    │
│           18 │ Flexential                                                   │
│           17 │ CoreSite                                                     │
│           16 │ Iron Mountain                                                │
│           15 │ NTT                                                          │
│           14 │ Stack Infrastructure                                         │
│           13 │ Cogent                                                       │
│           13 │ EdgeConneX                                                   │
│      ·       │                              ·                               │
│      ·       │                              ·                               │
│      ·       │                              ·                               │
│            1 │ SOMO village                                                 │
│            1 │ Datacate                                                     │
│            1 │ PC Solutions                                                 │
│            1 │ Showers Development, LLC                                     │
│            1 │ US Signal                                                    │
│            1 │ Reliance Industries                                          │
│            1 │ Inova                                                        │
│            1 │ George Washington University                                 │
│            1 │ The Vanguard Group                                           │
│            1 │ University of Minnesota - Minnesota Supercomputing Institute │
│            1 │ WW Grainger                                                  │
│            1 │ MasterCard                                                   │
│            1 │ PSCU Financial Services                                      │
│            1 │ SRI Ten 706 Wilshire LLC                                     │
│            1 │ Rosegate LLC                                                 │
│            1 │ Stanford University Data Center                              │
│            1 │ California Legislative Counsel                               │
│            1 │ Adobe                                                        │
│            1 │ American Honda Motor Company                                 │
│            1 │ NeuStar                                                      │
├──────────────┴──────────────────────────────────────────────────────────────┤
│ 305 rows (40 shown)                                               2 columns │
└─────────────────────────────────────────────────────────────────────────────┘
Below is the number of possible hyperscaler sites per brand. It's interesting to see that Apple only has two.
SELECT brand,
COUNT(*)
FROM 'data_centres.parquet'
WHERE "large facilites" = 'Possible hyperscaler'
GROUP BY 1
ORDER BY 2 DESC;
┌─────────────────────────────┬──────────────┐
│            brand            │ count_star() │
│           varchar           │    int64     │
├─────────────────────────────┼──────────────┤
│ Amazon                      │           45 │
│ Microsoft                   │           21 │
│ Google                      │           12 │
│ QTS                         │           12 │
│ Aligned Data Centers        │            9 │
│ Digital Realty              │            9 │
│ Meta                        │            8 │
│ CyrusOne                    │            6 │
│ Vantage Data Centers        │            5 │
│ NTT                         │            5 │
│ CloudHQ                     │            4 │
│ Stack Infrastructure        │            4 │
│ Compass Datacenters         │            3 │
│ Apple                       │            2 │
│ Yondr                       │            1 │
│ Stream Data Centers         │            1 │
│ Edged Energy                │            1 │
│ Sabey Data Centers          │            1 │
│ Corscale Data Centers       │            1 │
│ Iron Mountain               │            1 │
│ CoreSite                    │            1 │
│ Skybox                      │            1 │
│ Cologix                     │            1 │
│ Equinix                     │            1 │
│ US National Security Agency │            1 │
├─────────────────────────────┴──────────────┤
│ 25 rows                          2 columns │
└────────────────────────────────────────────┘
Locations Heatmap
Below is a heatmap of the data center locations. The brighter hexagons have more sites.
$ ~/duckdb
CREATE OR REPLACE TABLE h3_3_stats AS
SELECT H3_LATLNG_TO_CELL(
ST_Y(ST_CENTROID(geometry)),
ST_X(ST_CENTROID(geometry)), 3) AS h3_3,
COUNT(*) num_buildings
FROM 'data_centres.parquet'
GROUP BY 1;
COPY (
SELECT ST_ASWKB(H3_CELL_TO_BOUNDARY_WKT(h3_3)::geometry) geometry,
num_buildings
FROM h3_3_stats
WHERE ST_XMIN(geometry::geometry) BETWEEN -179 AND 179
AND ST_XMAX(geometry::geometry) BETWEEN -179 AND 179
) TO 'h3_4_stats.parquet' (
FORMAT 'PARQUET',
CODEC 'ZSTD',
COMPRESSION_LEVEL 22,
ROW_GROUP_SIZE 15000);
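The WHERE clause above drops any hexagon whose bounding box reaches within a degree of the antimeridian. A cell that straddles ±180° has vertices near both +180 and -180, so drawn naively in longitude/latitude it would stretch across the whole map. Below is a small Python sketch contrasting that filter with an explicit crossing check. Both helper names are hypothetical, and the 180° span test assumes cells are far narrower than half the globe, which holds for H3 resolution 3.

```python
def passes_sql_filter(boundary, limit=179.0):
    """Mirror of the WHERE clause: x-min and x-max both within [-limit, limit]."""
    lons = [lon for lon, _lat in boundary]
    return -limit <= min(lons) <= limit and -limit <= max(lons) <= limit


def crosses_antimeridian(boundary):
    """Assumes small cells: a longitude span over 180 degrees means the ring
    wraps around +/-180 rather than genuinely spanning half the globe."""
    lons = [lon for lon, _lat in boundary]
    return max(lons) - min(lons) > 180.0


near_aleutians = [(-177.0, 51.0), (-176.0, 51.5), (-176.5, 52.0)]
wrapping = [(179.6, 51.0), (-179.8, 51.5), (179.9, 52.0)]

for ring in (near_aleutians, wrapping):
    print(passes_sql_filter(ring), crosses_antimeridian(ring))
```

The SQL filter is the stricter of the two: it also drops cells that merely come within a degree of ±180 without crossing it, which costs nothing for a map of US data centers.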
Each location has some indication of whether it's a hyperscaler site or not.
SELECT "large facilites",
COUNT(*)
FROM 'data_centres.parquet'
GROUP BY 1;
┌──────────────────────┬──────────────┐
│   large facilites    │ count_star() │
│       varchar        │    int64     │
├──────────────────────┼──────────────┤
│ Possible hyperscaler │          156 │
│ multiple permited    │          165 │
│ no value             │           19 │
│ not a hyperscaler    │          880 │
│ redacted             │           20 │
└──────────────────────┴──────────────┘
Below are the individual hyperscaler locations along with the year of their first permit.
Buildings & Parcels
In many cases, a single record points to a single building.
But in other cases, the point rests on a parcel of land that more than one facility sits on.
The metadata does mention when a record refers to more than one building, but it would be nice to see this dataset turned into a building footprint-level dataset at some point.
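Turning the records into footprints would start with working out which building polygon contains each point. DuckDB's spatial extension can do this in SQL with ST_Contains; below is a minimal Python sketch of the underlying even-odd ray-casting test. It treats longitude/latitude as planar coordinates, ignores holes, and makes no promises for points lying exactly on an edge. point_in_ring is a hypothetical helper name.

```python
def point_in_ring(x, y, ring):
    """Even-odd rule: cast a ray towards +x and count edge crossings.

    ring is a list of (x, y) vertices; a closing vertex equal to the
    first one is allowed but not required.
    """
    inside = False
    j = len(ring) - 1
    for i in range(len(ring)):
        xi, yi = ring[i]
        xj, yj = ring[j]
        # Only edges that straddle the horizontal line through y can cross the ray.
        if (yi > y) != (yj > y):
            x_cross = xi + (y - yi) * (xj - xi) / (yj - yi)
            if x < x_cross:
                inside = not inside
        j = i
    return inside


square = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
print(point_in_ring(5.0, 5.0, square))   # True
print(point_in_ring(15.0, 5.0, square))  # False
```

A record whose point falls inside several footprints, or none, would then be a candidate for the multi-building or parcel-level cases described above.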
OpenStreetMap Data
I looked for data center metadata in OpenStreetMap (OSM) so I could get an idea of how it compares against Business Insider's dataset. I was only able to locate ~900 building footprints in OSM's Layercake dataset, which is updated weekly.
$ ~/duckdb
SELECT COUNT(*),
tags.building,
tags."building:use"
FROM 'https://data.openstreetmap.us/layercake/buildings.parquet'
WHERE tags.building ILIKE '%data%'
OR tags."building:use" ILIKE '%data%'
GROUP BY 2, 3
ORDER BY 1 DESC;
┌──────────────┬────────────────────────────────────────────────────────────┬──────────────────┐
│ count_star() │                          building                          │   building:use   │
│    int64     │                          varchar                           │     varchar      │
├──────────────┼────────────────────────────────────────────────────────────┼──────────────────┤
│          901 │ data_center                                                │ NULL             │
│            8 │ datacenter                                                 │ NULL             │
│            8 │ data_centre                                                │ NULL             │
│            3 │ data                                                       │ NULL             │
│            3 │ industrial                                                 │ data_center      │
│            2 │ data center                                                │ NULL             │
│            1 │ Structure added, not on data, appears on satellite imagery │ NULL             │
│            1 │ sa_data_yaye                                               │ NULL             │
│            1 │ data_center                                                │ industrial       │
│            1 │ Ciber DataClic                                             │ NULL             │
│            1 │ office                                                     │ data_center      │
│            1 │ yes                                                        │ NDATANG K KORUNG │
│            1 │ industrial                                                 │ datacenter       │
│            1 │ apartments                                                 │ data_center      │
├──────────────┴────────────────────────────────────────────────────────────┴──────────────────┤
│ 14 rows                                                                            3 columns │
└──────────────────────────────────────────────────────────────────────────────────────────────┘
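The ILIKE '%data%' filter also catches unrelated values such as "sa_data_yaye", "Ciber DataClic" and a mapper's note about satellite imagery. Below is a minimal sketch of a stricter matcher, assuming only the data center / data centre spellings (with an optional space, underscore or hyphen) should count. The pattern and is_data_center are hypothetical, and whether the three building=data footprints are data centers would need checking by hand.

```python
import re

# Hypothetical pattern: "data", an optional separator, then "center" or "centre".
DATA_CENTER = re.compile(r"data[ _-]?cent(?:er|re)", re.IGNORECASE)


def is_data_center(tag):
    """True if an OSM building or building:use value names a data center."""
    return tag is not None and DATA_CENTER.fullmatch(tag.strip()) is not None


values = ["data_center", "datacenter", "data_centre", "data center", "data",
          "Ciber DataClic", "sa_data_yaye",
          "Structure added, not on data, appears on satellite imagery"]
print([v for v in values if is_data_center(v)])
# ['data_center', 'datacenter', 'data_centre', 'data center']
```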
Below are the data centers in OSM for the Bay Area.
Below are Business Insiderβs.
I noticed that telecom=data_center, or variations of that tag, are used from time to time in OSM. Below is one example for an Apple data center.
This tag isn't making its way into Layercake. I've raised a ticket, so hopefully at some point that will help fill in the gaps in OSM's data center coverage.
$ echo "SELECT * EXCLUDE(tags),
tags: tags::JSON
FROM 'https://data.openstreetmap.us/layercake/buildings.parquet'
WHERE id = 300974499
LIMIT 1" \
| ~/duckdb -json \
| jq -S .
[
{
"bbox": "{'xmin': -111.60577, 'ymin': 33.34491, 'xmax': -111.60252, 'ymax': 33.349155}",
"geometry": "MULTIPOLYGON (((-111.6057634 33.3462637, -111.6057589 33.3458225, -111.6054351 33.3458248, -111.6054259 33.3449094, -111.6027724 33.344928, -111.60278 33.3456873, -111.6025203 33.3456891, -111.6025379 33.3474469, -111.6025464 33.3482889, -111.6028057 33.3482871, -111.6028144 33.3491573, -111.6054686 33.3491387, -111.6054397 33.346266, -111.6057634 33.3462637)))",
"id": 300974499,
"tags": {
"access": null,
"addr:city": "Mesa",
"addr:housenumber": "3740",
"addr:postcode": "85212",
"addr:street": "South Signal Butte Road",
"building": "industrial",
"building:colour": null,
"building:flats": null,
"building:levels": null,
"building:material": null,
"building:part": null,
"building:use": "data_center",
"height": null,
"name": "Apple Data Center",
"roof:colour": null,
"roof:height": null,
"roof:levels": null,
"roof:material": null,
"roof:orientation": null,
"roof:shape": null,
"start_date":