In July, Public Safety Canada (PSC) released Canada Structures. This dataset contains ~13M building footprints from across Canada, along with their heights and metadata describing their usage.
PSC sourced the footprints from OpenStreetMap (OSM), Microsoft Building Footprints (MSB) and the Open Database of Buildings (ODB). The height data came from Natural Resources Canada.
The dataset was assembled in April 2024. The ODB data, which is a collection of data from Canadian municipalities, was from 2019, not the newer 2025 release.
I’m not certain when they collected their data from OSM but I found records from February 2023 that didn’t make the cut into this release.
The MSB data was likely version 1.1 which was published in June of 2019.
Below is a heatmap of the building footprints.
In this post, I’ll explore Canada Structures’ version 1.0.0.
My Workstation
I’m using a 5.7 GHz AMD Ryzen 9 9950X CPU. It has 16 cores and 32 threads and 1.2 MB of L1, 16 MB of L2 and 64 MB of L3 cache. It has a liquid cooler attached and is housed in a spacious, full-sized Cooler Master HAF 700 computer case.
The system has 96 GB of DDR5 RAM clocked at 4,800 MT/s and a 5th-generation, Crucial T700 4 TB NVMe M.2 SSD which can read at speeds up to 12,400 MB/s. There is a heatsink on the SSD to help keep its temperature down. This is my system’s C drive.
The system is powered by a 1,200-watt, fully modular Corsair Power Supply and is sat on an ASRock X870E Nova 90 Motherboard.
I’m running Ubuntu 24 LTS via Microsoft’s Ubuntu for Windows on Windows 11 Pro. In case you’re wondering why I don’t run a Linux-based desktop as my primary work environment, I’m still using an Nvidia GTX 1080 GPU which has better driver support on Windows and ArcGIS Pro only supports Windows natively.
Installing Prerequisites
I’ll use GDAL 3.9.3 and a few other tools to help analyse the data in this post.
$ sudo add-apt-repository ppa:ubuntugis/ubuntugis-unstable
$ sudo apt update
$ sudo apt install \
gdal-bin \
jq
I’ll use DuckDB, along with its H3, JSON, Lindel, Parquet and Spatial extensions, in this post.
$ cd ~
$ wget -c https://github.com/duckdb/duckdb/releases/download/v1.3.2/duckdb_cli-linux-amd64.zip
$ unzip -j duckdb_cli-linux-amd64.zip
$ chmod +x duckdb
$ ~/duckdb
INSTALL h3 FROM community;
INSTALL lindel FROM community;
INSTALL json;
INSTALL parquet;
INSTALL spatial;
I’ll set up DuckDB to load every installed extension each time it launches.
$ vi ~/.duckdbrc
.timer on
.width 180
LOAD h3;
LOAD lindel;
LOAD json;
LOAD parquet;
LOAD spatial;
The maps in this post were mostly rendered with QGIS version 3.44. QGIS is a desktop application that runs on Windows, macOS and Linux. The application has grown in popularity in recent years and has ~15M application launches from users all around the world each month.
I used QGIS’ Tile+ plugin to add basemaps from Google and OpenStreetMap (OSM) to the maps throughout this post.
Analysis-Ready Data
PSC broke up the dataset into GeoPackage (GPKG) files by province / territory. Below I’ll build a manifest of these URLs and download them four at a time.
$ mkdir -p ~/psc
$ cd ~/psc
$ vi urls.txt
https://open.canada.ca/data/dataset/3829eee9-f898-4643-9ad8-f48575b8873d/resource/909a44d6-1040-408d-896e-7362d2165688/download/ab_structures_en.gpkg
https://open.canada.ca/data/dataset/3829eee9-f898-4643-9ad8-f48575b8873d/resource/d858fd57-2c9a-4978-bc50-79aef286bccf/download/mb_structures_en.gpkg
https://open.canada.ca/data/dataset/3829eee9-f898-4643-9ad8-f48575b8873d/resource/f7fdee7d-09dd-4eb6-88fd-2234ad701309/download/on_structures_en.gpkg
https://open.canada.ca/data/dataset/3829eee9-f898-4643-9ad8-f48575b8873d/resource/71db1a0f-0099-4fbe-ad9a-f51626506e8a/download/qc_structures_en.gpkg
https://open.canada.ca/data/dataset/3829eee9-f898-4643-9ad8-f48575b8873d/resource/5490a8a0-753a-41f7-8ab5-d3870ee0ea69/download/bc_structures_en.gpkg
https://open.canada.ca/data/dataset/3829eee9-f898-4643-9ad8-f48575b8873d/resource/b903c565-8fdc-453e-9927-8b1a33829d07/download/ns_structures_en.gpkg
https://open.canada.ca/data/dataset/3829eee9-f898-4643-9ad8-f48575b8873d/resource/f4a95ecb-5b58-4021-a591-ca1d8b2bbae4/download/sk_structures_en.gpkg
https://open.canada.ca/data/dataset/3829eee9-f898-4643-9ad8-f48575b8873d/resource/610fb5b5-e12c-41a1-aa28-415b51c24333/download/nt_structures_en.gpkg
https://open.canada.ca/data/dataset/3829eee9-f898-4643-9ad8-f48575b8873d/resource/d892ce68-bcd8-463b-a2fa-0439ef75c2b8/download/pe_structures_en.gpkg
https://open.canada.ca/data/dataset/3829eee9-f898-4643-9ad8-f48575b8873d/resource/680f47b9-9e23-4888-9950-10e561ae0556/download/nl_structures_en.gpkg
https://open.canada.ca/data/dataset/3829eee9-f898-4643-9ad8-f48575b8873d/resource/fcd7fe5b-be3e-4499-8b6e-11ce5ca391cc/download/nu_structures_en.gpkg
https://open.canada.ca/data/dataset/3829eee9-f898-4643-9ad8-f48575b8873d/resource/15737be6-24f3-4764-b40c-6f23f8dd44e0/download/yk_structures_en.gpkg
https://open.canada.ca/data/dataset/3829eee9-f898-4643-9ad8-f48575b8873d/resource/6ff6252a-50f4-43b6-b6d0-7d6d904a3e68/download/nb_structures_en.gpkg
$ cat urls.txt \
    | xargs -P4 \
            -I% \
            wget -c "%"
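wget -c resumes partial transfers, so the command above can simply be re-run after a failure. To see which manifest entries still have no file on disk first, something like the following could be used. It’s a minimal sketch: missing_downloads is a hypothetical helper name, not part of wget or xargs, and it only checks that a non-empty file exists, not that the transfer completed.

```shell
# Print the file name of every URL in a manifest that has no
# non-empty file in the given directory. Hypothetical helper.
missing_downloads() {
    manifest="$1"
    dir="$2"
    # "|| [ -n ... ]" keeps a final line that lacks a trailing newline.
    while IFS= read -r url || [ -n "$url" ]; do
        [ -z "$url" ] && continue      # skip blank lines
        name="${url##*/}"              # keep the text after the last slash
        [ -s "$dir/$name" ] || printf '%s\n' "$name"
    done < "$manifest"
}

# Usage, assuming the manifest and the downloads both live in ~/psc:
#   missing_downloads ~/psc/urls.txt ~/psc
```

An empty result means every URL in the manifest has a matching file.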
I’ll extract the projection PSC used. This proj4 string will be used below to re-project the data into EPSG:4326.
$ gdalsrsinfo -o proj4 ab_structures_en.gpkg
+proj=lcc +lat_0=63.390675 +lon_0=-91.8666666666667 +lat_1=49 +lat_2=77 +x_0=6200000 +y_0=3000000 +ellps=GRS80 +units=m +no_defs
I’ll convert the GPKG files into spatially-sorted, ZStandard-compressed Parquet files in EPSG:4326. I’ve also added a bounding box for each geometry and a single source field, which reports the first of MSB, ODB and OSM, in that order, that is flagged on a record.
This format will load without issue in QGIS 3.44 and ArcGIS Pro 3.5. The bounding boxes will help optimise bandwidth usage when querying this data on remote servers, like AWS S3.
$ for FILENAME in *.gpkg; do
      echo "$FILENAME"
      BASENAME=$(basename "$FILENAME" | cut -d. -f1)
      echo "COPY (
                WITH a AS (
                    SELECT * EXCLUDE(geom),
                           ST_FLIPCOORDINATES(
                               ST_TRANSFORM(geom,
                                            '+proj=lcc +lat_0=63.390675 +lon_0=-91.8666666666667 +lat_1=49 +lat_2=77 +x_0=6200000 +y_0=3000000 +ellps=GRS80 +units=m +no_defs +type=crs',
                                            'EPSG:4326')) geometry
                    FROM   ST_READ('$FILENAME')
                )
                SELECT *,
                       {'xmin': ST_XMIN(ST_EXTENT(geometry)),
                        'ymin': ST_YMIN(ST_EXTENT(geometry)),
                        'xmax': ST_XMAX(ST_EXTENT(geometry)),
                        'ymax': ST_YMAX(ST_EXTENT(geometry))} AS bbox,
                       CASE WHEN MSB IS NOT NULL THEN 'MSB'
                            WHEN ODB IS NOT NULL THEN 'ODB'
                            WHEN OSM IS NOT NULL THEN 'OSM'
                       END AS source
                FROM   a
                ORDER BY HILBERT_ENCODE([ST_Y(ST_CENTROID(geometry)),
                                         ST_X(ST_CENTROID(geometry))]::double[2])
            ) TO '$BASENAME.parquet' (
                FORMAT 'PARQUET',
                CODEC 'ZSTD',
                COMPRESSION_LEVEL 22,
                ROW_GROUP_SIZE 15000);
            " | ~/duckdb
  done
The above turned 3.9 GB of GPKG files into 1.5 GB of Parquet.
$ du -hsc *structures_en.parquet
197M ab_structures_en.parquet
169M bc_structures_en.parquet
69M mb_structures_en.parquet
40M nb_structures_en.parquet
26M nl_structures_en.parquet
51M ns_structures_en.parquet
2.9M nt_structures_en.parquet
1.1M nu_structures_en.parquet
536M on_structures_en.parquet
7.6M pe_structures_en.parquet
276M qc_structures_en.parquet
71M sk_structures_en.parquet
1.6M yk_structures_en.parquet
1.5G total
Heatmap
Below is a heatmap of this dataset.
CREATE OR REPLACE TABLE h3_4_stats AS
    SELECT H3_LATLNG_TO_CELL(
               bbox.ymin,
               bbox.xmin, 4) AS h3_4,
           COUNT(*) num_buildings
    FROM   READ_PARQUET('*_structures_en.parquet')
    WHERE  bbox.xmin BETWEEN -178.5 AND 178.5
    GROUP BY 1;
COPY (
    SELECT ST_ASWKB(H3_CELL_TO_BOUNDARY_WKT(h3_4)::geometry) geometry,
           num_buildings
    FROM   h3_4_stats
) TO 'h3_4_stats.gpkg'
    WITH (FORMAT GDAL,
          DRIVER 'GPKG',
          LAYER_CREATION_OPTIONS 'WRITE_BBOX=YES');
The following was needed for ArcGIS Pro to recognise the projection of the above hexagons properly.
$ ogr2ogr \
-f GPKG \
-a_srs EPSG:4326 \
h3_4_stats.4326.gpkg \
h3_4_stats.gpkg
Data Fluency
Below are the field names, data types, percentages of NULLs per column, number of unique values and minimum and maximum values for each column.
$ ~/duckdb
SELECT column_name,
column_type,
null_percentage,
approx_unique,
min,
max
FROM (SUMMARIZE
FROM READ_PARQUET('*_structures_en.parquet'))
WHERE column_name != 'geometry'
AND column_name != 'bbox'
ORDER BY 1;
┌─────────────┬─────────────┬─────────────────┬───────────────┬───────────────────┬────────────────────┐
│ column_name │ column_type │ null_percentage │ approx_unique │        min        │        max         │
│   varchar   │   varchar   │  decimal(9,2)   │     int64     │      varchar      │      varchar       │
├─────────────┼─────────────┼─────────────────┼───────────────┼───────────────────┼────────────────────┤
│ Area        │ DOUBLE      │            0.00 │      14644202 │ 0.00022589575436  │ 2788966.1142695914 │
│ CS_ID       │ BIGINT      │            0.00 │      12278694 │ 1                 │ 13763995           │
│ Height      │ DOUBLE      │           13.72 │       8158934 │ 0.0               │ 270.43516061452516 │
│ LC_Name     │ VARCHAR     │           96.04 │         19469 │ "Sifton Cemetery" │ Îles aux Chats     │
│ MSB         │ VARCHAR     │           50.69 │             1 │ 1                 │ 1                  │
│ ODB         │ VARCHAR     │           67.74 │             1 │ 1                 │ 1                  │
│ ODB_ID      │ VARCHAR     │           67.74 │       4186001 │ 12010010000001    │ 61060970000014     │
│ ODB_Source  │ VARCHAR     │           67.74 │            66 │ Airdrie           │ York Region        │
│ OSM         │ VARCHAR     │           56.89 │             1 │ 1                 │ 1                  │
│ OSM_ID      │ VARCHAR     │           56.89 │       6684280 │ 1000016583        │ j                  │
│ OSM_LC      │ VARCHAR     │           32.15 │            16 │ allotments        │ vineyard           │
│ OSM_Name    │ VARCHAR     │           99.10 │         95599 │ "8-Building"      │ étable             │
│ OSM_Type    │ VARCHAR     │           82.44 │           358 │ Administratif     │ yurt               │
│ Perimeter   │ DOUBLE      │            0.00 │      13164420 │ 0.107992344195257 │ 16624.67147343245  │
│ Province    │ VARCHAR     │            0.00 │            12 │ AB                │ YK                 │
│ source      │ VARCHAR     │            0.00 │             3 │ MSB               │ OSM                │
├─────────────┴─────────────┴─────────────────┴───────────────┴───────────────────┴────────────────────┤
│ 16 rows                                                                                    6 columns │
└──────────────────────────────────────────────────────────────────────────────────────────────────────┘
Sources
This is the breakdown of records per source per province. There is a clear preference for MSB with ODB often completely absent.
$ ~/duckdb
WITH a AS (
SELECT Province,
source,
COUNT(*) cnt
FROM '*_structures_en.parquet'
GROUP BY 1, 2)
PIVOT a
ON source
USING SUM(cnt)
GROUP BY Province
ORDER BY 2 DESC;
┌──────────┬─────────┬─────────┬────────┐
│ Province │   MSB   │   ODB   │  OSM   │
│ varchar  │ int128  │ int128  │ int128 │
├──────────┼─────────┼─────────┼────────┤
│ QC       │ 1798578 │  491156 │ 504603 │
│ ON       │ 1333101 │ 2465651 │ 934046 │
│ AB       │ 1082368 │  495617 │ 332508 │
│ BC       │  688603 │  556590 │ 290223 │
│ MB       │  638831 │    NULL │  82979 │
│ SK       │  509567 │   98226 │ 130160 │
│ NL       │  255350 │    NULL │  32917 │
│ NB       │  216369 │   95832 │  91412 │
│ NS       │  168198 │  230801 │ 108674 │
│ PE       │   77119 │    NULL │   4567 │
│ YK       │   11563 │    NULL │   4461 │
│ NT       │    3192 │    5835 │  10589 │
│ NU       │    2882 │    NULL │   8186 │
├──────────┴─────────┴─────────┴────────┤
│ 13 rows                     4 columns │
└───────────────────────────────────────┘
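To put a number on that preference, the per-province counts above can be summed per source. The following is a rough sketch using awk rather than DuckDB; source_shares is a hypothetical helper name and the rows are copied from the table above. Keep in mind the source field comes from the CASE expression in the conversion step, which checks MSB first, then ODB, then OSM.

```shell
# Sum each source column of the per-province counts and print its
# share of all records. awk treats the literal "NULL" as 0 in sums.
source_shares() {
    awk '
        { msb += $2; odb += $3; osm += $4 }
        END {
            total = msb + odb + osm
            printf "MSB %d %.1f%%\n", msb, 100 * msb / total
            printf "ODB %d %.1f%%\n", odb, 100 * odb / total
            printf "OSM %d %.1f%%\n", osm, 100 * osm / total
        }'
}

# Province, MSB, ODB and OSM counts from the PIVOT output.
source_shares <<'EOF'
QC 1798578 491156 504603
ON 1333101 2465651 934046
AB 1082368 495617 332508
BC 688603 556590 290223
MB 638831 NULL 82979
SK 509567 98226 130160
NL 255350 NULL 32917
NB 216369 95832 91412
NS 168198 230801 108674
PE 77119 NULL 4567
YK 11563 NULL 4461
NT 3192 5835 10589
NU 2882 NULL 8186
EOF
```

With these counts, MSB accounts for 49.3% of the 13,760,754 records, ODB for 32.3% and OSM for 18.4%.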
The following is an example record which has been attributed to OSM.
$ echo "FROM 'ab_structures_en.parquet'
WHERE source = 'OSM'
LIMIT 1" \
| ~/duckdb -json \
| jq -S .
[
{
"Area": 1951.60849850855,
"CS_ID": 7889231,
"Height": 0.0,
"LC_Name": null,
"MSB": null,
"ODB": null,
"ODB_ID": null,
"ODB_Source": null,
"OSM": "1",
"OSM_ID": "385156020",
"OSM_LC": "forest",
"OSM_Name": null,
"OSM_Type": "industrial",
"Perimeter": 283.5263470280223,
"Province": "AB",
"bbox": "{'xmin': -110.42687370425128, 'ymin': 55.06546590051143, 'xmax': -110.4250928025303, 'ymax': 55.0657240008682}",
"geometry": "POLYGON ((-110.42687370425128 55.06546590051143, -110.42687370423948 55.0657240008682, -110.4250928025303 55.065687100742686, -110.42509280279089 55.06555190107596, -110.42655190404591 55.06558880094082, -110.42655190337649 55.06546590070232, -110.42687370425128 55.06546590051143))",
"source": "OSM"
}
]
As the following shows, records which are not attributed to OSM can still contain OSM metadata.
$ echo "FROM 'ab_structures_en.parquet'
WHERE source = 'MSB'
LIMIT 1" \
| ~/duckdb -json \
| jq -S .
[
{
"Area": 197.982910306995,
"CS_ID": 8653450,
"Height": 0.0,
"LC_Name": "Cold Lake Air Weapons Range",
"MSB": "1",
"ODB": null,
"ODB_ID": null,
"ODB_Source": null,
"OSM": null,
"OSM_ID": null,
"OSM_LC": "military",
"OSM_Name": null,
"OSM_Type": null,
"Perimeter": 64.80059892006688,
"Province": "AB",
"bbox": "{'xmin': -110.59742700235742, 'ymin': 55.01551100079211, 'xmax': -110.59729400280793, 'ymax': 55.01572700092107}",
"geometry": "POLYGON ((-110.59742700235742 55.01551100079211, -110.59742300421074 55.01572700092107, -110.59729400280793 55.015726000806524, -110.59729800362015 55.0155110012002, -110.59742700235742 55.01551100079211))",
"source": "MSB"
}
]
Building Use
There are two fields, OSM_LC and OSM_Type, which can describe a building’s use and surrounding area. Below are the record counts for OSM_LC’s 20 distinct values, plus NULL. It’s good to see most records are non-NULL.
SELECT COUNT(*),
OSM_LC
FROM '*_structures_en.parquet'
GROUP BY 2
ORDER BY 1 DESC;
┌──────────────┬───────────────────┐
│ count_star() │      OSM_LC       │
│    int64     │      varchar      │
├──────────────┼───────────────────┤
│      7907072 │ residential       │
│      4423474 │ NULL              │
│       917211 │ forest            │
│       177327 │ industrial        │
│        94802 │ retail            │
│        79847 │ commercial        │
│        49464 │ farmyard          │
│        29279 │ park              │
│        27556 │ farmland          │
│        24495 │ nature_reserve    │
│         9042 │ military          │
│         4942 │ quarry            │
│         4847 │ grass             │
│         2934 │ cemetery          │
│         2921 │ recreation_ground │
│         2249 │ meadow            │
│         1787 │ scrub             │
│         1222 │ orchard           │
│          131 │ vineyard          │
│          108 │ allotments        │
│           44 │ heath             │
├──────────────┴───────────────────┤
│ 21 rows                2 columns │
└──────────────────────────────────┘
OSM_Type has a much longer tail and could probably do with some clustering. Also, out of the ~13M records in this dataset, this column is NULL for 11.3M of them. Some values here, like industrial, also overlap with OSM_LC.
SELECT COUNT(*),
OSM_Type
FROM '*_structures_en.parquet'
GROUP BY 2
ORDER BY 1 DESC;
┌──────────────┬──────────────────────┐
│ count_star() │       OSM_Type       │
│    int64     │       varchar        │
├──────────────┼──────────────────────┤
│     11344842 │ NULL                 │
│       864092 │ detached             │
│       766544 │ house                │
│       140230 │ residential          │
│       122762 │ garage               │
│       112894 │ terrace              │
│        67584 │ shed                 │
│        63904 │ apartments           │
│        41522 │ commercial           │
│        35143 │ industrial           │
│        32287 │ semidetached_house   │
│        30928 │ retail               │
│        25959 │ semi                 │
│        17495 │ farm_auxiliary       │
│        14288 │ static_caravan       │
│        14078 │ school               │
│         9024 │ roof                 │
│         8401 │ barn                 │
│         5980 │ cabin                │
│         5887 │ church               │
│      ·       │          ·           │
│      ·       │          ·           │
│      ·       │          ·           │
│            1 │ residential;yes      │
│            1 │ air-supported        │
│            1 │ Control Tower        │
│            1 │ Banquet_Hall         │
│            1 │ hydro                │
│            1 │ outside              │
│            1 │ G                    │
│            1 │ changing_rooms       │
│            1 │ walkway              │
│            1 │ sse                  │
│            1 │ tourism              │
│            1 │ balcony              │
│            1 │ ramada_studio        │
│            1 │ military_armoury     │
│            1 │ Sports,_exercise_are │
│            1 │ Résidence_Maria-Gor  │
│            1 │ Hospital and Nursing │
│            1 │ recreational         │
│            1 │ social_facility      │
│            1 │ school;church        │
├──────────────┴──────────────────────┤
│ 401 rows (40 shown)       2 columns │
└─────────────────────────────────────┘
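As a starting point for clustering the OSM_Type values, the free-text entries could be normalised before grouping. The sketch below lower-cases a value, keeps only the first entry of a semicolon-separated list and swaps spaces for underscores. normalise_osm_type is a hypothetical helper name, and treating semi as a synonym for semidetached_house is my assumption rather than anything PSC documents.

```shell
# Normalise a raw OSM_Type value. The rules here are illustrative
# assumptions, not an official tag mapping.
normalise_osm_type() {
    value=$(printf '%s' "$1" \
        | tr '[:upper:]' '[:lower:]' \
        | sed -e 's/;.*$//' -e 's/ /_/g')
    case "$value" in
        semi) echo "semidetached_house" ;;   # assumed synonym
        *)    echo "$value" ;;
    esac
}

# A few of the values from the table above.
for raw in "Control Tower" "residential;yes" "semi"; do
    normalise_osm_type "$raw"
done
```

More synonyms could be added as extra case branches once the remaining values have been reviewed.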
Below is the number of records with the above two fields populated broken down by source.
WITH a AS (
SELECT COUNT(*) cnt,
source,
OSM_LC IS NOT NULL AS has_lc,
OSM_Type IS NOT NULL AS has_type
FROM '*_structures_en.parquet'
GROUP BY 2, 3, 4
)
PIVOT a
ON has_lc, has_type
USING SUM(cnt)
GROUP BY source
ORDER BY 1;
┌─────────┬─────────────┬────────────┬────────────┬───────────┐
│ source  │ false_false │ false_true │ true_false │ true_true │
│ varchar │   int128    │   int128   │   int128   │  int128   │
├─────────┼─────────────┼────────────┼────────────┼───────────┤
│ MSB     │     3014756 │      22326 │    3676601 │     72038 │
│ ODB     │      552400 │     158812 │    2518475 │   1210021 │
│ OSM     │      490810 │     184370 │    1091800 │    768345 │
└─────────┴─────────────┴────────────┴────────────┴───────────┘
Building Heights
Not all buildings have heights. Below is the number of buildings with a height above zero, broken down by source. The 13.72% of records with a NULL height fall into neither column.
WITH a AS (
SELECT COUNT(*) cnt,
source,
Height > 0 AS has_height
FROM '*_structures_en.parquet'
GROUP BY 2, 3
)
PIVOT a
ON has_height
USING SUM(cnt)
GROUP BY source
ORDER BY 1;
┌─────────┬─────────┬─────────┐
│ source  │  false  │  true   │
│ varchar │ int128  │ int128  │
├─────────┼─────────┼─────────┤
│ MSB     │ 2078923 │ 3488475 │
│ ODB     │  145537 │ 4019747 │
│ OSM     │  491994 │ 1648099 │
└─────────┴─────────┴─────────┘
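The counts above are easier to compare as percentages. Here is a small awk sketch, with height_coverage being a hypothetical helper name, that reports the share of each source’s records with a height above zero. Only the two columns shown in the table go into the denominator, so records with a NULL height are excluded.

```shell
# For each "source false true" row, print the share of records
# whose height is above zero.
height_coverage() {
    awk '{ printf "%s %.1f%%\n", $1, 100 * $3 / ($2 + $3) }'
}

# Source, false and true counts from the PIVOT output.
height_coverage <<'EOF'
MSB 2078923 3488475
ODB 145537 4019747
OSM 491994 1648099
EOF
```

This works out to 62.7% for MSB, 96.5% for ODB and 77.0% for OSM.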
Building heights can look good at a distance. Below is Downtown Calgary.
But in neighbourhoods with uniform housing, the heights can vary wildly. Below is Douglasdale in Calgary. The labels are each home’s height, rounded to the nearest tenth.
Footprint Coverage
Neighbourhoods built after 2024 won’t appear in this dataset. The image below is from South East Calgary. PSC’s dataset is in yellow with Overture’s August release in purple.
This is one of the purple areas on June 25th.
The TUM dataset contains homes not yet in either Overture or PSC’s dataset. Below is Calgary with TUM’s footprints highlighted in red, Overture’s in purple and PSC’s in yellow.
Some properties that are detached from one another are shown as attached in this dataset. PSC’s buildings are outlined in yellow.
This is the same area on Google Street View. As you can see, the houses are detached from one another.
MSB footprints can take on a life of their own. Below is a screenshot from Northern Alberta.
These are the sources of each footprint in PSC’s dataset. It is a bit of a shame OSM wasn’t picked as the first choice with ODB filling in the gaps.
I noticed Dease Lake, a remote community in Northern BC, which has had OSM data of its buildings since at least February 2023, is completely missing from this dataset. The following returns zero records.
SELECT COUNT(*)
FROM 'bc_structures_en.parquet'
WHERE ST_X(ST_CENTROID(geometry)) BETWEEN -130.047500 AND -129.967107
AND ST_Y(ST_CENTROID(geometry)) BETWEEN 58.425803 AND 58.437748;
But with that said, I also looked along Alberta’s borders with BC and SK and the provincial attributions of the buildings look very accurate.
Thank you for taking the time to read this post. I offer both consulting and hands-on development services to clients in North America and Europe. If you’d like to discuss how my offerings can help your business please contact me via LinkedIn.