In April, Statistics Canada released a refreshed version of their Open Database of Buildings (ODB) dataset for Canada. This is one of Canada's most comprehensive building datasets. Below is a heatmap of its building footprints.
Statscan pulled buildings data from 530 datasets across 107 government sources. When I initially began to spot-check the data, even newly constructed neighbourhoods in South East Calgary were covered. Below is Overture's September release in purple and ODB's buildings in yellow.
In this post, I'll explore Statistics Canada's ODB dataset.
My Workstation
I'm using a 5.7 GHz AMD Ryzen 9 9950X CPU. It has 16 cores and 32 threads and 1.2 MB of L1, 16 MB of L2 and 64 MB of L3 cache. It has a liquid cooler attached and is housed in a spacious, full-sized Cooler Master HAF 700 computer case.
The system has 96 GB of DDR5 RAM clocked at 4,800 MT/s and a 5th-generation, Crucial T700 4 TB NVMe M.2 SSD which can read at speeds up to 12,400 MB/s. There is a heatsink on the SSD to help keep its temperature down. This is my system's C drive.
The system is powered by a 1,200-watt, fully modular Corsair Power Supply and is sat on an ASRock X870E Nova 90 Motherboard.
I'm running Ubuntu 24 LTS via Microsoft's Ubuntu for Windows on Windows 11 Pro. In case you're wondering why I don't run a Linux-based desktop as my primary work environment, I'm still using an Nvidia GTX 1080 GPU which has better driver support on Windows and ArcGIS Pro only supports Windows natively.
Installing Prerequisites
I'll use GDAL 3.9.3 and a few other tools to help analyse the data in this post.
$ sudo add-apt-repository ppa:ubuntugis/ubuntugis-unstable
$ sudo apt update
$ sudo apt install \
gdal-bin \
jq
I'll use DuckDB v1.3.0, along with its H3, JSON, Lindel, Parquet and Spatial extensions, in this post. Normally I try to use the latest release of DuckDB but v1.4.0 has an issue where its Parquet files aren't readable by many of the tools I use at the moment.
$ cd ~
$ wget -c https://github.com/duckdb/duckdb/releases/download/v1.3.0/duckdb_cli-linux-amd64.zip
$ unzip -j duckdb_cli-linux-amd64.zip
$ chmod +x duckdb
$ ~/duckdb
INSTALL h3 FROM community;
INSTALL lindel FROM community;
INSTALL json;
INSTALL parquet;
INSTALL spatial;
I'll set up DuckDB to load every installed extension each time it launches.
$ vi ~/.duckdbrc
.timer on
.width 180
LOAD h3;
LOAD lindel;
LOAD json;
LOAD parquet;
LOAD spatial;
The maps in this post were mostly rendered with QGIS version 3.44. QGIS is a desktop application that runs on Windows, macOS and Linux. The application has grown in popularity in recent years and has ~15M application launches from users all around the world each month.
I used QGIS' Tile+ plugin to add basemaps from Google and OpenStreetMap (OSM) to the maps throughout this post.
Analysis-Ready Data
Statscan broke up the dataset into zipped GeoPackage (GPKG) files by province / territory, with some of these split across multiple files. Below I'll build a manifest of these URLs and download them with four concurrent downloads.
$ mkdir -p ~/odb
$ cd ~/odb
$ vi urls.txt
https://www150.statcan.gc.ca/pub/34-26-0001/2018001/zip/ODB_v3_NL.zip
https://www150.statcan.gc.ca/pub/34-26-0001/2018001/zip/ODB_v3_PE.zip
https://www150.statcan.gc.ca/pub/34-26-0001/2018001/zip/ODB_v3_NS.zip
https://www150.statcan.gc.ca/pub/34-26-0001/2018001/zip/ODB_v3_NB.zip
https://www150.statcan.gc.ca/pub/34-26-0001/2018001/zip/ODB_v3_QC_1.zip
https://www150.statcan.gc.ca/pub/34-26-0001/2018001/zip/ODB_v3_QC_2.zip
https://www150.statcan.gc.ca/pub/34-26-0001/2018001/zip/ODB_v3_ON_1.zip
https://www150.statcan.gc.ca/pub/34-26-0001/2018001/zip/ODB_v3_ON_2.zip
https://www150.statcan.gc.ca/pub/34-26-0001/2018001/zip/ODB_v3_ON_3.zip
https://www150.statcan.gc.ca/pub/34-26-0001/2018001/zip/ODB_v3_MB.zip
https://www150.statcan.gc.ca/pub/34-26-0001/2018001/zip/ODB_v3_SK.zip
https://www150.statcan.gc.ca/pub/34-26-0001/2018001/zip/ODB_v3_AB.zip
https://www150.statcan.gc.ca/pub/34-26-0001/2018001/zip/ODB_v3_BC.zip
https://www150.statcan.gc.ca/pub/34-26-0001/2018001/zip/ODB_v3_YT.zip
https://www150.statcan.gc.ca/pub/34-26-0001/2018001/zip/ODB_v3_NT.zip
$ cat urls.txt \
| xargs -n1 \
-P4 \
-I% \
wget -c "%"
I'll extract the GPKG files from each of the ZIPs.
$ find . \
-name "*.zip" \
-type f \
-exec unzip {} "*.gpkg" \;
Below is an example record from one of the GPKG files. Note ".." is being used for NULL values.
$ echo "FROM ST_READ('ODB_v3_AB.gpkg')
LIMIT 1" \
| ~/duckdb -json \
| jq -S .
[
{
"address": "..",
"csdname": "Clearwater County",
"csduid": "4809002",
"dataset": "Building Footprints",
"floors": "..",
"geom": "MULTIPOLYGON (((4615587.032158795 2032592.4950364884, 4615575.2249902915 2032585.4147511278, 4615570.227409255 2032593.7528553815, 4615582.04455081 2032600.8323487062, 4615587.032158795 2032592.4950364884)))",
"height": "..",
"id": "c9ffb5f954f942b3a98bad6b1932360c",
"name": "Residence 2",
"prov_terr": "AB",
"source": "Government of Canada",
"source_id": "1.0",
"sq_ft": "..",
"type": "..",
"units": "..",
"year_built": ".."
}
]
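For anyone reading these GPKG files outside of DuckDB, the ".." placeholder needs the same treatment before any analysis. Below is a minimal Python sketch of that normalisation. The `normalise_record` helper and the trimmed-down example record are assumptions made for illustration; only the ".." convention comes from the data itself.

```python
# Statscan's placeholder for missing values, as seen in the record above.
NULL_SENTINEL = ".."


def normalise_record(record: dict) -> dict:
    """Return a copy of the record with ".." values replaced by None."""
    return {
        key: (None if value == NULL_SENTINEL else value)
        for key, value in record.items()
    }


# A trimmed-down version of the Alberta record shown above.
example = {
    "address": "..",
    "csdname": "Clearwater County",
    "floors": "..",
    "name": "Residence 2",
}
cleaned = normalise_record(example)
print(cleaned)
```

The original dictionary is left untouched, which makes it easy to compare the raw and cleaned values side by side.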
I'll extract the projection Statscan used. This proj4 string will be used below to re-project the data into EPSG:4326.
$ gdalsrsinfo \
-o proj4 \
ODB_v3_AB.gpkg
+proj=lcc +lat_0=63.390675 +lon_0=-91.8666666666667 +lat_1=49 +lat_2=77 +x_0=6200000 +y_0=3000000 +datum=NAD83 +units=m +no_defs
I'll convert the GPKG files into spatially-sorted, ZStandard-compressed Parquet format with an EPSG:4326 projection. I've also made a clear source field and added bounding boxes to each piece of geometry.
This format will load without issue in QGIS 3.44 and ArcGIS Pro 3.5. The bounding boxes will help optimise bandwidth usage when querying this data on remote servers, like AWS S3.
$ for FILENAME in *.gpkg; do
echo $FILENAME
BASENAME=`basename $FILENAME | cut -d. -f1`
echo "COPY (
WITH a AS (
SELECT * EXCLUDE(geom),
ST_FLIPCOORDINATES(
ST_TRANSFORM(
geom,
'+proj=lcc +lat_0=63.390675 +lon_0=-91.8666666666667 +lat_1=49 +lat_2=77 +x_0=6200000 +y_0=3000000 +datum=NAD83 +units=m +no_defs',
'EPSG:4326')) geometry
FROM ST_READ('$FILENAME')
)
SELECT * EXCLUDE (address,
floors,
height,
name,
sq_ft,
type,
units,
year_built,
geometry),
{'xmin': ST_XMIN(ST_EXTENT(geometry)),
'ymin': ST_YMIN(ST_EXTENT(geometry)),
'xmax': ST_XMAX(ST_EXTENT(geometry)),
'ymax': ST_YMAX(ST_EXTENT(geometry))} AS bbox,
ST_ASWKB(geometry) geometry,
CASE WHEN address = '..' THEN NULL ELSE address END AS address,
CASE WHEN floors = '..' THEN NULL ELSE floors END AS floors,
CASE WHEN height = '..' THEN NULL ELSE height END AS height,
CASE WHEN name = '..' THEN NULL ELSE name END AS name,
CASE WHEN sq_ft = '..' THEN NULL ELSE sq_ft END AS sq_ft,
CASE WHEN type = '..' THEN NULL ELSE type END AS type,
CASE WHEN units = '..' THEN NULL ELSE units END AS units,
CASE WHEN year_built = '..' THEN NULL ELSE year_built END AS year_built
FROM a
ORDER BY HILBERT_ENCODE([ST_Y(ST_CENTROID(geometry)),
ST_X(ST_CENTROID(geometry))]::double[2])
) TO '$BASENAME.parquet' (
FORMAT 'PARQUET',
CODEC 'ZSTD',
COMPRESSION_LEVEL 22,
ROW_GROUP_SIZE 15000);
" | ~/duckdb
done
The above turned 2.5 GB of ZIP files containing 6.2 GB of GPKG files into 1.8 GB of Parquet.
$ du -hsc *.parquet
164M ODB_v3_AB.parquet
188M ODB_v3_BC.parquet
90M ODB_v3_MB.parquet
75M ODB_v3_NB.parquet
21M ODB_v3_NL.parquet
62M ODB_v3_NS.parquet
1.4M ODB_v3_NT.parquet
261M ODB_v3_ON_1.parquet
259M ODB_v3_ON_2.parquet
211M ODB_v3_ON_3.parquet
11M ODB_v3_PE.parquet
249M ODB_v3_QC_1.parquet
223M ODB_v3_QC_2.parquet
35M ODB_v3_SK.parquet
1.5M ODB_v3_YT.parquet
1.8G total
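The Hilbert ordering used in the conversion keeps buildings that are close together on the map close together in the file, which is what keeps each row group's bounding boxes tight. Below is a minimal Python sketch of the idea using the classic grid-based Hilbert index. It is not Lindel's implementation: the `hilbert_index` and `hilbert_key` names, the 16-bit grid, the lon/lat quantisation and the three sample centroids are all assumptions made for illustration.

```python
def hilbert_index(order: int, x: int, y: int) -> int:
    """Distance along a Hilbert curve covering a 2**order x 2**order grid."""
    n = 1 << order
    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate / flip the quadrant so the curve stays continuous.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s >>= 1
    return d


def hilbert_key(lon: float, lat: float, order: int = 16) -> int:
    """Quantise a lon/lat pair onto the grid and return its Hilbert index."""
    cells = (1 << order) - 1
    x = int((lon + 180.0) / 360.0 * cells)
    y = int((lat + 90.0) / 180.0 * cells)
    return hilbert_index(order, x, y)


# Hypothetical centroids: two in Calgary and one in Toronto.
points = {
    "calgary_a": (-114.07, 51.05),
    "toronto": (-79.38, 43.65),
    "calgary_b": (-114.06, 51.04),
}
ordered = sorted(points, key=lambda name: hilbert_key(*points[name]))
print(ordered)
```

Sorting by the key places the two Calgary points next to each other regardless of the order they arrived in.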
Heatmap
Below is a heatmap of this dataset.
CREATE OR REPLACE TABLE h3_4_stats AS
SELECT H3_LATLNG_TO_CELL(
bbox.ymin,
bbox.xmin, 4) AS h3_4,
COUNT(*) num_buildings
FROM READ_PARQUET('ODB_v3*.parquet')
WHERE bbox.xmin BETWEEN -178.5 AND 178.5
GROUP BY 1;
COPY (
SELECT ST_ASWKB(H3_CELL_TO_BOUNDARY_WKT(h3_4)::geometry) geometry,
num_buildings
FROM h3_4_stats
) TO 'h3_4_stats.gpkg'
WITH (FORMAT GDAL,
DRIVER 'GPKG',
LAYER_CREATION_OPTIONS 'WRITE_BBOX=YES');
The following was needed for ArcGIS Pro to recognise the projection of the above hexagons properly.
$ ogr2ogr \
-f GPKG \
-a_srs EPSG:4326 \
h3_4_stats.4326.gpkg \
h3_4_stats.gpkg
Data Fluency
Below are the field names, data types, percentages of NULLs per column, number of unique values and minimum and maximum values for each column.
$ ~/duckdb
SELECT column_name,
column_type,
null_percentage,
approx_unique,
min,
max
FROM (SUMMARIZE
FROM READ_PARQUET('ODB_v3*.parquet'))
WHERE column_name != 'geometry'
AND column_name != 'bbox'
ORDER BY 1;
βββββββββββββββ¬ββββββββββββββ¬ββββββββββββββββββ¬ββββββββββββββββ¬βββββββββββββββββββββββββββββββββββββββββββ¬βββββββββββββββββββββββββββββββββββββββββ
β column_name β column_type β null_percentage β approx_unique β min β max β
β varchar β varchar β decimal(9,2) β int64 β varchar β varchar β
βββββββββββββββΌββββββββββββββΌββββββββββββββββββΌββββββββββββββββΌβββββββββββββββββββββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββββββββββββ€
β address β VARCHAR β 69.19 β 4840881 β β Γ dΓ©finir β
β csdname β VARCHAR β 0.00 β 3099 β Abbotsford β qathet E β
β csduid β VARCHAR β 0.00 β 3465 β 1001124 β 6106097 β
β dataset β VARCHAR β 0.00 β 966 β 2022 Voting Location Building Footprint β Γdifices municipaux; Lieux Publics β
β floors β VARCHAR β 97.34 β 35 β 1.0 β 9.0 β
β height β VARCHAR β 91.59 β 139106 β -0.10836829 β 99.99 β
β id β VARCHAR β 0.00 β 14730146 β 0000014f86fe39839c5f1c118e058bfb β ffffff6e9491f9ecb4f4444f7457e84b β
β name β VARCHAR β 98.73 β 29024 β β οΏ½cole secondaire de Par-en-Bas β
β prov_terr β VARCHAR β 0.00 β 11 β AB β YT β
β source β VARCHAR β 0.00 β 99 β Cape Breton Regional Municipality (CBRM) β Ville de Sherbrooke β
β source_id β VARCHAR β 0.00 β 8539298 β β {FFFFFB51-F6CE-4DF6-93AA-FBFDDF22FEEB} β
β sq_ft β VARCHAR β 99.05 β 41985 β 10.60419022 β 9996.0 β
β type β VARCHAR β 85.22 β 989 β 117 β Γglise β
β units β VARCHAR β 98.42 β 176 β 1.0 β 99.0 β
β year_built β VARCHAR β 98.39 β 235 β 1750 β c β
βββββββββββββββ΄ββββββββββββββ΄ββββββββββββββββββ΄ββββββββββββββββ΄βββββββββββββββββββββββββββββββββββββββββββ΄βββββββββββββββββββββββββββββββββββββββββ€
β 15 rows 6 columns β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
The NULL percentages on the metadata fields seem very high. Having an address for each building is unique as other datasets like Overture don't link buildings and addresses to one another. It's a shame there is only ~30% coverage here. Some single buildings also have several addresses concatenated into a single string.
$ ~/duckdb
SELECT address
FROM 'ODB_v3_*.parquet'
ORDER BY LENGTH(address) DESC
LIMIT 1;
S1001-330 Phillip St; S1002-330 Phillip St; S1003-330 Phillip St; S1004-330 Phillip St; S1005-330 Phillip St; S1006-330 Phillip St; S1007-330...
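Concatenated values like the one above need splitting before they can be matched against an address dataset. Below is a minimal Python sketch, assuming the semicolon delimiter seen in this one record holds elsewhere, which I haven't verified across every source. The `split_addresses` helper is a hypothetical name and the sample string is a shortened copy of the record above.

```python
def split_addresses(address):
    """Split a semicolon-delimited address string into a list of addresses."""
    if address is None:
        return []
    return [part.strip() for part in address.split(";") if part.strip()]


# A shortened copy of the longest address value in the dataset.
sample = "S1001-330 Phillip St; S1002-330 Phillip St; S1003-330 Phillip St"
units = split_addresses(sample)
print(len(units), units[0])
```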
Sources
Below is a breakdown of sources in this dataset.
$ ~/duckdb
SELECT source,
COUNT(*)
FROM 'ODB_v3_*.parquet'
GROUP BY 1
ORDER BY 2 DESC;
┌───────────────────────────────────────┬──────────────┐
│                source                 │ count_star() │
│                varchar                │    int64     │
├───────────────────────────────────────┼──────────────┤
│ Government of Canada                  │      3797173 │
│ Government of Québec                  │      2448716 │
│ Government of New Brunswick           │       572033 │
│ City of Toronto                       │       531571 │
│ City of Calgary                       │       488028 │
│ City of Ottawa                        │       385184 │
│ City of Edmonton                      │       365567 │
│ Regional Municipality of York         │       320664 │
│ Ville de Québec                       │       279793 │
│ Regional Municipality of Durham       │       263487 │
│ City of Mississauga                   │       225262 │
│ Ville de Montréal                     │       220311 │
│ City of Brampton                      │       215959 │
│ City of Hamilton                      │       200234 │
│ City of London                        │       191992 │
│ County of Simcoe                      │       191123 │
│ Niagara Region                        │       168182 │
│ Halifax Regional Municipality         │       163638 │
│ City of Vancouver                     │       153988 │
│ City of Surrey                        │       137180 │
│                   ·                   │      ·       │
│                   ·                   │      ·       │
│                   ·                   │      ·       │
│ Regional District of Central Okanagan │         9248 │
│ City of Waterloo                      │         9017 │
│ District of Summerland                │         8899 │
│ Town of Orangeville                   │         8528 │
│ City of Welland                       │         8273 │
│ City of Oshawa                        │         7722 │
│ District of Squamish                  │         7031 │
│ Resort Municipality of Whistler       │         6500 │
│ Town of Canmore                       │         6365 │
│ Town of Truro                         │         5893 │
│ City of White Rock                    │         5347 │
│ City of Grand Forks                   │         3000 │
│ Town of Gibsons                       │         2581 │
│ Town of Banff                         │         2045 │
│ City of Pickering                     │         1416 │
│ City of Port Moody                    │         1304 │
│ Ville de Sherbrooke                   │          590 │
│ Ville de Shawinigan                   │          424 │
│ City of Markham                       │           81 │
│ City of Maple Ridge                   │           10 │
├───────────────────────────────────────┴──────────────┤
│ 107 rows (40 shown)                        2 columns │
└──────────────────────────────────────────────────────┘
Building Use
12.3M buildings don't have a classification for their usage. Oddly, Calgary seems to have good coverage of this so it might be hit-or-miss depending on the municipality supplying the data.
SELECT type,
COUNT(*)
FROM 'ODB_v3_*.parquet'
GROUP BY 1
ORDER BY 2 DESC;
┌─────────────────────────────────────────┬──────────────┐
│                  type                   │ count_star() │
│                 varchar                 │    int64     │
├─────────────────────────────────────────┼──────────────┤
│ NULL                                    │     12286872 │
│ Residential                             │       616944 │
│ General                                 │       218243 │
│ Résidence                               │       142671 │
│ Single Family Dwelling                  │       134333 │
│ Garage - Annexe - Remise                │       127242 │
│ Residential Garage                      │       119061 │
│ Shed                                    │        75565 │
│ Residence                               │        75498 │
│ Résidentielle                           │        74947 │
│ Single / Semi / Duplex                  │        55742 │
│ Detached House                          │        45730 │
│ Single-Family Home                      │        42663 │
│ Detached                                │        36366 │
│ Résidentiel                             │        31142 │
│ Commercial                              │        27585 │
│ Résidentiel et commercial               │        24130 │
│ Garage                                  │        16461 │
│ General/Residential                     │        15758 │
│ Accessory                               │        15452 │
│                    ·                    │      ·       │
│                    ·                    │      ·       │
│                    ·                    │      ·       │
│ Arts Program Facility                   │            1 │
│ Ontario Provincial Police               │            1 │
│ Hospital//Hôpital                       │            1 │
│ Private High School                     │            1 │
│ Education / Fitness / Recreation        │            1 │
│ Auditorium / Concert Hall               │            1 │
│ Multi Recreation Facility               │            1 │
│ Performance Space - Outdoor Venue       │            1 │
│ Cafe / Bakery / Restaurant              │            1 │
│ Farmers Market                          │            1 │
│ Hanger                                  │            1 │
│ Grocery                                 │            1 │
│ Education / Museum / Historic Sites     │            1 │
│ Youth Services                          │            1 │
│ Golf Club                               │            1 │
│ ECOLE ACADIENNE                         │            1 │
│ Lighthouse property//Propriété du phare │            1 │
│ Public Secondary School                 │            1 │
│ LEASE                                   │            1 │
│ Bowling Alley                           │            1 │
├─────────────────────────────────────────┴──────────────┤
│ 968 rows (40 shown)                          2 columns │
└────────────────────────────────────────────────────────┘
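The listing above mixes English and French labels, and several spellings of the same idea, across 968 distinct values. Below is a minimal Python sketch of how they could be folded into a handful of buckets. The keyword rules and bucket names are my own assumptions rather than anything Statscan publishes, and substring matching this crude will misfile some labels, so treat it as a starting point only.

```python
import unicodedata


def fold(label: str) -> str:
    """Lower-case a label and strip its accents."""
    decomposed = unicodedata.normalize("NFKD", label)
    return "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    ).lower()


# Hypothetical keyword -> bucket rules. They are checked in order and
# the first match wins, so "Residential Garage" lands in "outbuilding".
RULES = [
    ("garage", "outbuilding"),
    ("shed", "outbuilding"),
    ("remise", "outbuilding"),
    ("resid", "residential"),
    ("dwelling", "residential"),
    ("detached", "residential"),
    ("duplex", "residential"),
    ("family", "residential"),
    ("commerc", "commercial"),
]


def categorise(label):
    """Map a raw type label to a coarse bucket, or None for NULLs."""
    if label is None:
        return None
    folded = fold(label)
    for keyword, bucket in RULES:
        if keyword in folded:
            return bucket
    return "other"


for raw in ["Résidence", "Residential Garage", "Commercial", "Golf Club"]:
    print(raw, "->", categorise(raw))
```

Mixed-use labels such as "Résidentiel et commercial" fall into whichever rule matches first, which is a deliberate simplification.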
Building Heights
13.2M buildings don't have any height data and there are a lot of values which seem implausible. There are other sources for building heights in Canada that, with some processing, could probably do a good job in this area.
SELECT height,
COUNT(*)
FROM 'ODB_v3_*.parquet'
GROUP BY 1
ORDER BY 1;
┌─────────────┬──────────────┐
│   height    │ count_star() │
│   varchar   │    int64     │
├─────────────┼──────────────┤
│ -0.10836829 │            1 │
│ -0.12246253 │            1 │
│ -0.16561381 │            1 │
│ -0.21919132 │            1 │
│ -0.377      │            2 │
│ -0.50739307 │            1 │
│ 0.00344336  │            1 │
│ 0.0106756   │            1 │
│ 0.01068541  │            1 │
│ 0.01268867  │            1 │
│ 0.01376321  │            1 │
│ 0.01511616  │            1 │
│ 0.016097    │            1 │
│ 0.02035425  │            1 │
│ 0.02102936  │            1 │
│ 0.02107398  │            1 │
│ 0.02117555  │            1 │
│ 0.02222852  │            1 │
│ 0.02618704  │            1 │
│ 0.02664106  │            1 │
│      ·      │      ·       │
│      ·      │      ·       │
│      ·      │      ·       │
│ 99.79       │            2 │
│ 99.8        │            6 │
│ 99.81       │            2 │
│ 99.82       │            1 │
│ 99.84       │            2 │
│ 99.85       │            1 │
│ 99.86       │            8 │
│ 99.87       │            2 │
│ 99.88       │            2 │
│ 99.9        │            9 │
│ 99.91       │            5 │
│ 99.92       │            5 │
│ 99.93       │            3 │
│ 99.94       │            4 │
│ 99.95       │            2 │
│ 99.96       │            3 │
│ 99.97       │            3 │
│ 99.98       │            5 │
│ 99.99       │            2 │
│ NULL        │     13204713 │
├─────────────┴──────────────┤
│   139513 rows (40 shown)   │
└────────────────────────────┘
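Since height is stored as text, it needs casting before any plausibility check can be applied. Below is a minimal Python sketch of one way to do that. The 2 to 600 range is an arbitrary assumption on my part, as is treating the values as metres, since the unit isn't stated in the output above. The `parse_height` helper is a hypothetical name.

```python
def parse_height(value, low=2.0, high=600.0):
    """Return the height as a float, or None when it is missing,
    non-numeric or outside the assumed plausible range."""
    if value is None:
        return None
    try:
        height = float(value)
    except ValueError:
        return None
    # NaN fails both comparisons and is rejected here as well.
    return height if low <= height <= high else None


# A few of the raw values from the listing above.
for raw in ["-0.377", "0.00344336", "99.99", None]:
    print(repr(raw), "->", parse_height(raw))
```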
Year of Construction
It would be nice to see integers in this field but I can understand parts of Canada were settled at a time when record-keeping wasn't amazing or good records might have been lost at some point.
SELECT year_built,
COUNT(*)
FROM 'ODB_v3_*.parquet'
GROUP BY 1
ORDER BY 1;
┌──────────────────┬──────────────┐
│    year_built    │ count_star() │
│     varchar      │    int64     │
├──────────────────┼──────────────┤
│ 1750             │            1 │
│ 1758             │            1 │
│ 1784             │            2 │
│ 1785             │            1 │
│ 1786             │            2 │
│ 1787             │            1 │
│ 1791             │            1 │
│ 1797             │            1 │
│ 1800             │           41 │
│ 1807             │            2 │
│ 1810             │            1 │
│ 1812             │            1 │
│ 1814             │            2 │
│ 1816             │            2 │
│ 1818             │            1 │
│ 1819             │            1 │
│ 1820             │            4 │
│ 1824             │            3 │
│ 1825             │            3 │
│ 1826             │            1 │
│        ·         │      ·       │
│        ·         │      ·       │
│        ·         │      ·       │
│ C 1880           │            2 │
│ C 1885           │            1 │
│ C 1890           │            2 │
│ C 1900           │            1 │
│ C 1910           │            4 │
│ C 1920           │            1 │
│ C 1930           │            1 │
│ C 1940           │            1 │
│ CIRCA 1870       │            1 │
│ CIRCA 1900       │            1 │
│ CIRCA 1930       │            1 │
│ Mid-19th century │            2 │
│ PRIOR 1956       │         5437 │
│ PRIOR 1962       │           31 │
│ PRIOR 1966       │            1 │
│ PRIOR 1969       │           20 │
│ PRIOR 1978       │            3 │
│ PRIOR 1989       │            6 │
│ c                │            1 │
│ NULL             │     14186018 │
├──────────────────┴──────────────┤
│ 247 rows (40 shown)   2 columns │
└─────────────────────────────────┘
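Most of the non-integer values above still carry a usable year alongside a qualifier. Below is a minimal Python sketch that pulls the two apart. The qualifier names, the 1600 to 2099 year window and the `parse_year_built` helper are assumptions for illustration, and it only covers the patterns visible in the 40 rows shown here, not all 247 distinct values.

```python
import re

# A standalone four-digit year between 1600 and 2099 (assumed window).
YEAR_RE = re.compile(r"(?<!\d)(1[6-9]\d{2}|20\d{2})(?!\d)")

# Prefixes seen in the listing above and what they are taken to mean.
QUALIFIERS = {"C": "circa", "CIRCA": "circa", "PRIOR": "before"}


def parse_year_built(value):
    """Return (year, qualifier), or (None, None) when no year is found."""
    if value is None:
        return (None, None)
    match = YEAR_RE.search(value)
    if match is None:
        return (None, None)
    prefix = value[: match.start()].strip().upper()
    if prefix == "":
        qualifier = "exact"
    else:
        qualifier = QUALIFIERS.get(prefix, "unknown")
    return (int(match.group(1)), qualifier)


for raw in ["1750", "C 1880", "CIRCA 1900", "PRIOR 1956", "Mid-19th century", "c"]:
    print(raw, "->", parse_year_built(raw))
```

Values like "Mid-19th century" are deliberately left unparsed rather than guessed at.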
Calgary's years of construction are entirely unknown.
SELECT year_built,
COUNT(*)
FROM 'ODB_v3_*.parquet'
WHERE ST_X(ST_CENTROID(geometry)) BETWEEN -114.3461 AND -113.8326
AND ST_Y(ST_CENTROID(geometry)) BETWEEN 50.8334 AND 51.2422
GROUP BY 1
ORDER BY 1;
┌────────────┬──────────────┐
│ year_built │ count_star() │
│  varchar   │    int64     │
├────────────┼──────────────┤
│ NULL       │       496793 │
└────────────┴──────────────┘
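The query above computes a centroid for every building before filtering. The bbox struct written during the conversion offers a cheaper first-pass test, since four float comparisons are enough to rule a building in or out of a window. Below is a minimal Python sketch of that overlap test. The Calgary window reuses the coordinates from the query above, while the two building boxes are made-up values for illustration.

```python
from typing import NamedTuple


class BBox(NamedTuple):
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def intersects(a: BBox, b: BBox) -> bool:
    """True when two axis-aligned boxes overlap or touch."""
    return (
        a.xmin <= b.xmax
        and a.xmax >= b.xmin
        and a.ymin <= b.ymax
        and a.ymax >= b.ymin
    )


# The Calgary window used in the query above.
calgary = BBox(xmin=-114.3461, ymin=50.8334, xmax=-113.8326, ymax=51.2422)

# Hypothetical building bounding boxes.
inside = BBox(-114.0710, 51.0450, -114.0708, 51.0452)
outside = BBox(-79.3840, 43.6530, -79.3838, 43.6532)

print(intersects(inside, calgary), intersects(outside, calgary))
```

A box test is looser than a centroid test as it also admits buildings that merely straddle the window's edge, so the two won't always return identical counts.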
Building Floor Counts
382K buildings have a floor count.
SELECT floors,
COUNT(*)
FROM 'ODB_v3_*.parquet'
GROUP BY 1
ORDER BY 1;
┌─────────┬──────────────┐
│ floors  │ count_star() │
│ varchar │    int64     │
├─────────┼──────────────┤
│ 1.0     │       212594 │
│ 1.5     │          877 │
│ 10.0    │           44 │
│ 11.0    │           38 │
│ 12.0    │           47 │
│ 13.0    │           15 │
│ 14.0    │           24 │
│ 15.0    │           16 │
│ 16.0    │           13 │
│ 17.0    │           13 │
│ 18.0    │           15 │
│ 19.0    │           16 │
│ 2.0     │       161415 │
│ 20.0    │            8 │
│ 21.0    │            8 │
│ 22.0    │            5 │
│ 23.0    │            2 │
│ 24.0    │            3 │
│ 25.0    │            7 │
│ 26.0    │            3 │
│ 27.0    │            1 │
│ 28.0    │            2 │
│ 3.0     │         6402 │
│ 31.0    │            1 │
│ 32.0    │            1 │
│ 34.0    │            1 │
│ 39.0    │            1 │
│ 4.0     │          642 │
│ 45.0    │            1 │
│ 5.0     │          156 │
│ 6.0     │          240 │
│ 7.0     │           78 │
│ 8.0     │           79 │
│ 9.0     │           60 │
│ NULL    │     14034601 │
├─────────┴──────────────┤
│ 35 rows      2 columns │
└────────────────────────┘
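One thing to watch in the listing above: floors is a VARCHAR, so it sorts as text and "10.0" lands between "1.5" and "2.0". Below is a minimal Python sketch showing the same values sorted numerically instead. The dictionary is a hand-copied subset of the rows above rather than the full result.

```python
# A subset of the floor counts from the listing above, keyed by the
# raw string values in the order DuckDB returned them.
floor_counts = {
    "1.0": 212594,
    "1.5": 877,
    "10.0": 44,
    "2.0": 161415,
    "3.0": 6402,
    "45.0": 1,
}

# Cast each key to a float so the sort is numeric rather than textual.
ordered = sorted(floor_counts.items(), key=lambda item: float(item[0]))

for floors, num_buildings in ordered:
    print(f"{float(floors):>5.1f} {num_buildings:>8,}")
```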
Footprint Coverage
This dataset contains 14,417,429 buildings.
SELECT COUNT(*)
FROM 'ODB_v3_*.parquet';
14,417,429
I'll download OSM's buildings from September 23rd. This file was produced by the Layercake project.
$ wget -c https://data.openstreetmap.us/layercake/buildings.parquet
I'll download a rough outline of Canada's provinces in GeoJSON format. I'll then convert it into GPKG as GPKG files require less syntax to work with in DuckDB.
$ wget -c https://gist.github.com/Thiago4breu/6ba01976161aa0be65e0a289412dc54c/raw/8ec57d8317a2abe5bae18e5fd86f777fab649f84/canada-provinces.geojson
$ ogr2ogr \
-f GPKG \
canada-provinces.gpkg \
canada-provinces.geojson
I'll turn the provinces dataset into a DuckDB table and count how many buildings OSM has that are covered by any of the provinces.
$ ~/duckdb
CREATE OR REPLACE TABLE canada AS
FROM ST_READ('canada-provinces.gpkg');
SELECT COUNT(*)
FROM 'buildings.parquet' b
LEFT JOIN canada c ON ST_CoveredBy(b.geometry, c.geom)
WHERE c.name IS NOT NULL;
The above returned a count of 7,849,223 buildings.
The PSC dataset I reviewed the other week contains 13.7M buildings which is ~305K more than the TUM dataset I also reviewed a few weeks ago.
So far, ODB has the largest building count but I noticed Dease Lake, a remote community in Northern BC, is absent from this dataset. The community has been mapped out in OSM since at least February 2023.
It looks like no one has the perfect dataset. Mixing and matching all of these datasets would produce the closest thing to complete coverage of Canada's buildings that one will find across any open feed.
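To put the counts quoted in this section side by side, here is a short Python sketch. The ODB and OSM figures are the exact counts returned above. The PSC figure is the rounded 13.7M quoted above and the TUM figure is derived from it by subtracting the ~305K difference, so both of those are approximations.

```python
counts = {
    "ODB": 14_417_429,
    "PSC (rounded)": 13_700_000,
    "TUM (PSC minus ~305K)": 13_700_000 - 305_000,
    "OSM (inside the province outlines)": 7_849_223,
}

odb = counts["ODB"]
osm = counts["OSM (inside the province outlines)"]

# List each dataset from largest to smallest, relative to ODB.
for name, total in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<36} {total:>12,} {total / odb:>7.1%} of ODB")

odb_to_osm = odb / osm
print(f"ODB has {odb - osm:,} more buildings than OSM ({odb_to_osm:.2f}x)")
```

These are raw counts only. They say nothing about how much the datasets overlap, which is why the Dease Lake gap doesn't show up in them.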
Provincial Boundaries
I looked along Alberta's borders with BC and Saskatchewan and the provincial attributions of the buildings look very accurate.
The following settlement at 110.005° W, 50.956° N sits along the Alberta-Saskatchewan border. The buildings in Alberta are in red and the ones in Saskatchewan are yellow.
Thank you for taking the time to read this post. I offer both consulting and hands-on development services to clients in North America and Europe. If you'd like to discuss how my offerings can help your business, please contact me via LinkedIn.