Overture Mapβs publish a Places dataset along with each of their monthly releases. Places can be thought of as points of interest (POIs). As of January, they have over 72 million of these POIs covering the world.
Below is an example from Viru Street in Tallinn, Estonia. Each point is labelled with its primary name and basic category in brackets. The point colour is based on the POIβs basic category.
Overture has made several improvements to the Places dataset in recent months. In September, they added operating status and confidence properties. In October, they added a basic category property and in December, a taxonomy property was added.
In this post, Iβll examine their latest Places release.
My Workstation
Iβm using a 5.7 GHz AMD Ryzen 9 9950X CPU. It has 16 cores β¦
Overture Mapβs publish a Places dataset along with each of their monthly releases. Places can be thought of as points of interest (POIs). As of January, they have over 72 million of these POIs covering the world.
Below is an example from Viru Street in Tallinn, Estonia. Each point is labelled with its primary name and basic category in brackets. The point colour is based on the POIβs basic category.
Overture has made several improvements to the Places dataset in recent months. In September, they added operating status and confidence properties. In October, they added a basic category property and in December, a taxonomy property was added.
In this post, Iβll examine their latest Places release.
My Workstation
Iβm using a 5.7 GHz AMD Ryzen 9 9950X CPU. It has 16 cores and 32 threads and 1.2 MB of L1, 16 MB of L2 and 64 MB of L3 cache. It has a liquid cooler attached and is housed in a spacious, full-sized Cooler Master HAF 700 computer case.
The system has 96 GB of DDR5 RAM clocked at 4,800 MT/s and a 5th-generation, Crucial T700 4 TB NVMe M.2 SSD which can read at speeds up to 12,400 MB/s. There is a heatsink on the SSD to help keep its temperature down. This is my systemβs C drive.
The system is powered by a 1,200-watt, fully modular Corsair Power Supply and is sat on an ASRock X870E Nova 90 Motherboard.
Iβm running Ubuntu 24 LTS via Microsoftβs Ubuntu for Windows on Windows 11 Pro. In case youβre wondering why I donβt run a Linux-based desktop as my primary work environment, Iβm still using an Nvidia GTX 1080 GPU which has better driver support on Windows and ArcGIS Pro only supports Windows natively.
Installing Prerequisites
Iβll use Python 3.12.3 and a few other tools to help analyse the data in this post.
$ sudo add-apt-repository ppa:deadsnakes/ppa
$ sudo apt update
$ sudo apt install \
jq \
python3-pip \
python3.12-venv
Below, Iβll set up a Python Virtual Environment.
$ python3 -m venv ~/.pois
$ source ~/.pois/bin/activate
Iβll use a Parquet debugging tool Iβve been working on to see how much space each column takes up in one of the Parquet files.
$ git clone https://github.com/marklit/pqview \
~/pqview
$ python3 -m pip install \
-r ~/pqview/requirements.txt
Iβll also install the latest AWS CLI.
$ python3 -m pip install awscli
Iβll use DuckDB v1.4.3, along with its H3, JSON, Lindel, Parquet and Spatial extensions, in this post.
$ cd ~
$ wget -c https://github.com/duckdb/duckdb/releases/download/v1.4.3/duckdb_cli-linux-amd64.zip
$ unzip -j duckdb_cli-linux-amd64.zip
$ chmod +x duckdb
$ ~/duckdb
INSTALL h3 FROM community;
INSTALL lindel FROM community;
INSTALL json;
INSTALL parquet;
INSTALL spatial;
Iβll set up DuckDB to load every installed extension each time it launches.
$ vi ~/.duckdbrc
.timer on
.width 180
LOAD h3;
LOAD lindel;
LOAD json;
LOAD parquet;
LOAD spatial;
The maps in this post were rendered using QGIS version 3.44. QGIS is a desktop application that runs on Windows, macOS and Linux. The application has grown in popularity in recent years and has ~15M application launches from users all around the world each month.
I used QGISβ Tile+ and HCMGIS plugins to add basemaps from Bing and Esri to the maps in this post.
Downloading Overtureβs Places
As of last September, Overture will remove releases after 60 days. The Places URL used in this post will stop working sometime in March.
Below lists Overtureβs available releases. Use the latest release version in the S3 URL below to download this dataset.
$ curl https://labs.overturemaps.org/data/releases.json
{
"latest": "2026-01-21.0",
"releases": [
"2026-01-21.0",
"2025-12-17.0"
]
}
$ aws s3 --no-sign-request sync \
s3://overturemaps-us-west-2/release/2026-01-21.0/theme=places/type=place/ \
~/places
The above downloaded 8 Parquet files with a total disk footprint of 7.2 GB.
Data Fluency
This dataset contains 72,444,739 records.
$ ~/duckdb
SELECT COUNT(*)
FROM 'places/part*.parquet';
βββββββββββββββββββ
β count_star() β
β int64 β
βββββββββββββββββββ€
β 72444739 β
β (72.44 million) β
βββββββββββββββββββ
Below is a heatmap showing where the POIs are most heavily concentrated.
CREATE OR REPLACE TABLE h3_3_stats AS
SELECT h3_3: H3_LATLNG_TO_CELL(
bbox.ymin,
bbox.xmin,
3),
num_pois: COUNT(*)
FROM 'places/part*.parquet'
GROUP BY 1;
COPY (
SELECT geometry: ST_ASWKB(H3_CELL_TO_BOUNDARY_WKT(h3_3)::geometry),
num_pois
FROM h3_3_stats
WHERE ST_XMIN(geometry::geometry) BETWEEN -179 AND 179
AND ST_XMAX(geometry::geometry) BETWEEN -179 AND 179
) TO 'num_pois.h3_3_stats.parquet' (
FORMAT 'PARQUET',
CODEC 'ZSTD',
COMPRESSION_LEVEL 22,
ROW_GROUP_SIZE 15000);
Below is an example record from this dataset.
$ echo "SELECT * EXCLUDE(addresses,
bbox,
brand,
categories,
names,
taxonomy,
sources),
addresses: addresses::JSON,
bbox: bbox::JSON,
brand: brand::JSON,
categories: categories::JSON,
names: names::JSON,
taxonomy: taxonomy::JSON,
sources: sources::JSON
FROM 'places/part*.parquet'
WHERE brand.names.primary = 'Starbucks'
LIMIT 1" \
| ~/duckdb -json \
| jq -S .
[
{
"addresses": [
{
"country": "CL",
"freeform": "Juan Soler Manfredini 131, local 302 y 302B, Mall Paseo Costanera Puerto Montt",
"locality": "Puerto Montt",
"postcode": "5504750",
"region": null
}
],
"basic_category": "cafe",
"bbox": {
"xmax": -72.9016342163086,
"xmin": -72.90164184570312,
"ymax": -41.48642349243164,
"ymin": -41.48643112182617
},
"brand": {
"names": {
"common": null,
"primary": "Starbucks",
"rules": null
},
"wikidata": null
},
"categories": {
"alternate": null,
"primary": "cafe"
},
"confidence": 0.32347911067676516,
"emails": null,
"geometry": "POINT (-72.90164 -41.48643)",
"id": "72da3584-2530-4bac-b062-124393ecdb0c",
"names": {
"common": null,
"primary": "Starbucks Chile",
"rules": null
},
"operating_status": "open",
"phones": "[+56232621864]",
"socials": "['https://www.facebook.com/299687346558972']",
"sources": [
{
"between": null,
"confidence": 0.3234791106767652,
"dataset": "meta",
"license": "CDLA-Permissive-2.0",
"property": "",
"record_id": "299687346558972",
"update_time": "2025-12-01T08:00:00.000Z"
},
{
"between": null,
"confidence": null,
"dataset": "Overture",
"license": "CDLA-Permissive-2.0",
"property": "/properties/confidence",
"record_id": null,
"update_time": "2026-01-13T21:57:48Z"
}
],
"taxonomy": {
"alternates": null,
"hierarchy": [
"food_and_drink",
"casual_eatery",
"cafe"
],
"primary": "cafe"
},
"theme": "places",
"type": "place",
"version": 5,
"websites": null
}
]
Below is a breakdown of how much space each column takes up relative to the others. This dataset only uses point geometry so its geometry field footprint is much smaller than what youβd find in datasets with linestrings and polygons.
The columns using up more space tend to have fewer NULL values and overall better coverage. The exception to this would be something like the basic category column which has ~400 unique values and compresses very well.
$ python3 ~/pqview/main.py \
types \
--html \
places/part-00000-faac29ab-3031-41b7-836a-26e1193347c0-c000.zstd.parquet \
> places.types.html
Below are the column types, NULL-value ratios, number of unique values and the lowest and highest value from each field.
$ ~/duckdb
SELECT column_name,
column_type[:30],
null_percentage,
approx_unique,
min[:30],
max[:30]
FROM (SUMMARIZE
FROM 'places/part*.parquet')
ORDER BY 1;
ββββββββββββββββββββ¬βββββββββββββββββββββββββββββββββ¬ββββββββββββββββββ¬ββββββββββββββββ¬ββββββββββββββββββββββββββββββββββββ¬βββββββββββββββββββββββββββββββββββββββββ
β column_name β column_type[:30] β null_percentage β approx_unique β min[:30] β max[:30] β
β varchar β varchar β decimal(9,2) β int64 β varchar β varchar β
ββββββββββββββββββββΌβββββββββββββββββββββββββββββββββΌββββββββββββββββββΌββββββββββββββββΌββββββββββββββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββββββββββββ€
β addresses β STRUCT(freeform VARCHAR, local β 0.00 β 45527236 β [{'freeform': '', 'locality': β [{'freeform': NULL, 'locality' β
β basic_category β VARCHAR β 0.00 β 405 β accommodation β zoo β
β bbox β STRUCT(xmin FLOAT, xmax FLOAT, β 0.00 β 68675209 β {'xmin': -180.0, 'xmax': -180. β {'xmin': 180.0, 'xmax': 180.0, β
β brand β STRUCT(wikidata VARCHAR, "name β 93.56 β 89854 β {'wikidata': Q100146251, 'name β {'wikidata': NULL, 'names': {' β
β categories β STRUCT("primary" VARCHAR, alte β 0.00 β 2625563 β {'primary': 3d_printing_servic β {'primary': zoo, 'alternate': β
β confidence β DOUBLE β 0.00 β 222625 β 0.0 β 1.0 β
β emails β VARCHAR[] β 97.32 β 1494892 β [''] β [εηΎ½090-6252-0042kakaricho@char β
β geometry β GEOMETRY β 0.00 β 65426272 β POINT (2 28) β POINT (-6.240234374999999 53.3 β
β id β VARCHAR β 0.00 β 67893339 β 00000057-00a2-439b-9581-cf9e24 β ffffffe2-6947-4705-ad9b-936fab β
β names β STRUCT("primary" VARCHAR, comm β 0.00 β 64313486 β {'primary': \bμμ± A, 'common': N β {'primary': π©»Έιε ζΊͺθ§ιζ₯ι, 'common' β
β operating_status β VARCHAR β 0.00 β 3 β closed β temporarily closed β
β phones β VARCHAR[] β 16.58 β 54450308 β [''] β [β+34 951 51 92 20] β
β socials β VARCHAR[] β 15.59 β 63756430 β [' https://www.facebook.com/Jo β [www.motrio.com] β
β sources β STRUCT(property VARCHAR, datas β 0.00 β 65629291 β [{'property': '', 'dataset': A β [{'property': '', 'dataset': m β
β taxonomy β STRUCT("primary" VARCHAR, hier β 0.00 β 1718 β {'primary': 3d_printing_servic β {'primary': zoo, 'hierarchy': β
β version β INTEGER β 0.00 β 9 β 1 β 8 β
β websites β VARCHAR[] β 39.07 β 39341765 β [''] β ['ΰΉhttp://www.krungsri.com'] β
ββββββββββββββββββββ΄βββββββββββββββββββββββββββββββββ΄ββββββββββββββββββ΄ββββββββββββββββ΄ββββββββββββββββββββββββββββββββββββ΄βββββββββββββββββββββββββββββββββββββββββ€
β 17 rows 6 columns β
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
Sources
Overture has 8 outside sources they collect data from and combine to build this dataset.
$ ~/duckdb
SELECT COUNT(*),
a.dataset
FROM (
SELECT UNNEST(sources) a
FROM 'places/part*.parquet'
)
GROUP BY 2
ORDER BY 1 DESC;
ββββββββββββββββ¬βββββββββββββββ
β count_star() β dataset β
β int64 β varchar β
ββββββββββββββββΌβββββββββββββββ€
β 72444739 β Overture β
β 58958274 β meta β
β 6003562 β Foursquare β
β 5644707 β Microsoft β
β 1554249 β AllThePlaces β
β 148042 β DAC β
β 127862 β PinMeTo β
β 4844 β RenderSEO β
β 3199 β Krick β
ββββββββββββββββ΄βββββββββββββββ
In researching the team behind Places, I came across this LinkedIn post from Dana Bauer at Overture where she was looking for someone to join TomTom, one of the Overture Foundationβs partner firms, to help her team work on this dataset.
Categories
There are 398 unique basic categories in this dataset.
$ ~/duckdb
SELECT COUNT(DISTINCT basic_category)
FROM 'places/part*.parquet';
398
Below are the top 200 basic categories.
.maxrows 200
SELECT COUNT(*),
basic_category
FROM 'places/part*.parquet'
GROUP BY 2
ORDER BY 1 DESC
LIMIT 100;
ββββββββββββββββ¬βββββββββββββββββββββββββββββββββ
β count_star() β basic_category β
β int64 β varchar β
ββββββββββββββββΌβββββββββββββββββββββββββββββββββ€
β 4809112 β restaurant β
β 2941235 β specialty_store β
β 1816052 β beauty_salon β
β 1695081 β clothing_store β
β 1428417 β place_of_learning β
β 1339376 β service_location β
β 1302723 β hotel β
β 1227198 β christian_place_of_worshop β
β 1180049 β retail_location β
β 1155375 β auto_repair_service β
β 1154927 β grocery_store β
β 1117086 β financial_service β
β 1116707 β healthcare_location β
β 1062436 β social_or_community_service β
β 1019714 β historic_site β
β 919867 β bar β
β 856064 β event_or_party_service β
β 828329 β b2b_supplier_distributor β
β 793373 β electronics_store β
β 784525 β real_estate_service β
β 735823 β cafe β
β 686114 β coffee_shop β
β 665004 β general_dentistry β
β 659636 β gas_station β
β 647944 β civic_organization_office β
β 636551 β casual_eatery β
β 631680 β real_estate_agency β
β 628708 β home_service β
β 589904 β gym β
β 587477 β accommodation β
β 570939 β hair_salon β
β 558413 β hospital β
β 554701 β car_dealer β
β 546472 β pharmacy β
β 545813 β nature_outdoors β
β 542841 β specialty_school β
β 536649 β bakery β
β 528118 β park β
β 525704 β convenience_store β
β 498561 β business_advertising_marketing β
β 486786 β fast_food_restaurant β
β 478687 β elementary_school β
β 475380 β building_contractor_service β
β 459196 β flower_shop β
β 458414 β pizzaria β
β 443957 β travel_service β
β 434095 β insurance_agency β
β 430175 β vehicle_service β
β 423339 β religious_organization β
β 413390 β college_university β
β 411364 β doctors_office β
β 405059 β b2b_service β
β 396944 β entertainment_location β
β 393056 β atm β
β 388623 β transportation_location β
β 387077 β barber_shop β
β 383324 β manufacturer β
β 380192 β government_office β
β 374121 β attorney_or_law_firm β
β 357145 β sporting_goods_store β
β 347795 β printing_service β
β 325030 β building_construction_service β
β 316876 β alternative_medicine β
β 311356 β shipping_delivery_service β
β 310737 β sport_fitness_facility β
β 308497 β fashion_or_apparel_store β
β 305517 β spa β
β 304559 β bank β
β 295161 β jewelry_store β
β 281592 β farm β
β 276654 β preschool β
β 275556 β hardware_store β
β 270869 β shoe_store β
β 261106 β art_craft_hobby_store β
β 258718 β post_office β
β 255981 β sport_recreation_club β
β 236171 β technical_service β
β 228379 β music_venue β
β 217210 β nail_salon β
β 206289 β physical_therapy β
β 203111 β private_lodging β
β 203106 β veterinarian β
β 201486 β shopping_mall β
β 199644 β personal_service β
β 198236 β eyewear_store β
β 194648 β recreational_location β
β 194297 β eating_drinking_location β
β 190095 β pet_store β
β 187244 β tattoo_or_piercing_salon β
β 186509 β media_service β
β 182239 β pub β
β 179089 β bookstore β
β 178037 β corporate_or_business_office β
β 176928 β high_school β
β 176335 β beach β
β 173075 β mental_health β
β 170891 β landscaping_gardening_service β
β 170403 β bed_and_breakfast β
β 170397 β legal_service β
β 166948 β liquor_store β
ββββββββββββββββ΄βββββββββββββββββββββββββββββββββ€
β 100 rows 2 columns β
βββββββββββββββββββββββββββββββββββββββββββββββββ
Below, Iβve built a map showing the most common category in every H3 zoom-level 3 hexagon across the planet.
CREATE OR REPLACE TABLE h3_5s AS
WITH b AS (
WITH a AS (
SELECT H3_LATLNG_TO_CELL(bbox.ymin,
bbox.xmin,
3) h3_5,
basic_category,
COUNT(*) num_recs
FROM 'places/part*.parquet'
GROUP BY 1, 2
)
SELECT *,
ROW_NUMBER() OVER (PARTITION BY h3_5
ORDER BY num_recs DESC) AS rn
FROM a
)
FROM b
WHERE rn = 1
ORDER BY num_recs DESC;
COPY (
SELECT geometry: H3_CELL_TO_BOUNDARY_WKT(h3_5)::GEOMETRY,
basic_category
FROM h3_5s
WHERE ST_XMIN(geometry::geometry) BETWEEN -179 AND 179
AND ST_XMAX(geometry::geometry) BETWEEN -179 AND 179
) TO 'basic_category.h3_3_stats.parquet' (
FORMAT 'PARQUET',
CODEC 'ZSTD',
COMPRESSION_LEVEL 22,
ROW_GROUP_SIZE 15000);
QGIS rendered strange things when I tried to change the projection to a US-centric one so Iβve had to stick with EPSG:4326 below. Iβve added Natural Earthβs Admin-0 boundaries to help outline country borders.
QGISβ Globe View wouldnβt render labels but does show the contiguous nature of the most common categories over land.
This dataset has a lot of POIs over open oceans and seas as well. The most common categories for those POIs are much more diverse than for those over land.
Below are the most common categories across Europe.
Below are the most common categories across North Africa, the Middle East, India and South East Asia.
Operating Status
I suspect the operating status data is in its early stages. A lot of businesses collapse in the first year and I expect in future releases that the closed count will grow from where it is now.
$ ~/duckdb
SELECT COUNT(*),
operating_status
FROM 'places/part*.parquet'
GROUP BY 2
ORDER BY 1 DESC;
ββββββββββββββββ¬βββββββββββββββββββββ
β count_star() β operating_status β
β int64 β varchar β
ββββββββββββββββΌβββββββββββββββββββββ€
β 72443934 β open β
β 785 β closed β
β 20 β temporarily closed β
ββββββββββββββββ΄βββββββββββββββββββββ
Version
Records contain version counts. Version 5 is the most common in this dataset at the moment.
$ ~/duckdb
CREATE OR REPLACE TABLE h3_5s AS
WITH b AS (
WITH a AS (
SELECT H3_LATLNG_TO_CELL(bbox.ymin,
bbox.xmin,
5) h3_5,
version,
COUNT(*) num_recs
FROM 'places/part*.parquet'
GROUP BY 1, 2
)
SELECT *,
ROW_NUMBER() OVER (PARTITION BY h3_5
ORDER BY num_recs DESC) AS rn
FROM a
)
FROM b
WHERE rn = 1
ORDER BY num_recs DESC;
COPY (
SELECT geometry: H3_CELL_TO_BOUNDARY_WKT(h3_5)::GEOMETRY,
version
FROM h3_5s
WHERE ST_XMIN(geometry::geometry) BETWEEN -179 AND 179
AND ST_XMAX(geometry::geometry) BETWEEN -179 AND 179
) TO 'version.h3_5_stats.parquet' (
FORMAT 'PARQUET',
CODEC 'ZSTD',
COMPRESSION_LEVEL 22,
ROW_GROUP_SIZE 15000);
SELECT version,
COUNT(*)
FROM 'version.h3_5_stats.parquet'
GROUP BY 1
ORDER BY 1;
βββββββββββ¬βββββββββββββββ
β version β count_star() β
β int32 β int64 β
βββββββββββΌβββββββββββββββ€
β 1 β 645 β
β 2 β 12272 β
β 3 β 28228 β
β 4 β 31962 β
β 5 β 137824 β
β 6 β 80749 β
β 7 β 7962 β
β 8 β 412 β
βββββββββββ΄βββββββββββββββ
Brand Names
Of the 72M+ records, ~4M contain a brand name. Below are the most common ones.
$ ~/duckdb
SELECT COUNT(*),
brand.names.primary
FROM 'places/part*.parquet'
GROUP BY 2
ORDER BY 1 DESC
LIMIT 25;
ββββββββββββββββ¬ββββββββββββββββββββββββββββββ
β count_star() β primary β
β int64 β varchar β
ββββββββββββββββΌββββββββββββββββββββββββββββββ€
β 68001883 β NULL β
β 57163 β Citibank β
β 56977 β Wildberries β
β 45430 β Western Union β
β 42517 β Shell β
β 42036 β McDonald's β
β 33750 β Amazon Locker β
β 33117 β γ΅γ³γγͺγΌ β
β 31789 β Subway β
β 30119 β Starbucks β
β 26846 β 7-Eleven β
β 26509 β LibertyX Bitcoin ATM β
β 20088 β KFC β
β 19844 β Dollar General β
β 18152 β Burger King β
β 16320 β Pizza Hut β
β 15583 β InPost β
β 15035 β Domino's Pizza β
β 14169 β Indian Oil Corporation Ltd. β
β 13487 β Enterprise β
β 13422 β OXXO β
β 12769 β Lidl β
β 12719 β bp β
β 12366 β Honda β
β 12251 β TotalEnergies β
ββββββββββββββββ΄ββββββββββββββββββββββββββββββ€
β 25 rows 2 columns β
ββββββββββββββββββββββββββββββββββββββββββββββ
There are 44K McDonaldβs locations in the world so to have 42K is pretty good coverage.
Taxonomy & Hierarchy
Below are the most common values for the new taxonomy.hierarchy field that was added late last year.
$ ~/duckdb
SELECT COUNT(*),
taxonomy.hierarchy
FROM 'places/part*.parquet'
GROUP BY 2
ORDER BY 1 DESC
LIMIT 25;
ββββββββββββββββ¬ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β count_star() β hierarchy β
β int64 β varchar[] β
ββββββββββββββββΌββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ€
β 2011362 β [food_and_drink, restaurant] β
β 1393271 β [lifestyle_services, beauty_service, beauty_salon] β
β 1339177 β [services_and_business, professional_service] β
β 1302723 β [lodging, hotel] β
β 1141151 β [shopping] β
β 1027893 β [shopping, food_and_beverage_store, grocery_store] β
β 1023589 β [shopping, fashion_and_apparel_store, clothing_store] β
β 1015254 β [cultural_and_historic, architectural_landmark] β
β 926424 β [cultural_and_historic, religious_organization, place_of_worship, christian_place_of_worshop] β
β 895567 β [travel_and_transportation, automotive_and_ground_transport, automotive, automotive_services_and_repair, automotive_repair] β
β 808932 β [services_and_business, professional_service, event_planning] β
β 730509 β [food_and_drink, bar] β
β 715677 β [education, school] β
β 711235 β [food_and_drink, casual_eatery, cafe] β
β 706174 β [services_and_business, real_estate] β
β 703978 β [community_and_government, community_service] β
β 686114 β [food_and_drink, beverage_shop, coffee_shop] β
β 652453 β [travel_and_transportation, automotive_and_ground_transport, fueling_station, gas_station] β
β 631680 β [services_and_business, real_estate, real_estate_agent] β
β 589904 β [sports_and_recreation, sports_and_recreation_venue, gym] β
β 578091 β [health_care, dentist] β
β 558413 β [health_care, hospital] β
β 553915 β [lifestyle_services, beauty_service, hair_salon] β
β 546472 β [shopping, specialty_store, pharmacy] β
β 535707 β [food_and_drink, casual_eatery, bakery] β
ββββββββββββββββ΄ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ€
β 25 rows 2 columns β
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
Maritime POIs
There are a large number of POIs that are over open ocean and seas in this post. Iβm not sure how many of these have been placed in error.
Marine Regions has GeoPackage files delineating maritime boundaries. Seven of the nine files contain polygons that Iβll use to determine the waters a given POI is in.
Below are the GeoPackage file sizes.
$ ls -lh ~/marineregions_org/*.gpkg
.. 69M .. eez_12nm_v4.gpkg
.. 60M .. eez_24nm_v4.gpkg
.. 77M .. eez_internal_waters_v4.gpkg
.. 16M .. eez_boundaries_v12.gpkg
.. 157M .. eez_v12.gpkg
.. 39M .. eez_archipelagic_waters_v4.gpkg
.. 2.1M .. ecs_boundaries_v01.gpkg
.. 7.0M .. ecs_v01.gpkg
.. 8.8M .. High_Seas_v1.gpkg
Below is a rendering of this datasetβs geometry and names for the Baltic Sea.
Iβll import the GeoPackage files into a DuckDB table.
$ echo "CREATE OR REPLACE TABLE marineregions (
name VARCHAR,
geom GEOMETRY);" \
| ~/duckdb places.duckdb
$ for FILENAME in ~/marineregions_org/High_Seas_v1.gpkg; do
echo "INSERT INTO marineregions
SELECT name,
geom
FROM ST_READ('$FILENAME');" \
| ~/duckdb places.duckdb
done
$ for FILENAME in ~/marineregions_org/{ecs_v01,eez_12nm_v4,eez_24nm_v4,eez_archipelagic_waters_v4,eez_internal_waters_v4,eez_v12}*.gpkg; do
echo $FILENAME
echo "INSERT INTO marineregions
SELECT name: GEONAME,
geom
FROM ST_READ('$FILENAME');" \
| ~/duckdb places.duckdb
done
There are 1,006 records in the Marine Regions table in DuckDB. Iβll group Overtureβs 72M+ records into ~914K hexagons. This will give me a quick and rough estimate of how many POIs could be over open oceans and seas.
$ ~/duckdb places.duckdb
CREATE OR REPLACE TABLE h3_stats AS
SELECT hexagon:
H3_LATLNG_TO_CELL(
bbox.ymin,
bbox.xmin,
6) ,
num_pois: COUNT(*)
FROM 'places/part*.parquet'
GROUP BY 1;
SELECT SUM(num_pois)
FROM h3_stats a
JOIN marineregions b
ON ST_Overlaps(H3_CELL_TO_BOUNDARY_WKT(a.hexagon)::GEOMETRY,
b.geom)
WHERE b.name IS NOT NULL;
βββββββββββββββββββ
β sum(num_pois) β
β int128 β
βββββββββββββββββββ€
β 29590753 β
β (29.59 million) β
βββββββββββββββββββ
The above shows over 29M+ POIs over open oceans and seas. This number wonβt be super exact. Some hexagons could cover both bodies of water along with a crowded coastline. Also, a cafe on a beach might have its position located in the neighbouring body of water due to inaccuracies in GPS. Nonetheless, 29M shouldnβt be too far from an accurate figure.
For reference, this is what the 1,006 Marine Regions polygons look like rendered on a map. They cover just about every major body of water but also have a wide buffer with Antarctica.
COPY (
SELECT geom,
name
FROM marineregions
) TO 'marineregions.parquet' (
FORMAT 'PARQUET',
CODEC 'ZSTD',
COMPRESSION_LEVEL 22,
ROW_GROUP_SIZE 15000);
Confidence
Below are the most common confidence values across this dataset. The brighter the hexagon the higher the confidence.
$ ~/duckdb
SELECT COUNT(*),
confidence: (confidence*10)::INT / 10
FROM 'places/part*.parquet'
GROUP BY 2
ORDER BY 2;
ββββββββββββββββ¬βββββββββββββ
β count_star() β confidence β
β int64 β double β
ββββββββββββββββΌβββββββββββββ€
β 4083 β 0.0 β
β 4149 β 0.1 β
β 3624 β 0.2 β
β 11248354 β 0.3 β
β 1492960 β 0.4 β
β 7243 β 0.5 β
β 11360452 β 0.6 β
β 7405 β 0.7 β
β 10562751 β 0.8 β
β 1522626 β 0.9 β
β 36231092 β 1.0 β
ββββββββββββββββ΄βββββββββββββ€
β 11 rows 2 columns β
βββββββββββββββββββββββββββββ
Below, Iβll build a map of the most common confidence value for each H3 zoom-level 5 hexagon.
CREATE OR REPLACE TABLE h3_5s AS
WITH b AS (
WITH a AS (
SELECT H3_LATLNG_TO_CELL(bbox.ymin,
bbox.xmin,
5) h3_5,
confidence: (confidence*10)::INT / 10,
COUNT(*) num_recs
FROM 'places/part*.parquet'
GROUP BY 1, 2
)
SELECT *,
ROW_NUMBER() OVER (PARTITION BY h3_5
ORDER BY num_recs DESC) AS rn
FROM a
)
FROM b
WHERE rn = 1
ORDER BY num_recs DESC;
COPY (
SELECT geometry: H3_CELL_TO_BOUNDARY_WKT(h3_5)::GEOMETRY,
confidence
FROM h3_5s
WHERE ST_XMIN(geometry::geometry) BETWEEN -179 AND 179
AND ST_XMAX(geometry::geometry) BETWEEN -179 AND 179
) TO 'confidence.h3_5_stats.parquet' (
FORMAT 'PARQUET',
CODEC 'ZSTD',
COMPRESSION_LEVEL 22,
ROW_GROUP_SIZE 15000);
Below, Iβll use larger hexagons to make generalising the confidence scores over open oceans easier.
CREATE OR REPLACE TABLE h3_3s AS
WITH b AS (
WITH a AS (
SELECT H3_LATLNG_TO_CELL(bbox.ymin,
bbox.xmin,
3) h3_3,
confidence: (confidence*10)::INT / 10,
COUNT(*) num_recs
FROM 'places/part*.parquet'
GROUP BY 1, 2
)
SELECT *,
ROW_NUMBER() OVER (PARTITION BY h3_3
ORDER BY num_recs DESC) AS rn
FROM a
)
FROM b
WHERE rn = 1
ORDER BY num_recs DESC;
COPY (
SELECT geometry: H3_CELL_TO_BOUNDARY_WKT(h3_3)::GEOMETRY,
confidence
FROM h3_3s
WHERE ST_XMIN(geometry::geometry) BETWEEN -179 AND 179
AND ST_XMAX(geometry::geometry) BETWEEN -179 AND 179
) TO 'confidence.h3_3_stats.parquet' (
FORMAT 'PARQUET',
CODEC 'ZSTD',
COMPRESSION_LEVEL 22,
ROW_GROUP_SIZE 15000);
The most common confidence value over the oceans and seas is 0.6. This matches most of Africanβs landmass as well as that of the Middle Eastβs and Chinaβs.
Thank you for taking the time to read this post. I offer both consulting and hands-on development services to clients in North America and Europe. If youβd like to discuss how my offerings can help your business please contact me via LinkedIn.