All The Places has a set of spiders and scrapers that extract location information from thousands of brands' websites.
Their GitHub repo is made up of 145K lines of Python with commits going back ten years. There have been a total of 162 contributors to date.
Their crawlers run weekly and the collected data is published shortly afterwards.
In this post, I'll examine their latest release.
My Workstation
I'm using a 5.7 GHz AMD Ryzen 9 9950X CPU. It has 16 cores and 32 threads, with 1.2 MB of L1, 16 MB of L2 and 64 MB of L3 cache. It has a liquid cooler attached and is housed in a spacious, full-sized Cooler Master HAF 700 computer case.
The system has 96 GB of DDR5 RAM clocked at 4,800 MT/s and a 5th-generation, Crucial T700 4 TB NVMe M.2 SSD which can read at speeds up to 12,400 MB/s. There is a heatsink on the SSD to help keep its temperature down. This is my system's C drive.
The system is powered by a 1,200-watt, fully modular Corsair Power Supply and sits on an ASRock X870E Nova 90 Motherboard.
I'm running Ubuntu 24 LTS via Microsoft's Ubuntu for Windows on Windows 11 Pro. In case you're wondering why I don't run a Linux-based desktop as my primary work environment, I'm still using an Nvidia GTX 1080 GPU, which has better driver support on Windows, and ArcGIS Pro only supports Windows natively.
Installing Prerequisites
I'll use Python 3.12.3 and a few other tools to help analyse the data in this post.
$ sudo add-apt-repository ppa:deadsnakes/ppa
$ sudo apt update
$ sudo apt install \
jq \
python3-pip \
python3.12-venv
I'll set up a Python virtual environment.
$ python3 -m venv ~/.atp
$ source ~/.atp/bin/activate
I'll use DuckDB v1.4.1, along with its H3, JSON, Lindel, Parquet and Spatial extensions, in this post.
$ cd ~
$ wget -c https://github.com/duckdb/duckdb/releases/download/v1.4.1/duckdb_cli-linux-amd64.zip
$ unzip -j duckdb_cli-linux-amd64.zip
$ chmod +x duckdb
$ ~/duckdb
INSTALL h3 FROM community;
INSTALL lindel FROM community;
INSTALL json;
INSTALL parquet;
INSTALL spatial;
I'll set up DuckDB to load every installed extension each time it launches.
$ vi ~/.duckdbrc
.timer on
.width 180
LOAD h3;
LOAD lindel;
LOAD json;
LOAD parquet;
LOAD spatial;
The maps in this post were rendered using QGIS version 3.44. QGIS is a desktop application that runs on Windows, macOS and Linux. The application has grown in popularity in recent years and sees ~15M launches from users all around the world each month.
I used QGIS' Tile+ plugin to add Bing basemaps to the maps in this post.
Thousands of GeoJSON Files
The following will download a 1.4 GB ZIP file that contains 4,367 GeoJSON files totalling 14 GB when uncompressed.
$ mkdir -p ~/atp
$ cd ~/atp
$ wget -c https://alltheplaces-data.openaddresses.io/runs/2025-11-29-13-32-39/output.zip
$ unzip -j output.zip
The GeoJSON files are built by individual Python scripts, which are often country- or region-specific depending on the brand's online structure. Below are the files for Nando's.
$ ls -l nandos*
122771 .. nandos.geojson
16497 .. nandos_ae.geojson
6822 .. nandos_bh.geojson
16634 .. nandos_bw.geojson
555223 .. nandos_gb_ie.geojson
17450 .. nandos_in.geojson
3904 .. nandos_mu.geojson
39668 .. nandos_my.geojson
3663 .. nandos_om.geojson
10247 .. nandos_qa.geojson
3740 .. nandos_sg.geojson
0 .. nandos_us.geojson
308489 .. nandos_za.geojson
14091 .. nandos_zm.geojson
15739 .. nandos_zw.geojson
Roughly 500 of the 1,200+ Nando's locations are in either the UK or the Republic of Ireland. Below is an example record from their gb_ie file.
$ echo "FROM ST_READ('nandos_gb_ie.geojson')
LIMIT 1" \
| ~/duckdb -json \
| jq -S .
[
{
"@source_uri": "https://www.nandos.co.uk/restaurants/yate",
"@spider": "nandos_gb_ie",
"addr:city": "Yate, Gloucestershire",
"addr:country": "GB",
"addr:postcode": "BS37 4AS",
"addr:state": "England",
"addr:street_address": "Unit R2, Yate Shopping Centre, Link Rd",
"amenity": "restaurant",
"branch": "Yate",
"brand": "Nando's",
"brand:wikidata": "Q3472954",
"contact:facebook": "https://www.facebook.com/Nandos.UnitedKingdom",
"contact:twitter": "NandosUK",
"cuisine": "chicken;portuguese",
"geom": "POINT (-2.408309 51.539583)",
"id": "K8QmVBoxykViF1_30Vo-VcOrrto=",
"image": "https://www.nandos.co.uk/replatform/restaurants/img/default-restaurant-hero-background.png",
"name": "Nando's",
"nsi_id": "nandos-32ebee",
"opening_hours": "Mo-Su 11:30-22:00",
"payment:cash": "yes",
"payment:credit_cards": "yes",
"payment:debit_cards": "yes",
"phone": "+44 1454 312504",
"ref": "https://www.nandos.co.uk/restaurants/yate#restaurant",
"website": "https://www.nandos.co.uk/restaurants/yate"
}
]
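As a rough check on that UK/Ireland split, a per-country aggregate over the same file works. This sketch assumes the addr:country field is populated on every record, which might not hold for every spider.

SELECT country: "addr:country",
       num_locations: COUNT(*)
FROM ST_READ('nandos_gb_ie.geojson')
GROUP BY 1
ORDER BY 2 DESC;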
Unique Brands
Below, I tried to get a rough idea of how many brands are represented in this release.
There were a few hundred brands with zero-byte GeoJSON files that I excluded. Many filenames ended with a single two-letter country identifier; others, like the above Nando's example, had more than one. For the sake of this post's timebox, I accepted that margin of error.
$ python3
from glob import glob
import os

# Collect the filenames of non-empty GeoJSON files, minus their extensions.
names = [x.split('.')[0]
         for x in glob('*.geojson')
         if os.path.getsize(x)]

brands = set()

for name in names:
    # Strip a trailing two-letter country suffix if one is present.
    # WIP: nandos_gb_ie won't get handled properly.
    if len(name.split('_')[-1]) == 2:
        brands.add('_'.join(name.split('_')[:-1]))
    else:
        brands.add(name)

print(len(brands))
Without the zero-byte filter, there are 3,070 unique brands, but with the filter, there are 2,716.
There were over 900 open issues in the GitHub repository at the time of this writing. I didn't uncover many common themes among the issues, outside of Cloudflare becoming more popular among brands while being unfriendly to their crawlers.
It's possible that those empty GeoJSON files could have their datasets repopulated at some point down the line.
The Nando's GeoJSON file for the US in this release was zero bytes in size. The site itself is hosted directly on AWS. I suspect either the site has had a refresh or there was a network failure when the crawler ran.
I'm sure the crawler has functioned properly in the past. Publishing the last successful crawl's dataset, instead of the latest and potentially empty one, would be a good stopgap.
GeoJSON to Parquet
There is already a Parquet distribution, but it has been down since at least October. I'll run 24 concurrent processes to convert the GeoJSON files into Parquet.
$ ls *.geojson \
| xargs -P24 \
-I% \
bash -c "
BASENAME=\`echo \"%\" | cut -f1 -d.\`
echo \"Building \$BASENAME.\"
echo \"COPY (FROM ST_READ('%'))
TO '\$BASENAME.parquet' (
FORMAT 'PARQUET',
CODEC 'ZSTD',
COMPRESSION_LEVEL 22,
ROW_GROUP_SIZE 15000);\" \
| ~/duckdb"
The following will merge the Parquet files into a single file.
$ ~/duckdb
COPY(
SELECT name,
geometry: geom,
bbox: {'xmin': ST_XMIN(ST_EXTENT(geom)),
'ymin': ST_YMIN(ST_EXTENT(geom)),
'xmax': ST_XMAX(ST_EXTENT(geom)),
'ymax': ST_YMAX(ST_EXTENT(geom))},
* EXCLUDE(name, geom)
FROM READ_PARQUET('*.parquet',
union_by_name=True,
filename=True)
WHERE ST_Y(ST_CENTROID(geom)) IS NOT NULL
ORDER BY HILBERT_ENCODE([ST_Y(ST_CENTROID(geom)),
ST_X(ST_CENTROID(geom))]::double[2])
) TO '~/atp.parquet' (
FORMAT 'PARQUET',
CODEC 'ZSTD',
COMPRESSION_LEVEL 22,
ROW_GROUP_SIZE 15000);
The resulting Parquet file is 1.4 GB and contains 19,059,494 records.
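Before going further, it's worth sanity-checking the merged file. The count below only needs to read the Parquet footer, so it returns almost instantly.

SELECT COUNT(*)
FROM '~/atp.parquet'; -- 19,059,494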
Locations Heatmap
Below is a heatmap of the locations in this release. The brighter hexagons have more locations.
$ ~/duckdb
CREATE OR REPLACE TABLE h3_3_stats AS
SELECT h3_3: H3_LATLNG_TO_CELL(
bbox.ymin,
bbox.xmin,
3),
num_locations: COUNT(*)
FROM '~/atp.parquet'
GROUP BY 1;
COPY (
SELECT geometry: ST_ASWKB(H3_CELL_TO_BOUNDARY_WKT(h3_3)::geometry),
num_locations
FROM h3_3_stats
WHERE ST_XMIN(geometry::geometry) BETWEEN -179 AND 179
AND ST_XMAX(geometry::geometry) BETWEEN -179 AND 179
) TO '~/atp.h3_3_stats.parquet' (
FORMAT 'PARQUET',
CODEC 'ZSTD',
COMPRESSION_LEVEL 22,
ROW_GROUP_SIZE 15000);
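The brightest cells can also be read off directly rather than eyeballed. Below is a quick look at the five busiest hexagons.

SELECT h3_3,
       num_locations
FROM h3_3_stats
ORDER BY num_locations DESC
LIMIT 5;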
A lot of locations sit in bodies of water and all across Antarctica.
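A rough way to size the Antarctica problem is to count records below 60°S. The -60 cutoff is my own assumption, roughly the Antarctic Treaty boundary, so treat this as a ballpark figure rather than a precise audit.

SELECT COUNT(*)
FROM '~/atp.parquet'
WHERE bbox.ymin < -60;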
These are the most common brand names.
SELECT COUNT(*),
name
FROM '~/atp.parquet'
WHERE LENGTH(name) > 0
GROUP BY 2
ORDER BY 1 DESC
LIMIT 40;
┌──────────────┬──────────────────────────────────┐
│ count_star() │               name               │
│    int64     │             varchar              │
├──────────────┼──────────────────────────────────┤
│        68227 │ Wildberries                      │
│        55171 │ サントリー                       │
│        39731 │ Shell                            │
│        26767 │ 7-Eleven                         │
│        24920 │ Coinstar                         │
│        23271 │ McDonald's                       │
│        21495 │ Starbucks                        │
│        20504 │ Dollar General                   │
│        20380 │ Punjab National Bank             │
│        19519 │ FedEx Drop Box                   │
│        13760 │ Walgreens                        │
│        13660 │ HDFC Bank ATM                    │
│        13346 │ Burger King                      │
│        13050 │ Circle K                         │
│        13028 │ Chevron                          │
│        13006 │ KFC                              │
│        12670 │ FedEx OnSite                     │
│        12279 │ Żabka                            │
│        12074 │ InPost                           │
│        11652 │ TotalEnergies                    │
│        10993 │ HDFC Bank                        │
│        10772 │ Aldi                             │
│        10612 │ Şok                              │
│         9993 │ BP                               │
│         9993 │ Lidl                             │
│         9435 │ Dunkin'                          │
│         9292 │ Dollar Tree                      │
│         9080 │ Bitcoin Depot                    │
│         8568 │ ICICI Bank                       │
│         8424 │ Taco Bell                        │
│         8395 │ 星巴克                           │
│         8193 │ Bank of Baroda                   │
│         7751 │ Verizon                          │
│         7700 │ CVS Pharmacy                     │
│         7677 │ Subway                           │
│         7577 │ T-Mobile                         │
│         7403 │ AutoZone                         │
│         7367 │ H&R Block Tax Preparation Office │
│         7339 │ Family Dollar                    │
│         7318 │ Blue Rhino                       │
├──────────────┴──────────────────────────────────┤
│ 40 rows                                2 columns │
└──────────────────────────────────────────────────┘
There are 2.36M unique strings in the name column. I was expecting this number to be much closer to the ~3-4K brands there are scrapers for.
SELECT COUNT(DISTINCT name)
FROM '~/atp.parquet'; -- 2,358,505
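My guess is that many spiders put the branch name, rather than the brand, into the name field. Below is one way to see which spiders contribute the most distinct names; this assumes the @spider property survived the Parquet conversion as a column.

SELECT "@spider",
       num_names: COUNT(DISTINCT name)
FROM '~/atp.parquet'
GROUP BY 1
ORDER BY 2 DESC
LIMIT 10;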
I'll build a bounding box that surrounds each brand's locations in each country they're represented in.
COPY (
SELECT name,
country: "addr:country",
geometry: {
'min_x': MIN(ST_X(ST_CENTROID(geometry))),
'min_y': MIN(ST_Y(ST_CENTROID(geometry))),
'max_x': MAX(ST_X(ST_CENTROID(geometry))),
'max_y': MAX(ST_Y(ST_CENTROID(geometry)))}::BOX_2D::GEOMETRY
FROM '~/atp.parquet'
GROUP BY 1, 2
) TO '~/atp.extent_per_country.parquet' (
FORMAT 'PARQUET',
CODEC 'ZSTD',
COMPRESSION_LEVEL 22,
ROW_GROUP_SIZE 15000);
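As an example of how this file gets used, the list of countries for a single brand can be pulled out before rendering their extents in QGIS. The doubled apostrophe is standard SQL escaping.

SELECT country
FROM '~/atp.extent_per_country.parquet'
WHERE name = 'Nando''s'
ORDER BY country;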
Below are the countries with Nando's locations in this release. Note that the US locations are missing.
Below are the Starbucks locations.
For some reason, only 11 countries have at least one McDonald's restaurant in this release.
SELECT COUNT(DISTINCT "addr:country")
FROM '~/atp.parquet'
WHERE name LIKE '%McDonalds%'; -- 11
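One caveat: the LIKE pattern above only matches names spelled without an apostrophe, so it may undercount. A looser, case-insensitive pattern is a worthwhile cross-check.

SELECT COUNT(DISTINCT "addr:country")
FROM '~/atp.parquet'
WHERE name ILIKE '%mcdonald%';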
There are 44K McDonald's restaurants in the world, but this release only lists 23,271 of them.
Spot Checking
Despite wide IKEA coverage, Estonia's flagship store is missing.
There is a Romanian Tree Library just off the coast of one of Estonia's islands.
A Finnish fast food chain placed its restaurant outside of an Estonian shopping centre rather than inside.
Thank you for taking the time to read this post. I offer both consulting and hands-on development services to clients in North America and Europe. If you'd like to discuss how my offerings can help your business, please contact me via LinkedIn.