All The Places has a set of spiders and scrapers that extract location information from thousands of brands' websites.
Their GitHub repo is made up of 145K lines of Python with commits going back ten years. There have been a total of 162 contributors to date.
Their crawlers run weekly and the collected data is published shortly afterwards.
In this post, I'll examine their latest release.
My Workstation
I'm using a 5.7 GHz AMD Ryzen 9 9950X CPU. It has 16 cores and 32 threads, with 1.2 MB of L1, 16 MB of L2 and 64 MB of L3 cache. It has a liquid cooler attached and is housed in a spacious, full-sized Cooler Master HAF 700 computer case.
The system has 96 GB of DDR5 RAM clocked at 4,800 MT/s and a 5th-generation, Crucial T700 4 TB NVMe M.2 SSD which can read at speeds up to 12,400 MB/s. There is a heatsink on the SSD to help keep its temperature down. This is my system's C drive.
The system is powered by a 1,200-watt, fully modular Corsair Power Supply and sits on an ASRock X870E Nova 90 Motherboard.
I'm running Ubuntu 24 LTS via Microsoft's Ubuntu for Windows on Windows 11 Pro. In case you're wondering why I don't run a Linux-based desktop as my primary work environment, I'm still using an Nvidia GTX 1080 GPU, which has better driver support on Windows, and ArcGIS Pro only supports Windows natively.
Installing Prerequisites
I'll use Python 3.12.3 and a few other tools to help analyse the data in this post.
$ sudo add-apt-repository ppa:deadsnakes/ppa
$ sudo apt update
$ sudo apt install \
jq \
python3-pip \
python3.12-venv
I'll set up a Python virtual environment.
$ python3 -m venv ~/.atp
$ source ~/.atp/bin/activate
I'll use DuckDB v1.4.1, along with its H3, JSON, Lindel, Parquet and Spatial extensions, in this post.
$ cd ~
$ wget -c https://github.com/duckdb/duckdb/releases/download/v1.4.1/duckdb_cli-linux-amd64.zip
$ unzip -j duckdb_cli-linux-amd64.zip
$ chmod +x duckdb
$ ~/duckdb
INSTALL h3 FROM community;
INSTALL lindel FROM community;
INSTALL json;
INSTALL parquet;
INSTALL spatial;
I'll set up DuckDB to load every installed extension each time it launches.
$ vi ~/.duckdbrc
.timer on
.width 180
LOAD h3;
LOAD lindel;
LOAD json;
LOAD parquet;
LOAD spatial;
The maps in this post were rendered using QGIS version 3.44. QGIS is a desktop application that runs on Windows, macOS and Linux. The application has grown in popularity in recent years and sees ~15M launches from users all around the world each month.
I used QGIS' Tile+ plugin to add Bing basemaps to the maps in this post.
Thousands of GeoJSON Files
The following will download a 1.4 GB ZIP file that contains 4,367 GeoJSON files totalling 14 GB when uncompressed.
$ mkdir -p ~/atp
$ cd ~/atp
$ wget -c https://alltheplaces-data.openaddresses.io/runs/2025-11-29-13-32-39/output.zip
$ unzip -j output.zip
The GeoJSON files are built by individual Python scripts, which are often country- or region-specific depending on the brand's online structure. Below are the files for Nando's.
$ ls -l nandos*
122771 .. nandos.geojson
16497 .. nandos_ae.geojson
6822 .. nandos_bh.geojson
16634 .. nandos_bw.geojson
555223 .. nandos_gb_ie.geojson
17450 .. nandos_in.geojson
3904 .. nandos_mu.geojson
39668 .. nandos_my.geojson
3663 .. nandos_om.geojson
10247 .. nandos_qa.geojson
3740 .. nandos_sg.geojson
0 .. nandos_us.geojson
308489 .. nandos_za.geojson
14091 .. nandos_zm.geojson
15739 .. nandos_zw.geojson
Roughly 500 of the 1,200+ Nando's locations are in either the UK or the Republic of Ireland. Below is an example record from their gb_ie file.
$ echo "FROM ST_READ('nandos_gb_ie.geojson')
LIMIT 1" \
| ~/duckdb -json \
| jq -S .
[
{
"@source_uri": "https://www.nandos.co.uk/restaurants/yate",
"@spider": "nandos_gb_ie",
"addr:city": "Yate, Gloucestershire",
"addr:country": "GB",
"addr:postcode": "BS37 4AS",
"addr:state": "England",
"addr:street_address": "Unit R2, Yate Shopping Centre, Link Rd",
"amenity": "restaurant",
"branch": "Yate",
"brand": "Nando's",
"brand:wikidata": "Q3472954",
"contact:facebook": "https://www.facebook.com/Nandos.UnitedKingdom",
"contact:twitter": "NandosUK",
"cuisine": "chicken;portuguese",
"geom": "POINT (-2.408309 51.539583)",
"id": "K8QmVBoxykViF1_30Vo-VcOrrto=",
"image": "https://www.nandos.co.uk/replatform/restaurants/img/default-restaurant-hero-background.png",
"name": "Nando's",
"nsi_id": "nandos-32ebee",
"opening_hours": "Mo-Su 11:30-22:00",
"payment:cash": "yes",
"payment:credit_cards": "yes",
"payment:debit_cards": "yes",
"phone": "+44 1454 312504",
"ref": "https://www.nandos.co.uk/restaurants/yate#restaurant",
"website": "https://www.nandos.co.uk/restaurants/yate"
}
]
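As a rough check on that UK/Ireland split, a per-country aggregate over the same file works. This sketch assumes the addr:country field is populated on every record, which might not hold for every spider.

SELECT country: "addr:country",
       num_locations: COUNT(*)
FROM ST_READ('nandos_gb_ie.geojson')
GROUP BY 1
ORDER BY 2 DESC;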
Unique Brands
Below, I tried to get a rough idea of how many brands are represented in this release.
There were a few hundred brands with zero-byte GeoJSON files that I excluded. Many filenames ended with a single two-letter country identifier; others, like the above Nando's example, had more than one. For the sake of this post's timebox, I accepted that margin of error.
$ python3
from glob import glob
import os

# Collect the filenames of non-empty GeoJSON files, minus their extensions.
names = [x.split('.')[0]
         for x in glob('*.geojson')
         if os.path.getsize(x)]

brands = set()

for name in names:
    # Strip a trailing two-letter country suffix if one is present.
    # WIP: nandos_gb_ie won't get handled properly.
    if len(name.split('_')[-1]) == 2:
        brands.add('_'.join(name.split('_')[:-1]))
    else:
        brands.add(name)

print(len(brands))
Without the zero-byte filter, there are 3,070 unique brands, but with the filter, there are 2,716.
There were over 900 open issues in the GitHub repository at the time of this writing. I didn't uncover many common themes among the issues, outside of Cloudflare becoming more popular among brands while being unfriendly to their crawlers.
It's possible that those empty GeoJSON files could have their datasets repopulated at some point down the line.
The Nando's GeoJSON file for the US in this release was zero bytes in size. The site itself is hosted directly on AWS. I suspect either the site has had a refresh or there was a network failure when the crawler ran.
I'm sure the crawler has functioned properly in the past. Publishing the last successful crawl's dataset, instead of the latest and potentially empty one, would be a good stopgap.
GeoJSON to Parquet
There is already a Parquet distribution, but it has been down since at least October. I'll run 24 concurrent processes to convert the GeoJSON files into Parquet.
$ ls *.geojson \
| xargs -P24 \
-I% \
bash -c "
BASENAME=\`echo \"%\" | cut -f1 -d.\`
echo \"Building \$BASENAME.\"
echo \"COPY (FROM ST_READ('%'))
TO '\$BASENAME.parquet' (
FORMAT 'PARQUET',
CODEC 'ZSTD',
COMPRESSION_LEVEL 22,
ROW_GROUP_SIZE 15000);\" \
| ~/duckdb"
The following will merge the Parquet files into a single file.
$ ~/duckdb
COPY(
SELECT name,
geometry: geom,
bbox: {'xmin': ST_XMIN(ST_EXTENT(geom)),
'ymin': ST_YMIN(ST_EXTENT(geom)),
'xmax': ST_XMAX(ST_EXTENT(geom)),
'ymax': ST_YMAX(ST_EXTENT(geom))},
* EXCLUDE(name, geom)
FROM READ_PARQUET('*.parquet',
union_by_name=True,
filename=True)
WHERE ST_Y(ST_CENTROID(geom)) IS NOT NULL
ORDER BY HILBERT_ENCODE([ST_Y(ST_CENTROID(geom)),
ST_X(ST_CENTROID(geom))]::double[2])
) TO '~/atp.parquet' (
FORMAT 'PARQUET',
CODEC 'ZSTD',
COMPRESSION_LEVEL 22,
ROW_GROUP_SIZE 15000);
The resulting Parquet file is 1.4 GB and contains 19,059,494 records.
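Before going further, it's worth sanity-checking the merged file. The count below only needs to read the Parquet footer, so it returns almost instantly.

SELECT COUNT(*)
FROM '~/atp.parquet'; -- 19,059,494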
Locations Heatmap
Below is a heatmap of the locations in this release. The brighter hexagons have more locations.
$ ~/duckdb
CREATE OR REPLACE TABLE h3_3_stats AS
SELECT h3_3: H3_LATLNG_TO_CELL(
bbox.ymin,
bbox.xmin,
3),
num_locations: COUNT(*)
FROM '~/atp.parquet'
GROUP BY 1;
COPY (
SELECT geometry: ST_ASWKB(H3_CELL_TO_BOUNDARY_WKT(h3_3)::geometry),
num_locations
FROM h3_3_stats
WHERE ST_XMIN(geometry::geometry) BETWEEN -179 AND 179
AND ST_XMAX(geometry::geometry) BETWEEN -179 AND 179
) TO '~/atp.h3_3_stats.parquet' (
FORMAT 'PARQUET',
CODEC 'ZSTD',
COMPRESSION_LEVEL 22,
ROW_GROUP_SIZE 15000);
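The brightest cells can also be read off directly rather than eyeballed. Below is a quick look at the five busiest hexagons.

SELECT h3_3,
       num_locations
FROM h3_3_stats
ORDER BY num_locations DESC
LIMIT 5;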
A lot of locations sit in bodies of water and all across Antarctica.
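A rough way to size the Antarctica problem is to count records below 60°S. The -60 cutoff is my own assumption, roughly the Antarctic Treaty boundary, so treat this as a ballpark figure rather than a precise audit.

SELECT COUNT(*)
FROM '~/atp.parquet'
WHERE bbox.ymin < -60;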
These are the most common brand names.
SELECT COUNT(*),
name
FROM '~/atp.parquet'
WHERE LENGTH(name) > 0
GROUP BY 2
ORDER BY 1 DESC
LIMIT 40;
┌──────────────┬──────────────────────────────────┐
│ count_star() │               name               │
│    int64     │             varchar              │
├──────────────┼──────────────────────────────────┤
│        68227 │ Wildberries                      │
│        55171 │ サントリー                       │
│        39731 │ Shell                            │
│        26767 │ 7-Eleven                         │
│        24920 │ Coinstar                         │
│        23271 │ McDonald's                       │
│        21495 │ Starbucks                        │
│        20504 │ Dollar General                   │
│        20380 │ Punjab National Bank             │
│        19519 │ FedEx Drop Box                   │
│        13760 │ Walgreens                        │
│        13660 │ HDFC Bank ATM                    │
│        13346 │ Burger King                      │
│        13050 │ Circle K                         │
│        13028 │ Chevron                          │
│        13006 │ KFC                              │
│        12670 │ FedEx OnSite                     │
│        12279 │ Żabka                            │
│        12074 │ InPost                           │
│        11652 │ TotalEnergies                    │
│        10993 │ HDFC Bank                        │
│        10772 │ Aldi                             │
│        10612 │ Şok                              │
│         9993 │ BP                               │
│         9993 │ Lidl                             │
│         9435 │ Dunkin'                          │
│         9292 │ Dollar Tree                      │
│         9080 │ Bitcoin Depot                    │
│         8568 │ ICICI Bank                       │
│         8424 │ Taco Bell                        │
│         8395 │ 星巴克                           │
│         8193 │ Bank of Baroda                   │
│         7751 │ Verizon                          │
│         7700 │ CVS Pharmacy                     │
│         7677 │ Subway                           │
│         7577 │ T-Mobile                         │
│         7403 │ AutoZone                         │
│         7367 │ H&R Block Tax Preparation Office │
│         7339 │ Family Dollar                    │
│         7318 │ Blue Rhino                       │
├──────────────┴──────────────────────────────────┤
│ 40 rows                                2 columns │
└──────────────────────────────────────────────────┘
There are 2.36M unique strings in the name column. I was expecting this number to be much closer to the ~3-4K brands there are scrapers for.
SELECT COUNT(DISTINCT name)
FROM '~/atp.parquet'; -- 2,358,505
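My guess is that many spiders put the branch name, rather than the brand, into the name field. Below is one way to see which spiders contribute the most distinct names; this assumes the @spider property survived the Parquet conversion as a column.

SELECT "@spider",
       num_names: COUNT(DISTINCT name)
FROM '~/atp.parquet'
GROUP BY 1
ORDER BY 2 DESC
LIMIT 10;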
I'll build a bounding box that surrounds each brand's locations in each country they're represented in.
COPY (
SELECT name,
country: "addr:country",
geometry: {
'min_x': MIN(ST_X(ST_CENTROID(geometry))),
'min_y': MIN(ST_Y(ST_CENTROID(geometry))),
'max_x': MAX(ST_X(ST_CENTROID(geometry))),
'max_y': MAX(ST_Y(ST_CENTROID(geometry)))}::BOX_2D::GEOMETRY
FROM '~/atp.parquet'
GROUP BY 1, 2
) TO '~/atp.extent_per_country.parquet' (
FORMAT 'PARQUET',
CODEC 'ZSTD',
COMPRESSION_LEVEL 22,
ROW_GROUP_SIZE 15000);
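As an example of how this file gets used, the list of countries for a single brand can be pulled out before rendering their extents in QGIS. The doubled apostrophe is standard SQL escaping.

SELECT country
FROM '~/atp.extent_per_country.parquet'
WHERE name = 'Nando''s'
ORDER BY country;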
Below are the countries with Nando's locations in this release. Note that the US locations are missing.
Below are the Starbucks locations.
For some reason, only 11 countries have at least one McDonald's restaurant in this release.
SELECT COUNT(DISTINCT "addr:country")
FROM '~/atp.parquet'
WHERE name LIKE '%McDonalds%'; -- 11
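One caveat: the LIKE pattern above only matches names spelled without an apostrophe, so it may undercount. A looser, case-insensitive pattern is a worthwhile cross-check.

SELECT COUNT(DISTINCT "addr:country")
FROM '~/atp.parquet'
WHERE name ILIKE '%mcdonald%';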
There are 44K McDonald's restaurants in the world, but this release only lists 23,271 of them.
Spot Checking
Despite wide IKEA coverage, Estonia's flagship store is missing.
There is a Romanian Tree Library just off the coast of one of Estonia's islands.
A Finnish fast food chain placed its restaurant outside of an Estonian shopping centre rather than inside.
Thank you for taking the time to read this post. I offer both consulting and hands-on development services to clients in North America and Europe. If you'd like to discuss how my offerings can help your business, please contact me via LinkedIn.