UK Postcode Binary Encoder
A high-performance, ultra-compact binary format for UK postcode geolocation data. Compresses 1.79M postcodes from ~50MB CSV to just 6.2MB binary (88% compression) with 920K+ lookups/second and O(1) exact lookups.
Architecture
The implementation is split into focused, reusable classes:
Core Classes
PostcodeEncoder
- Main orchestrator class that coordinates the encoding processPostcodeNormalizer
- Handles postcode validation, normalization, and inward ID encodingBitmapManager
- Manages 4000-bit presence bitmaps and rank directories for each outwardBinaryFileWriter
- Handles the binary file format layout and writing- **`BitWriteโฆ
UK Postcode Binary Encoder
A high-performance, ultra-compact binary format for UK postcode geolocation data. Compresses 1.79M postcodes from ~50MB CSV to just 6.2MB binary (88% compression) with 920K+ lookups/second and O(1) exact lookups.
Architecture
The implementation is split into focused, reusable classes:
Core Classes
PostcodeEncoder
- Main orchestrator class that coordinates the encoding processPostcodeNormalizer
- Handles postcode validation, normalization, and inward ID encodingBitmapManager
- Manages 4000-bit presence bitmaps and rank directories for each outwardBinaryFileWriter
- Handles the binary file format layout and writingBitWriter
- Utility for bit-packing coordinate deltas
Key Features
- Ultra-Compact: 88% compression ratio (50MB โ 6.2MB) with 7.8/8.0 entropy efficiency
- Blazing Fast: 920,000+ lookups/second with <1ms per lookup latency
- O(1) Lookups: Constant-time exact postcode lookup via bitmap + rank directory + bit extraction
- Efficient Queries: Fast prefix enumeration of all postcodes in an outward (e.g., โST6โ)
- Memory Optimized: <10MB memory footprint, designed for direct memory mapping
- High Precision: 1e-4ยฐ coordinate precision with <1m average error (0.35m typical)
- Type Safe: Full TypeScript implementation with comprehensive error handling
Usage
Encoding (CSV to Binary)
import { PostcodeEncoder } from "./src/PostcodeEncoder";
const encoder = new PostcodeEncoder("postcodes.csv");
encoder.build("postcodes.bin");
Querying (Binary Database)
import * as fs from "fs";
import { PostcodeClient } from "./src/PostcodeClient";
// Load the binary database
const buffer = fs.readFileSync("postcodes.pcod");
const client = new PostcodeClient(buffer);
// Exact postcode lookup (O(1))
const result = client.lookup("ST6 5AA");
if (result) {
console.log(`${result.postcode}: ${result.lat}, ${result.lon}`);
}
// Enumerate all postcodes in an outward (prefix query)
const postcodes = client.enumerateOutward("ST6");
console.log(`Found ${postcodes.length} postcodes in ST6`);
// Get database statistics
const stats = client.getStats();
console.log(
`Database: ${stats.totalPostcodes} postcodes, ${stats.fileSize} bytes`
);
// Check if postcode exists
const isValid = client.isValidPostcode("M1 1AA");
// Find outwards by prefix
const manchesterOutwards = client.findNearbyOutwards("M");
Dataset
This project uses the ONS Postcode Database that can be downlaod from https://geoportal.statistics.gov.uk/datasets/e14b1475ecf74b58804cf667b6740706, Once you have downloaded the dataset you can put the full ONS_{month}_{year}_UK.csv
file in the data/ directory and run the following command to extract just the valid postcodes, lat and long values into a file called postcodes.csv
, this file is what you can then use to build the binary database
Building the Database
- Ensure that you have a file called postcodes.csv in the root directory, it should contain 3 columns only
- Execute the
yarn run build
command, you should see an output like{ totalOutwards: 2943, totalPostcodes: 1790884, fileSize: 6509848 }
and a new file calledpostcodes.pcod
apear in the root of the directory.
Performance & Compression
Compression Statistics
Metric | Value | Details |
---|---|---|
Input Size | ~50MB | Raw CSV with 1.79M postcodes |
Output Size | 6.2MB | Binary format (.pcod file) |
Compression Ratio | 88% | 8:1 compression ratio |
Entropy Efficiency | 7.8/8.0 | Near-optimal bit utilization |
Coordinate Precision | 1e-4ยฐ | ~11m resolution at UK latitudes |
Average Error | 0.35m | Quantization error vs source |
Lookup Performance
Operation | Performance | Details |
---|---|---|
Exact Lookup | 920K+ ops/sec | O(1) constant time |
Latency | <1ms | Per lookup on M3 Mac |
Memory Usage | <10MB | Total runtime footprint |
Prefix Query | Variable | Depends on outward size |
Build Time | 1.65s | For 1.79M postcodes |
Real-World Benchmark
Below is the output from yarn ts-node scripts/build-and-test.ts
on a Mac Pro M3, which builds the database and validates every postcode lookup:
๐๏ธ Building postcode database...
โ
Build completed in 1.65s
๐งช Loading database and starting tests...
๐ Testing postcode lookups...
============================================================
POSTCODE DATABASE TEST REPORT
============================================================
๐ PROCESSING STATISTICS:
Total rows processed: 1,790,884
Valid rows: 1,790,884
Invalid rows: 0
Invalid rate: 0.00%
๐ LOOKUP STATISTICS:
Successful lookups: 1,790,884
Failed lookups: 0
Lookup success rate: 100.00%
๐ COORDINATE ACCURACY:
Coordinate mismatches: 0
Accuracy rate: 100.00%
Max coordinate error: 0.66m
Average coordinate error: 0.35m
โฑ๏ธ PERFORMANCE:
Build time: 1.65s
Test time: 1.95s
Total time: 3.59s
Lookup rate: 920,763 lookups/sec
Compression Analysis
The binary format achieves exceptional compression through:
-
Bit-packed coordinate deltas: Variable-width encoding based on sector ranges
-
Adaptive storage modes: Bitmap vs list encoding based on density
-
Quantized coordinates: 1e-4ยฐ precision (sufficient for postcode accuracy)
-
Entropy optimization: 7.8/8.0 bits/byte efficiency (97.5% of theoretical maximum) Entropy Distribution:
-
95.8% of data chunks achieve >7.5 bits/byte entropy
-
Coordinate data sections: 7.8 bits/byte (highly compressed)
-
Index sections: 6.3 bits/byte (structured but efficient)
Comparison with Traditional Approaches
Approach | File Size | Lookup Speed | Memory Usage | Precision |
---|---|---|---|---|
This Binary Format | 6.2MB | 920K ops/sec | <10MB | 0.35m avg |
Raw CSV | 50MB | ~1K ops/sec* | 50MB+ | Source precision |
SQLite Database | ~50MB | ~10K ops/sec* | 50MB+ | Source precision |
JSON Format | ~120MB | ~5K ops/sec* | 120MB+ | Source precision |
Compressed CSV (gzip) | ~30MB | ~500 ops/sec* | 50MB+ | Source precision |
*Estimated performance for linear/indexed searches without specialized optimization
Key Advantages:
- 8ร smaller than raw CSV
- 92ร faster than typical database lookups
- 5ร smaller than compressed alternatives
- Zero dependencies - pure binary format
- Memory mappable - no loading overhead
- Deterministic performance - O(1) guaranteed
File Format
+======================================================================================+
| PCDB v3 FILE |
+======================================================================================+
| HEADER (32 bytes) |
| 0x00 magic: 'PCDB' (char[4]) |
| 0x04 version: u8 (=3) |
| 0x05 flags: u8 (=0) |
| 0x06 outwardCount: u16 |
| 0x08 totalUnitCount: u32 |
| 0x0C latOffset: i32 (global min latInt) |
| 0x10 lonOffset: i32 (global min lonInt) |
| 0x14 reserved[12] (zeros) |
+======================================================================================+
| OUTWARD INDEX TABLE (9 bytes x outwardCount) |
| |
| Entry i (9 bytes): |
| +--------------------------------------+ |
| | outwardCode: char[4] (ASCII, NUL-pad) | |
| | sectorCount: u8 | |
| | sectorIndexOffset: u32 (ABSOLUTE) ---+-----> points to start of Outward Block i |
| +--------------------------------------+ |
| |
+======================================================================================+
| OUTWARD BLOCK i (variable) |
| (begins at index[i].sectorIndexOffset, ABSOLUTE) |
| |
| +--------------------------------------------------------------------------------+ |
| | SECTOR TABLE (14 bytes x sectorCount) | |
| | | |
| | SectorEntry s (14 bytes): | |
| | +-----------------------------------------------------------------------+ | |
| | | sectorNumber : u8 (0..9) | | |
| | | unitCount : u16 | | |
| | | unitsRelOff : u24 (LE) -------------------------------+------------+ | |
| | | (RELATIVE to Outward Block|start) | |
| | | baseLatStored: i24 (LE) = sector.latMin - header.latOffset | | |
| | | baseLonStored: i24 (LE) = sector.lonMin - header.lonOffset | | |
| | | flagsAndBits : u16 (LE) | | |
| | | bit0 = 1 (coordinates are bit-packed) | | |
| | | bit1 = 1:list mode | 0:bitmap mode | | |
| | | bits2..6 = bitsLat (5 bits) | | |
| | | bits7..15 = bitsLon (9 bits) | | |
| | +-----------------------------------------------------------------------+ | |
| +--------------------------------------------------------------------------------+ |
| |
| +--------------------------------------------------------------------------------+ |
| | UNIT DATA BLOBS (concatenated, one per sector, same order as sector table) | |
| | | |
| | Blob for SectorEntry s at: (OutwardBlockStart + unitsRelOff[s]) | |
| | +-----------------------------+ immediately followed by +-------------+ | |
| | | UNIT PRESENCE |---------------------------->| COORD STREAM | | |
| | +-----------------------------+ +-------------+ | |
| | | |
| | UNIT PRESENCE (choose ONE): | |
| | * Bitmap mode: EXACTLY 85 bytes (680 bits). Bit set => unitIndex present. | |
| | byte = unitIndex/8, bit = unitIndex%8. | |
| | * List mode: Varint sequence of unit indices using delta coding: | |
| | first value = absolute first unitIndex (varint) | |
| | subsequent values = delta to previous index (varint) | |
| | (Encoder picks list if total varint bytes < 85) | |
| | | |
| | COORD STREAM (bit-packed, no per-value padding): | |
| | For each of unitCount units (in ascending unitIndex order): | |
| | write latDelta : unsigned, width = bitsLat | |
| | write lonDelta : unsigned, width = bitsLon | |
| | Total length = ceil(unitCount x (bitsLat + bitsLon) / 8) bytes | |
| | | |
| | Reconstruct per unit: | |
| | latInt = header.latOffset + baseLatStored + latDelta | |
| | lonInt = header.lonOffset + baseLonStored + lonDelta | |
| | lat = latInt / 100000.0 | |
| | lon = lonInt / 100000.0 | |
| +--------------------------------------------------------------------------------+ |
| |
+======================================================================================+
| OUTWARD BLOCK i+1 ... (repeat) |
+======================================================================================+
LEGEND / NOTES
- All multi-byte integers are LITTLE-ENDIAN.
- Coordinates are quantized: lat/lon x 100000 -> integer (latInt/lonInt).
- Ordering:
- Outwards sorted lexicographically
- Sectors sorted by sectorNumber (0-9)
- Units sorted by unitIndex (ascending)
- Offsets:
- sectorIndexOffset is ABSOLUTE (file offset).
- unitsRelOff is RELATIVE to the start of its Outward Block (i.e., sector table start).
- Capacity:
- Bitmap presence uses 85 bytes = 680 bits (code implements 680 bits).
- bitsLat/bitsLon are minimal widths for sector-local deltas:
- latDelta = unit.latInt - sector.latMin
- lonDelta = unit.lonInt - sector.lonMin
Web Interface
๐ Live Demo: https://robertpitt.github.io/postcode-db/
A modern web interface is available in the playground/
directory. This provides a user-friendly way to search postcodes through a web browser.
Quick Start
# Install dependencies
yarn install
# Start development server
yarn dev
# Build for production
yarn build:web
Visit http://localhost:3000
to use the web interface.
Features
- Real-time search: Type partial or full postcodes for instant results
- Responsive design: Works on desktop and mobile
- Modern UI: Built with Tailwind CSS and Font Awesome icons
- Map integration: Direct links to Google Maps
- GitHub Pages ready: Automatic deployment workflow included
Deployment
The project includes a GitHub Actions workflow that automatically:
- Builds the postcode database
- Compiles the web interface
- Deploys to GitHub Pages
Simply push to the
main
branch to trigger deployment.