Meta Description: Discover why your PostgreSQL string type choices matter through a wardrobe analogy. Learn CHAR, VARCHAR, and TEXT differences with real performance data and migration horror stories.
Cover Image Concept: Split image showing a messy closet overflowing with clothes labeled "VARCHAR(255)" on one side, and a perfectly organized modular wardrobe system labeled "TEXT/VARCHAR/CHAR" on the other.
The $14,000 Migration
"Just use VARCHAR(255) for everything. Itβs a safe default."
That advice seemed reasonable when I started building our user profile system. Names? VARCHAR(255). Bios? VARCHAR(255). Even zip codes? You guessed it: VARCHAR(255).
Fast forward 18 months: 12 million user records, and I needed to let users write longer bios. The migration took 47 hours of downtime and cost us $14,000 in lost revenue.
All because I didn't understand how PostgreSQL actually handles strings.
The brutal truth? In PostgreSQL, VARCHAR(255) isn't the universal answer. In fact, it's often the wrong answer. Let me show you why, using a metaphor that finally made it click for me: your wardrobe.
Your Database is a Wardrobe (Stay With Me)
Think about how you organize clothes:
- Fixed-size drawers (dress shirts, folded t-shirts) = CHAR(n)
- Expandable shelves (sweaters that vary in bulk) = VARCHAR(n)
- The closet rod (coats of wildly different lengths) = TEXT or VARCHAR (no limit)
Each storage method has trade-offs. Use a drawer for a winter coat? Waste of space. Hang a tie on the closet rod? Works, but inefficient.
PostgreSQL string types work the same way. Let me prove it.
The Three Closets: CHAR, VARCHAR, and TEXT
CHAR(n): The Rigid Drawer
CREATE TABLE user_codes (
user_id BIGSERIAL PRIMARY KEY,
country_code CHAR(2), -- Always exactly 2 characters
postal_code CHAR(5), -- US zip codes, always 5 digits
access_level CHAR(1) -- 'A', 'B', 'C' only
);
-- What happens under the hood?
INSERT INTO user_codes (country_code, postal_code, access_level)
VALUES ('US', '90210', 'A');
-- PostgreSQL pads with spaces!
SELECT
country_code,
LENGTH(country_code) as len,
'|' || country_code || '|' as visual
FROM user_codes;
/*
country_code | len | visual
--------------+-----+--------
US | 2 | |US|
*/
-- But watch this gotcha:
INSERT INTO user_codes (country_code) VALUES ('U');
-- Stored as 'U ' (with trailing space!)
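One more drawer quirk before moving on: those pad spaces are invisible to comparisons and LENGTH(), but they really are stored. A quick sketch you can paste into psql:
-- Pad spaces are ignored by CHAR comparisons and LENGTH()...
SELECT 'U'::CHAR(2) = 'U '::CHAR(2) AS equal;      -- true: pads are ignored
SELECT LENGTH('U '::CHAR(2))        AS len;        -- 1: LENGTH() ignores them too
SELECT OCTET_LENGTH('U '::CHAR(2))  AS bytes;      -- 2: but they ARE stored on disk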
When to use CHAR:
- Fixed-length data (country codes, US state abbreviations, gender codes)
- Small, known sizes (MD5 hashes are always 32 chars)
- Not for performance: despite the folklore, the PostgreSQL docs note CHAR(n) has no speed advantage (the padding usually makes it slightly slower)
The wardrobe rule: Use drawers only when every item is the exact same size.
VARCHAR(n): The Adjustable Shelf
CREATE TABLE users (
id BIGSERIAL PRIMARY KEY,
username VARCHAR(30), -- Twitter-style limit
email VARCHAR(254), -- Max valid email length (RFC 5321)
display_name VARCHAR(100)
);
-- No padding, only stores actual characters
INSERT INTO users (username, email, display_name)
VALUES ('alice', 'alice@example.com', 'Alice Anderson');
SELECT
username,
LENGTH(username) as len,
pg_column_size(username) as bytes
FROM users;
/*
username | len | bytes
----------+-----+-------
alice | 5 | 6 (5 chars + 1-byte varlena header)
*/
-- The dangerous assumption:
ALTER TABLE users ALTER COLUMN username TYPE VARCHAR(50);
-- On PostgreSQL 9.1 and older, this rewrote every row: hours of
-- locking on 10M rows. Since 9.2, *widening* a VARCHAR is a
-- metadata-only change, but it still takes a brief ACCESS EXCLUSIVE
-- lock, and any other type change still rewrites the whole table!
Here's the plot twist that blew my mind:
-- Test: Does VARCHAR length limit affect performance?
CREATE TABLE test_varchar (
short_limit VARCHAR(10),
medium_limit VARCHAR(255),
long_limit VARCHAR(10000)
);
-- Insert same data into all columns
INSERT INTO test_varchar
SELECT 'hello', 'hello', 'hello'
FROM generate_series(1, 1000000);
-- Check storage size
SELECT
pg_column_size(short_limit) as short_bytes,
pg_column_size(medium_limit) as medium_bytes,
pg_column_size(long_limit) as long_bytes
FROM test_varchar LIMIT 1;
/*
 short_bytes | medium_bytes | long_bytes
-------------+--------------+------------
           6 |            6 |          6
(5 bytes of 'hello' + 1-byte varlena header each)
*/
MIND. BLOWN.
VARCHAR(10) and VARCHAR(10000) take the same storage space for the same data. The limit is just a constraint, not a storage directive!
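So what does the limit actually do? It only acts as a gate at write time. A small demo, reusing the test_varchar table from above:
-- Assignment over the limit: hard error...
INSERT INTO test_varchar (short_limit) VALUES ('hello world!');
-- ERROR:  value too long for type character varying(10)

-- ...unless the excess is all trailing spaces (silently trimmed):
INSERT INTO test_varchar (short_limit) VALUES ('hello        '); -- OK, stored as 'hello     '

-- And an explicit cast always truncates silently:
SELECT 'hello world!'::VARCHAR(10); -- 'hello worl'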
TEXT: The Unlimited Closet Rod
CREATE TABLE posts (
id BIGSERIAL PRIMARY KEY,
content TEXT, -- No arbitrary limit
author_bio TEXT,
comments TEXT[] -- Array of text (PostgreSQL magic!)
);
-- TEXT and VARCHAR (no limit) are identical in PostgreSQL
-- Seriously. Check the source code. Same type internally.
-- This is valid:
INSERT INTO posts (content)
VALUES (repeat('a', 1000000)); -- 1 million characters!
-- Storage is the same as VARCHAR
SELECT pg_column_size(content) FROM posts;
The shocking truth: in PostgreSQL, VARCHAR without a length is just TEXT wearing a different name (the two are binary-compatible under the hood). The only difference is that VARCHAR lets you add a length constraint.
-- These are functionally identical:
name TEXT
name VARCHAR
-- This adds a constraint:
name VARCHAR(50)
-- Which is roughly equivalent to:
name TEXT CHECK (LENGTH(name) <= 50)
-- ("Roughly" because the error codes differ, and VARCHAR(50)
--  silently trims excess trailing spaces instead of erroring)
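You can see that difference for yourself; a tiny sketch with throwaway table names:
CREATE TABLE t_varchar (name VARCHAR(5));
CREATE TABLE t_check   (name TEXT CHECK (LENGTH(name) <= 5));

INSERT INTO t_varchar VALUES ('toolong');
-- ERROR:  value too long for type character varying(5)   (SQLSTATE 22001)

INSERT INTO t_check VALUES ('toolong');
-- ERROR:  new row for relation "t_check" violates check
--         constraint "t_check_name_check"                (SQLSTATE 23514)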
The Performance Mythology: Debunked
Myth #1: "VARCHAR(255) is faster than TEXT"
-- Performance test
CREATE TABLE test_performance (
id SERIAL,
text_col TEXT,
varchar_col VARCHAR(255),
varchar_unlimited VARCHAR
);
-- Insert 1 million rows with random string lengths
INSERT INTO test_performance
SELECT
i,
repeat('x', (random() * 200)::int),
repeat('x', (random() * 200)::int),
repeat('x', (random() * 200)::int)
FROM generate_series(1, 1000000) i;
-- Create identical indexes
CREATE INDEX idx_text ON test_performance(text_col);
CREATE INDEX idx_varchar ON test_performance(varchar_col);
CREATE INDEX idx_varchar_unl ON test_performance(varchar_unlimited);
-- Query performance test
EXPLAIN ANALYZE
SELECT * FROM test_performance WHERE text_col = 'test';
EXPLAIN ANALYZE
SELECT * FROM test_performance WHERE varchar_col = 'test';
/*
Results: Nearly IDENTICAL performance!
The difference is in nanoseconds, not milliseconds.
*/
Myth #2: "Always set a length limit to save space"
Nope. PostgreSQL uses TOAST (The Oversized-Attribute Storage Technique) for long strings automatically.
┌───────────────────────────────────────────────────┐
│  ROW STORAGE (How PostgreSQL Actually Works)      │
├───────────────────────────────────────────────────┤
│                                                   │
│  Short strings (< ~2KB):                          │
│    ┌────────────────────────────────┐             │
│    │ Stored inline with row         │             │
│    │ [id][username][email][bio]     │             │
│    └────────────────────────────────┘             │
│                                                   │
│  Long strings (> ~2KB):                           │
│    ┌──────────────────┐     ┌──────────────────┐  │
│    │ [id][username]   │────>│ TOAST table      │  │
│    │ [email][pointer] │     │ [actual bio]     │  │
│    └──────────────────┘     └──────────────────┘  │
│                                                   │
│  VARCHAR(255) vs TEXT: Same storage strategy!     │
└───────────────────────────────────────────────────┘
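You don't have to take the diagram on faith; PostgreSQL will show you TOAST at work. A minimal sketch (table and data are made up):
CREATE TABLE toast_demo (id INT, body TEXT);
INSERT INTO toast_demo VALUES
    (1, repeat('x', 100)),     -- tiny: stays inline
    (2, repeat('x', 100000));  -- huge: compressed and/or moved out of line

SELECT id,
       octet_length(body)   AS logical_bytes, -- what your app sees
       pg_column_size(body) AS stored_bytes   -- what's actually stored
FROM toast_demo;
-- Row 2's stored_bytes is far below 100000: repeat('x', ...) compresses well

-- Find the hidden TOAST table backing this one:
SELECT reltoastrelid::regclass FROM pg_class WHERE relname = 'toast_demo';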
The Real-World Decision Tree
Here's how I choose string types now:
-- 1. FIXED LENGTH DATA β CHAR
CREATE TABLE standard_codes (
iso_country CHAR(2), -- Always 2: 'US', 'UK', 'FR'
iso_language CHAR(2), -- Always 2: 'en', 'es', 'fr'
state_code CHAR(2), -- Always 2: 'CA', 'NY', 'TX'
md5_hash CHAR(32), -- Always 32: MD5 output
uuid CHAR(36) -- Always 36 with hyphens (but prefer the native UUID type; see the cheat sheet)
);
-- 2. CONSTRAINED LENGTH (Business Rule) β VARCHAR(n)
CREATE TABLE user_profiles (
username VARCHAR(30), -- Twitter: max 15, we allow 30
email VARCHAR(254), -- RFC 5321 max email length
twitter_handle VARCHAR(16), -- Twitter's actual limit
phone VARCHAR(20), -- E.164 format + extras
slug VARCHAR(100) -- URL-safe identifier
);
-- 3. UNBOUNDED BUT SEARCHABLE β TEXT with indexes
CREATE TABLE content (
title TEXT, -- Could be long, need to search
body TEXT, -- Articles, comments, etc.
notes TEXT,
-- Keep titles sane
CONSTRAINT title_reasonable CHECK (LENGTH(title) <= 500)
);
-- Make searches fast
CREATE INDEX idx_content_title ON content USING GIN (to_tsvector('english', title));
-- 4. TRULY UNLIMITED β TEXT
CREATE TABLE documents (
content TEXT, -- Could be books
markdown TEXT, -- Code blocks, documentation
json_data TEXT -- Legacy pattern; use the JSONB type (9.4+) instead!
);
The Migration Horror Story (And How to Avoid It)
Remember that $14,000 mistake? Here's what happened:
-- Original (naive) design
CREATE TABLE user_profiles (
id BIGSERIAL PRIMARY KEY,
bio VARCHAR(255)
);
-- 18 months later, 12 million rows
SELECT COUNT(*) FROM user_profiles; -- 12,458,392
-- The destructive migration
ALTER TABLE user_profiles
ALTER COLUMN bio TYPE VARCHAR(1000);
/*
On older PostgreSQL (9.1 and earlier), that one statement had to:
1. Lock the ENTIRE table (no reads/writes)
2. Rewrite EVERY row (even unchanged ones)
3. Rebuild ALL indexes referencing this column
4. Revalidate ALL constraints referencing this column
Time: 47 hours
Downtime: Unacceptable
Cost: $14K in lost revenue

(Since 9.2, widening a VARCHAR limit is metadata-only, but it still
queues behind an ACCESS EXCLUSIVE lock; shrinking the limit or
changing the type rewrites the table exactly as above.)
*/
The smart way:
-- If I'd used TEXT from the start:
CREATE TABLE user_profiles (
id BIGSERIAL PRIMARY KEY,
bio TEXT,
CONSTRAINT bio_reasonable CHECK (LENGTH(bio) <= 255)
);
-- To extend later (NO table rewrite!):
ALTER TABLE user_profiles DROP CONSTRAINT bio_reasonable;
ALTER TABLE user_profiles ADD CONSTRAINT bio_reasonable
    CHECK (LENGTH(bio) <= 1000) NOT VALID; -- skip the immediate full-table scan
ALTER TABLE user_profiles VALIDATE CONSTRAINT bio_reasonable; -- lighter lock, runs alongside traffic
-- Time: seconds (a plain ADD CONSTRAINT would scan all 12M rows under lock)
-- Locks: Minimal
-- Cost: $0
The golden rule: Start with TEXT + CHECK constraint, not VARCHAR(n).
Edge Cases That Will Bite You
1. The Emoji Explosion
-- User stores emoji in a username
INSERT INTO users (username) VALUES ('Alice 👩‍💻');
-- How long is it?
SELECT LENGTH('Alice 👩‍💻');       -- 9 characters (the emoji alone is 3 codepoints!)
SELECT octet_length('Alice 👩‍💻'); -- 17 bytes of UTF-8
-- With VARCHAR(10), this fits (limits count characters, not bytes)
-- But byte-wise, it's almost twice as big as it looks
-- The safer approach when bytes are what matter:
ALTER TABLE users ADD CONSTRAINT username_bytes
CHECK (octet_length(username) <= 50);
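The three length functions to keep straight, side by side:
SELECT LENGTH('Alice 👩‍💻')       AS characters,  -- 9  (codepoints)
       CHAR_LENGTH('Alice 👩‍💻')  AS same_thing,  -- 9  (SQL-standard alias)
       OCTET_LENGTH('Alice 👩‍💻') AS bytes;       -- 17 (UTF-8 encoding)
-- VARCHAR(n) and CHECK (LENGTH(...)) count *characters*;
-- index size, TOAST, and wire traffic care about *bytes*.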
2. The Collation Trap
-- Case-insensitive searches
CREATE TABLE tags (
name VARCHAR(50) COLLATE "en_US" -- Case-sensitive by default
);
INSERT INTO tags VALUES ('PostgreSQL'), ('postgresql'), ('POSTGRESQL');
-- Three distinct rows are stored; a case-sensitive search finds just one:
SELECT * FROM tags WHERE name = 'postgresql'; -- 1 row, misses the other two
-- Better approach:
CREATE EXTENSION IF NOT EXISTS citext; -- ships in PostgreSQL contrib
CREATE TABLE tags (
name CITEXT -- Case-insensitive text type
);
-- Or use functional index:
CREATE INDEX idx_tags_lower ON tags (LOWER(name));
SELECT * FROM tags WHERE LOWER(name) = LOWER('postgresql');
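On PostgreSQL 12+ there's a third option: a nondeterministic ICU collation (assuming your server was built with ICU support; the locale string below is the one from the PostgreSQL docs):
-- PostgreSQL 12+, ICU-enabled build required
CREATE COLLATION case_insensitive (
    provider = icu,
    locale = 'und-u-ks-level2', -- "undetermined" locale, strength 2 = ignore case
    deterministic = false
);
CREATE TABLE tags_ci (
    name TEXT COLLATE case_insensitive
);
INSERT INTO tags_ci VALUES ('PostgreSQL');
SELECT * FROM tags_ci WHERE name = 'postgresql'; -- 1 row: matches!
-- Caveat: LIKE and regex matching are not supported on
-- nondeterministic collations in most released versions.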
3. The NULL vs Empty String Confusion
-- These are DIFFERENT in PostgreSQL (unlike MySQL)
SELECT '' = NULL; -- NULL (not true!)
SELECT '' IS NULL; -- false
SELECT NULL IS NULL; -- true
-- Implications for constraints:
CREATE TABLE profiles (
bio TEXT NOT NULL DEFAULT '' -- Empty string allowed
);
INSERT INTO profiles (bio) VALUES (''); -- OK
INSERT INTO profiles (bio) VALUES (NULL); -- ERROR!
-- Better: disallow both
CREATE TABLE profiles (
bio TEXT,
CONSTRAINT bio_not_empty CHECK (bio IS NOT NULL AND LENGTH(TRIM(bio)) > 0)
);
The Performance Optimization Playbook
1. Prefix Indexes for Long Strings
-- Don't index the entire TEXT column
CREATE INDEX idx_url_full ON pages(url); -- Huge index, and B-tree entries max out around 2.7KB!
-- Index only the first N characters
CREATE INDEX idx_url_prefix ON pages((LEFT(url, 100)));
-- Use in queries:
SELECT * FROM pages WHERE LEFT(url, 100) = 'https://example.com/very/long/path...';
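A related trick: if what you actually want is prefix matching with LIKE, a text_pattern_ops index serves left-anchored patterns directly (a sketch against the same hypothetical pages table):
-- B-tree index tuned for left-anchored pattern matches
CREATE INDEX idx_url_like ON pages (url text_pattern_ops);

-- This can now use the index (no leading wildcard):
SELECT * FROM pages WHERE url LIKE 'https://example.com/%';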
2. Full-Text Search Instead of LIKE
-- Slow: Sequential scan
SELECT * FROM articles WHERE content LIKE '%postgresql%';
-- Fast: GIN index with full-text search
CREATE INDEX idx_articles_fts ON articles
USING GIN (to_tsvector('english', content));
SELECT * FROM articles
WHERE to_tsvector('english', content) @@ to_tsquery('postgresql');
-- With ranking!
SELECT
title,
ts_rank(to_tsvector('english', content), query) AS rank
FROM articles, to_tsquery('postgresql & performance') query
WHERE to_tsvector('english', content) @@ query
ORDER BY rank DESC;
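And when the search term can land anywhere in the string (the '%postgresql%' case above), trigram indexes are the standard escape hatch; a sketch assuming the same articles table:
-- Trigram index: accelerates LIKE/ILIKE even with leading wildcards
CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE INDEX idx_articles_trgm ON articles USING GIN (content gin_trgm_ops);

-- Now this can use the index instead of a sequential scan:
SELECT * FROM articles WHERE content ILIKE '%postgresql%';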
3. Compression for Archival Data
-- PostgreSQL 14+ supports LZ4 compression
CREATE TABLE archives (
id BIGSERIAL PRIMARY KEY,
document TEXT COMPRESSION lz4
);
-- Old approach (manual):
CREATE TABLE archives (
id BIGSERIAL PRIMARY KEY,
document_compressed BYTEA -- Store compressed bytes
);
-- Compress in the application layer before inserting;
-- compress_function is a placeholder for your app's compression call
INSERT INTO archives (document_compressed)
VALUES (compress_function('very long text...'));
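On 14+, you can also verify which algorithm was actually applied to each stored value:
-- Check the per-value compression method (PostgreSQL 14+)
SELECT id, pg_column_compression(document) AS method
FROM archives;
-- Returns 'lz4', 'pglz', or NULL (value was short enough to store uncompressed)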
The Ultimate Cheat Sheet
/*
┌─────────────────┬──────────────┬────────────────────────────────────┐
│ Use Case        │ Type         │ Example                            │
├─────────────────┼──────────────┼────────────────────────────────────┤
│ Country code    │ CHAR(2)      │ 'US', 'UK', 'FR'                   │
│ State code      │ CHAR(2)      │ 'CA', 'NY', 'TX'                   │
│ MD5 hash        │ CHAR(32)     │ 'd41d8cd98f00b204e9800998ecf8427e' │
│ UUID            │ UUID         │ Use the UUID type, not CHAR!       │
│                 │              │                                    │
│ Username        │ VARCHAR(30)  │ Business constraint                │
│ Email           │ VARCHAR(254) │ RFC max length                     │
│ Slug            │ VARCHAR(100) │ URL identifier                     │
│ Phone           │ VARCHAR(20)  │ E.164 + formatting                 │
│                 │              │                                    │
│ Name            │ TEXT         │ No arbitrary limit                 │
│ Bio             │ TEXT         │ + CHECK constraint                 │
│ Comment         │ TEXT         │ Unknown length                     │
│ Article body    │ TEXT         │ + full-text index                  │
│ Description     │ TEXT         │ Flexibility                        │
│                 │              │                                    │
│ Tag             │ CITEXT       │ Case-insensitive                   │
│ Enum-like       │ ENUM         │ Or CHECK (status IN (...))         │
└─────────────────┴──────────────┴────────────────────────────────────┘
*/
-- My default template:
CREATE TABLE your_table (
id BIGSERIAL PRIMARY KEY,
-- Fixed-length codes
country_code CHAR(2),
-- Constrained business fields
username VARCHAR(30) NOT NULL UNIQUE,
email VARCHAR(254) NOT NULL UNIQUE,
-- Flexible text fields
display_name TEXT NOT NULL,
bio TEXT,
-- Add reasonable constraints
CONSTRAINT bio_length CHECK (LENGTH(bio) <= 5000),
CONSTRAINT display_name_not_empty CHECK (LENGTH(TRIM(display_name)) > 0)
);
The Lessons I Learned (The Hard Way)
- TEXT is not "lazy"; it's often the smartest choice
- VARCHAR(255) is cargo cult programming from MySQL days
- Storage space is rarely your bottleneck; query performance is
- Constraints protect you; use CHECK liberally
- Migration pain is real; design for change from day one
The New Rules I Follow
-- ✅ DO: Start flexible, add constraints
CREATE TABLE users (
bio TEXT,
CONSTRAINT bio_length CHECK (LENGTH(bio) <= 500)
);
-- ❌ DON'T: Start rigid, migrate painfully
CREATE TABLE users (
bio VARCHAR(255) -- Will require table rewrite to extend
);
-- ✅ DO: Use domain types for reusability
CREATE DOMAIN email AS TEXT
CHECK (VALUE ~ '^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$');
CREATE TABLE users (
email email NOT NULL UNIQUE
);
-- ✅ DO: Index smartly
CREATE INDEX idx_user_search ON users
USING GIN (to_tsvector('english', bio));
-- ❌ DON'T: Index entire large text columns
CREATE INDEX idx_bio ON users(bio); -- Probably too big
Your Turn: The Challenge
Look at your current database schema. I bet you have VARCHAR(255) fields that should be TEXT. Here's a safe migration strategy:
-- 1. Add the new column
ALTER TABLE your_table ADD COLUMN bio_new TEXT;
-- 2. Copy data (in batches for large tables)
UPDATE your_table SET bio_new = bio WHERE id BETWEEN 1 AND 100000;
UPDATE your_table SET bio_new = bio WHERE id BETWEEN 100001 AND 200000;
-- ... continue in batches
-- 3. Add constraint (NOT VALID skips the immediate full-table scan)
ALTER TABLE your_table
ADD CONSTRAINT bio_new_length CHECK (LENGTH(bio_new) <= 1000) NOT VALID;
ALTER TABLE your_table VALIDATE CONSTRAINT bio_new_length;
-- 4. Swap in a transaction
BEGIN;
ALTER TABLE your_table DROP COLUMN bio;
ALTER TABLE your_table RENAME COLUMN bio_new TO bio;
COMMIT;
-- 5. Celebrate! 🎉
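For step 2, here's one way to script the batching server-side on PostgreSQL 11+ (a sketch: the 10,000 batch size is arbitrary, the bio/bio_new names match the example above, and transaction control inside DO requires running it outside an explicit transaction block):
DO $$
DECLARE
    batch_rows BIGINT;
BEGIN
    LOOP
        -- Copy the next batch of un-migrated rows
        UPDATE your_table
        SET bio_new = bio
        WHERE id IN (
            SELECT id FROM your_table
            WHERE bio_new IS NULL AND bio IS NOT NULL
            LIMIT 10000
        );
        GET DIAGNOSTICS batch_rows = ROW_COUNT;
        EXIT WHEN batch_rows = 0;
        COMMIT; -- release locks between batches (PostgreSQL 11+)
    END LOOP;
END $$;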
The Bottom Line
PostgreSQL string types aren't about memorizing syntax; they're about understanding intent:
- CHAR: "This will ALWAYS be exactly N characters"
- VARCHAR(n): "This MUST NOT exceed N characters" (business rule)
- TEXT: "This can be any length, and I trust PostgreSQL to handle it"
Choose the one that matches your actual constraint, not your assumptions about performance.
And for the love of databases, stop defaulting to VARCHAR(255).
Further Reading
- PostgreSQL String Type Documentation
- TOAST: The Oversized-Attribute Storage Technique
- Full-Text Search in PostgreSQL
What's your VARCHAR(255) horror story? Drop it in the comments; I want to hear about your $14,000 migration moment! 💸