Problem setup (How Index Scan Internally works for both DBs)
Table in MySQL and PostgreSQL:
user(
id BIGINT PRIMARY KEY,
name VARCHAR(100),
age INT
)
Index:
CREATE INDEX idx_user_name ON user(name);
Query:
SELECT age FROM user WHERE name = 'Rahim';
Assume:
- Table has 10 million rows
- Index is a B-Tree
- name is not unique
- age is not in the index
1οΈβ£ MySQL (InnoDB) β Step by step
π Important InnoDB rule
Secondary index stores the PRIMARY KEY, not the row location
So idx_user_name looks like:
(name, primary_key_id)
Step-by-step execution
π§ Step 1: Traverse secondary index (name)
MySQL searches the B-Tree for βRahimβ.
Cost: O(log N)
Example:
B-Tree levels:
Root β Internal β Leaf
At the leaf, it finds: β¦
Problem setup (How Index Scan Internally works for both DBs)
Table in MySQL and PostgreSQL:
user(
id BIGINT PRIMARY KEY,
name VARCHAR(100),
age INT
)
Index:
CREATE INDEX idx_user_name ON user(name);
Query:
SELECT age FROM user WHERE name = 'Rahim';
Assume:
- Table has 10 million rows
- Index is a B-Tree
- name is not unique
- age is not in the index
1οΈβ£ MySQL (InnoDB) β Step by step
π Important InnoDB rule
Secondary index stores the PRIMARY KEY, not the row location
So idx_user_name looks like:
(name, primary_key_id)
Step-by-step execution
π§ Step 1: Traverse secondary index (name)
MySQL searches the B-Tree for βRahimβ.
Cost: O(log N)
Example:
B-Tree levels:
Root β Internal β Leaf
At the leaf, it finds:
('Rahim', id=73482)
β οΈ It does NOT have age yet
π§ Step 2: Traverse PRIMARY KEY index (clustered index)
In InnoDB:
Primary key index IS the table
Rows are physically stored in PK order
Now MySQL uses id = 73482 to search the PK B-Tree.
Cost: O(log N)
At the PK leaf node, it finds the full row:
(id=73482, name='Rahim', age=29)
β Total cost (MySQL)
π Key MySQL characteristics
- β Clustered index = good locality
- β Two tree traversals
- β PK lookup cost grows with table size
2οΈβ£ PostgreSQL β Step by step
π Important PostgreSQL rule
Index stores a TID (tuple ID) β pointer to heap location
Index entry looks like:
(name, (block_id, offset))
Step-by-step execution
π§ Step 1: Traverse B-Tree index (name)
PostgreSQL searches index for βRahimβ.
Cost: O(log N)
At the leaf, it finds:
('Rahim', TID=(block=102345, offset=7))
π§ Step 2: Heap fetch (direct pointer)
Postgres now:
Goes directly to heap page 102345
Reads row at offset 7
Cost: O(1) (conceptually)
Row found:
(id=73482, name='Rahim', age=29)
β Total cost (PostgreSQL)
π Key PostgreSQL characteristics
- β One B-Tree traversal
- β Heap fetch is constant-time
- β Heap pages may be scattered (less locality)
- β Extra visibility checks (MVCC)
3οΈβ£ Side-by-side comparison (core difference)
4οΈβ£ Real-world performance (THIS is the key insight)
β οΈ Big-O is not the full story
When MySQL can be faster
- PK is small
- Data fits in buffer pool
- Rows are accessed sequentially
- Heavy read workloads
β‘οΈ Clustered index wins
When PostgreSQL can be faster
- Very large tables
- Many secondary indexes
- Random access patterns
- Index-only scans possible
β‘οΈ Pointer-based access wins
5οΈβ£ Index-only scan (Postgres secret weapon π§ )
If you change index to:
CREATE INDEX idx_user_name_age ON user(name, age);
Then:
SELECT age FROM user WHERE name = 'Rahim';
π₯ PostgreSQL can return result WITHOUT touching heap at all
Cost: O(log N)
MySQL cannot do true index-only scan in the same way because: Secondary index does not contain row version visibility info
6οΈβ£ Final verdict (short answer)
π Theoretical efficiency
β PostgreSQL is more efficient
`O(logN) + O(1)
vs
O(logN) + O(logN)`
π Practical efficiency
`Small / medium datasets β MySQL often feels faster
Large datasets + many indexes β PostgreSQL scales better
Analytics / complex queries β PostgreSQL wins
Simple OLTP workloads β MySQL is excellent`
π§ One-line intuition
MySQL: βFind index β find PK β find rowβ Postgres: βFind index β jump straight to rowβ