PEP 703 is one of the most transformative changes in CPython’s history. This proposal—now accepted—re-architects memory management, garbage collection, and container safety to safely enable true parallelism.
For years, Python has had a massive "elephant in the room": the Global Interpreter Lock (GIL). If you have ever written a Python script that tries to do two CPU-heavy tasks at once, you have likely run into it. You spawn two threads, expecting your code to run twice as fast, but instead, it runs at the same speed—or sometimes even slower.
The GIL allows only one thread to execute Python bytecode at a time, effectively turning your powerful multi-core CPU into a single-core machine. In this deep dive, we will explore exactly how the Python core team solved the "impossible" problem: removing the GIL while keeping Python safe.
Why This Matters Now
Before looking at the "how," we must address the "why." For decades, single-threaded Python got faster automatically as CPUs improved (Moore’s Law). That era is over. Today, performance gains come from adding more cores, not faster ones.
At the same time, Python has become the standard for AI and Machine Learning—workloads that are inherently parallel. Sticking to a single-threaded runtime in a multi-core, AI-driven world is no longer sustainable. PEP 703 bridges this gap, allowing Python to finally utilize modern hardware to its full potential.
The Core Problem: Concurrency vs. Parallelism
Imagine a ticket counter with 10 open windows (your CPU cores), but there is only one employee (the GIL) who runs back and forth between the windows. Even if you have 10 customers (threads) ready to do business, only one gets served at a time. The others just wait.
- Concurrency: The employee switches between windows rapidly. Progress happens on all tasks, but not at the exact same instant.
- Parallelism: You hire 10 employees. All 10 windows serve customers at the exact same second.
Standard Python does concurrency well but limits parallelism.
PEP 703 removes the "one employee" limit, allowing every core on your machine to run Python code simultaneously. But doing this safely requires redesigning Python internals from the ground up.
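Before diving into the internals, you can check which build you are running. As a hedged sketch: the introspection hooks below exist in CPython 3.13+, so on older versions the code simply falls back to reporting conventional GIL-on behavior.

```python
import sys
import sysconfig

# Py_GIL_DISABLED is set for free-threaded builds (3.13+); older versions
# return None, which bool() turns into False.
free_threaded_build = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))

# sys._is_gil_enabled() also appeared in 3.13; fall back to True elsewhere,
# since the GIL is always on in standard builds.
gil_enabled = getattr(sys, "_is_gil_enabled", lambda: True)()

print(f"Free-threaded build: {free_threaded_build}; GIL enabled: {gil_enabled}")
```

On a standard interpreter this prints `False; True`; on a free-threaded build (often installed as `python3.13t`) it reports the GIL as disabled.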
How Python Fixed Reference Counting Without the GIL
Python manages memory automatically using Reference Counting. Every object carries a counter tracking how many references point to it. In a free-threaded world, two threads updating the same counter simultaneously would cause race conditions.
Using atomic operations for every reference count update would be safe but unacceptably slow. PEP 703 solves this with three innovations:
1. Biased Reference Counting (BRC)
The developers realized that most objects are only ever used by the thread that created them.
BRC "biases" the reference count accordingly:
- Owner thread: Fast non-atomic increments
- Other threads: Slower atomic increments for safety
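The split described above can be sketched as a conceptual model in pure Python. This is an illustration of the algorithm, not CPython's actual C implementation: the `BiasedRefCount` class, its field names, and the use of a `Lock` to stand in for an atomic update are all invented for this example.

```python
import threading

class BiasedRefCount:
    """Conceptual model of biased reference counting (illustrative only)."""

    def __init__(self):
        self.owner = threading.get_ident()  # thread that created the object
        self.local = 1                      # owner-only count: plain increments
        self.shared = 0                     # other threads' count: synchronized
        self._lock = threading.Lock()       # stands in for an atomic operation

    def incref(self):
        if threading.get_ident() == self.owner:
            self.local += 1                 # fast path: no synchronization needed
        else:
            with self._lock:                # slow path: safe cross-thread update
                self.shared += 1

    def total(self):
        with self._lock:
            return self.local + self.shared

rc = BiasedRefCount()
rc.incref()                                 # owner thread: fast path

t = threading.Thread(target=rc.incref)      # another thread: slow path
t.start()
t.join()

print(rc.local, rc.shared, rc.total())
```

Because the common case (the owner touching its own object) skips synchronization entirely, the cost of thread safety is paid only on the rare cross-thread access.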
2. Immortal Objects
Objects like None, True, False, and small integers (0, 1) are accessed constantly. Locking them would create a hotspot of contention.
PEP 703 marks these objects as Immortal. Their reference counts never change—updates are simply ignored.
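You can see why these objects are such a hotspot: small integers and singletons like None are shared by every thread in the process, so each name simply points at the same cached object. A quick sketch:

```python
import sys

# CPython caches small integers (-5 through 256), so both names below refer
# to one and the same object, shared process-wide across all threads.
a, b = 256, 256
print(a is b)

# On builds with immortal objects (3.12+), the refcount of None is frozen at
# a sentinel value; getrefcount still reports a number on any version.
print(sys.getrefcount(None))
```

Since every thread hammers these same few objects, freezing their reference counts removes what would otherwise be the single worst point of contention.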
3. Deferred Reference Counting
Functions and modules are touched millions of times by many threads. Updating their refcounts on every access would drag performance down.
PEP 703 defers these updates and allows the Garbage Collector to reconcile them later, avoiding countless atomic operations.
How Memory Allocation Works in the Free-Threaded Build
Standard Python uses pymalloc, which assumes the GIL is present. Without the GIL, it becomes unsafe for multithreaded use.
The free-threaded build introduces optional support for mimalloc, a high-performance allocator designed for parallel workloads.
mimalloc provides:
- Thread-safe allocation without global locks
- Paging of similar-sized objects, making GC scans much more efficient
If mimalloc is unavailable, CPython automatically falls back to a thread-safe variant of pymalloc, ensuring correct behavior even without the external allocator.
This significantly reduces contention when multiple threads request memory simultaneously.
How the New GC Avoids Race Conditions
Python’s Garbage Collector finds cyclic references. Under the GIL, the GC could run safely because no other thread could mutate structures mid-scan.
Without the GIL, Python adds new protection:
1. Stop-the-World Scanning
- Pause all threads executing Python code
- Scan safely
- Resume execution
2. Removal of Generational GC
Standard Python scans young objects frequently. In a multi-threaded world, frequent pauses would crush performance. The free-threaded build adopts a non-generational GC, running less frequently and merging deferred reference count updates during each cycle.
Because the GC now runs less frequently in the free-threaded build, these stop-the-world pauses are shorter and happen significantly less often in practice.
How Lists and Dicts Stay Safe Across Threads
Each list and dictionary now carries a lightweight per-object lock, but acquiring it for every read would destroy performance.
Python uses Optimistic Locking with version numbers instead.
- A writer acquires the lock and increments the version number.
- A reader:
  - Reads the version
  - Reads data without locking
  - Re-reads the version
  - If unchanged → success
  - If changed → retries with the lock
Lists and dicts no longer depend on the GIL for thread safety — their lock-and-version design ensures safe concurrent access even when many threads operate on the same container.
Because reads far outnumber writes, this design keeps containers extremely fast.
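The reader/writer protocol above can be sketched in pure Python. This is an illustrative model of optimistic, version-checked reads, not CPython's C internals; the `VersionedDict` class and its method names are invented for this example.

```python
import threading

class VersionedDict:
    """Sketch of optimistic locking with a version number (illustrative only)."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()
        self._version = 0

    def put(self, key, value):
        with self._lock:              # writers always take the lock...
            self._data[key] = value
            self._version += 1        # ...and bump the version so readers notice

    def get(self, key):
        v1 = self._version            # 1. read the version
        value = self._data.get(key)   # 2. read the data without locking
        if self._version == v1:       # 3. version unchanged -> read was consistent
            return value
        with self._lock:              # 4. a writer intervened -> retry under lock
            return self._data.get(key)

d = VersionedDict()
d.put("a", 1)
print(d.get("a"))
```

On the common path a reader never touches the lock at all; it pays the locking cost only in the rare case where a write landed mid-read.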
The Reality Check: Performance & Trade-offs
Removing the GIL comes with overhead. Free-threaded Python must maintain per-object locks, deferred refcounting logic, and more.
- Single-threaded programs: ~10–15% slower
- Multi-threaded CPU-bound programs: scale almost linearly with core count
| Python Build | CPU-Heavy Threads on 8 Cores |
|---|---|
| Standard CPython | ~1× speed (no scaling) |
| Free-Threaded CPython | ~6–8× speed depending on workload |
For AI, scientific computing, and high-throughput web servers, this trade-off is overwhelmingly worthwhile.
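You can probe this scaling yourself with a small benchmark. Hedged expectation: on a standard GIL build the 8-thread run takes about as long as the 1-thread run; on a free-threaded build it should be several times faster. The workload and sizes below are arbitrary choices for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def burn(n):
    # Pure-Python CPU work: only scales across threads without the GIL.
    total = 0
    for i in range(n):
        total += i * i
    return total

def timed(workers, n=200_000, tasks=8):
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as ex:
        results = list(ex.map(burn, [n] * tasks))
    return time.perf_counter() - start, results

t1, r1 = timed(1)
t8, r8 = timed(8)
print(f"1 thread: {t1:.3f}s, 8 threads: {t8:.3f}s")
```

Compare the two timings on each build; the results themselves are identical either way, which the correctness check below relies on.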
Real-World Scenarios: Why Developers Should Care
- ML Pipelines: Run data loading, preprocessing, and model evaluation in parallel without multiprocessing overhead.
- Web Frameworks: Background CPU tasks (e.g., PDF generation, hashing) no longer block the main request thread.
- Scientific Computing: Complex simulations can use multiple cores on a shared in-memory dataset instead of slow inter-process communication.
Compatibility & Migration
Will this break your code? Short answer: No, but you might now see race conditions that were previously hidden by the GIL.
- Pure Python: Runs unchanged, but you must protect shared mutable state yourself when using threads.
- C-extensions: Must be rebuilt with free-threading support. Many major libraries (NumPy, Pandas, PyTorch) already provide experimental builds.
- Async: Unaffected—async is I/O-bound, not CPU-bound.
- Ecosystem: Support is accelerating as the Python 3.14 timeline approaches.
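"Protect shared mutable state yourself" means the classic pattern below: guard every read-modify-write of shared data with a lock. This is advisable even under the GIL (augmented assignment is not atomic), and it becomes essential on free-threaded builds.

```python
import threading

counter = 0
lock = threading.Lock()

def add(n):
    global counter
    for _ in range(n):
        with lock:        # serialize the read-modify-write of shared state
            counter += 1  # += is not atomic, GIL or no GIL

threads = [threading.Thread(target=add, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 40000
```

Without the lock, the same program can silently lose increments; the GIL merely made such races less likely to surface, not impossible.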
Summary
PEP 703 represents years of engineering effort. Key takeaways:
- True Parallelism: Python can now execute multiple threads at the same time on different CPU cores.
- Redesigned Internals: Reference counting, memory allocation, container safety, and GC were upgraded for multi-threading.
- Performance Reality: ~10% slower for purely single-threaded workloads, but massive multi-core acceleration for CPU-heavy tasks.
- Status: Free-threaded Python shipped as an experimental build in Python 3.13 and continues as an officially supported, optional build in Python 3.14.
Disclaimer: The views and opinions expressed in this blog are solely those of the author and do not represent the views of any organization or any individual with whom the author may be associated, professionally or personally.