Running PyTorch on a $6 Microcontroller
The Raspberry Pi Pico 2 W has 520KB of RAM. I got a PyTorch neural network running on it.
The Problem
PyTorch assumes you have resources. It’s a 500MB+ install. It needs Python. It allocates memory freely. That’s a lot of overhead for a microcontroller.
The Solution: ExecuTorch
ExecuTorch is Meta’s runtime for deploying PyTorch models on edge devices. The key insight: do all the hard work ahead of time.
On your laptop, you:
- Train your model normally in PyTorch
- Export it to a static graph
- Lower it for embedded targets (dtype specialization, out-variant operators)
- Run the memory planner
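The out-variant operators from the lowering step are what make a static memory plan possible. Instead of returning a freshly allocated tensor, each kernel writes into an output buffer the caller hands it. The sketch below shows the idea in plain Python, with lists standing in for tensors. `linear_out` and `relu_out` are hypothetical names chosen for illustration, not ExecuTorch's real kernel signatures (those operate on tensors and are written in C++).

```python
from typing import List


def linear_out(x: List[float], weight: List[List[float]], bias: List[float],
               out: List[float]) -> List[float]:
    """Compute out[i] = bias[i] + sum_j weight[i][j] * x[j], writing into `out`."""
    # The caller owns `out`; this function never allocates a result buffer.
    assert len(out) == len(weight) == len(bias)
    assert all(len(row) == len(x) for row in weight)
    for i, row in enumerate(weight):
        acc = bias[i]
        for w, xj in zip(row, x):
            acc += w * xj
        out[i] = acc
    return out


def relu_out(x: List[float], out: List[float]) -> List[float]:
    """Elementwise max(0, x), writing into `out` (which may be `x` itself)."""
    assert len(out) == len(x)
    for i, v in enumerate(x):
        out[i] = v if v > 0.0 else 0.0
    return out


# Every buffer is allocated once, up front; the calls below only write into them.
hidden = [0.0] * 2
result = [0.0] * 1
linear_out([1.0, 2.0], [[0.5, 0.25], [1.0, -1.0]], [0.0, 0.0], hidden)  # [1.0, -1.0]
relu_out(hidden, hidden)                                                 # [1.0, 0.0]
linear_out(hidden, [[2.0, 3.0]], [0.5], result)
print(result)  # → [2.5]
```

Because every output location is known before the first call, a planner can decide ahead of time where each of these buffers lives.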
The memory planner analyzes tensor lifetimes (when each is created, when it’s last used) and assigns offsets into a fixed arena. Tensors that don’t coexist share memory. By the time you flash the firmware, every memory access is predetermined. No allocation at inference time.
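A toy version of lifetime-based planning makes this concrete. Each tensor gets a `(first_step, last_step, size)` triple; tensors alive at the same step must occupy disjoint byte ranges, and everything else may reuse space. This is a simple greedy first-fit heuristic picked for illustration, and `plan_arena` plus the lifetimes below are hypothetical. ExecuTorch's own planner may use a different algorithm.

```python
from typing import Dict, List, Tuple

# name -> (first_step, last_step, size_in_bytes); both steps inclusive.
Lifetimes = Dict[str, Tuple[int, int, int]]


def plan_arena(tensors: Lifetimes) -> Tuple[Dict[str, int], int]:
    """Greedy first-fit planner: returns (offset per tensor, total arena bytes).

    Tensors whose lifetimes overlap get disjoint byte ranges; tensors that are
    never alive at the same time are free to reuse the same bytes.
    """
    offsets: Dict[str, int] = {}
    placed: List[Tuple[int, int, int, int]] = []  # (first, last, offset, size)
    # Place the largest tensors first, a common heuristic that tends to pack tighter.
    for name, (first, last, size) in sorted(tensors.items(), key=lambda kv: -kv[1][2]):
        # Byte ranges already claimed by tensors alive at the same time as this one.
        busy = sorted((off, off + sz) for s, e, off, sz in placed
                      if s <= last and first <= e)
        offset = 0
        for lo, hi in busy:
            if offset + size <= lo:
                break  # fits in the gap before this range
            offset = max(offset, hi)  # otherwise slide past it
        offsets[name] = offset
        placed.append((first, last, offset, size))
    arena = max((off + sz for _, _, off, sz in placed), default=0)
    return offsets, arena


# Hypothetical lifetimes for the sin(x) MLP: one step per operator, float32
# activations, so a 16-wide hidden vector is 64 bytes and a scalar is 4.
lifetimes: Lifetimes = {
    "x":  (0, 0, 4),    # model input, read by linear 1
    "h1": (0, 1, 64),   # linear 1 output, read by relu 1
    "r1": (1, 2, 64),   # relu 1 output, read by linear 2
    "h2": (2, 3, 64),   # linear 2 output, read by relu 2
    "r2": (3, 4, 64),   # relu 2 output, read by linear 3
    "y":  (4, 4, 4),    # model output
}
offsets, arena = plan_arena(lifetimes)
print(offsets)
print(arena)  # → 128, versus 264 bytes with one buffer per tensor
```

In this toy case `h1` and `h2` end up sharing the same 64 bytes, as do `r1` and `r2`, because neither pair is ever alive at the same step.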
The output is a .pte file containing the weights, the computation graph, and the memory map. For this model, it's about 3KB.
The Model
A small network that predicts sin(x): three linear layers with 16 neurons in each hidden layer, 321 parameters in total.
Input(1) → Linear(16) → ReLU → Linear(16) → ReLU → Linear(1) → Output
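The parameter count follows from the layer widths alone: each Linear(n_in → n_out) holds n_in × n_out weights plus n_out biases, and ReLU has none. The sketch assumes every layer has a bias (the `nn.Linear` default) and that weights are stored as float32.

```python
# Layer widths taken from the diagram: Input(1) -> 16 -> 16 -> Output(1).
widths = [1, 16, 16, 1]


def linear_params(n_in: int, n_out: int) -> int:
    """A Linear layer with bias has an n_out x n_in weight matrix plus n_out biases."""
    return n_in * n_out + n_out


per_layer = [linear_params(a, b) for a, b in zip(widths, widths[1:])]
total = sum(per_layer)
print(per_layer, total)  # → [32, 272, 17] 321

# Assuming float32 weights (4 bytes each).
print(total * 4, "bytes")  # → 1284 bytes
```

For the layers as drawn, the weights and biases come to roughly 1.3KB as float32, consistent with a .pte of about 3KB once the graph and memory plan are added.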
The Build
The .pte file gets converted to a C byte array and compiled alongside the ExecuTorch runtime (~50KB) for ARM Cortex-M33. Only the three kernels we actually use (linear, relu, add) are included.
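Turning the .pte into a C array is the same job `xxd -i` does. A minimal Python equivalent might look like this. The `model_pte` symbol name, the file names, and the GCC-style 16-byte alignment attribute are assumptions for illustration, not details from the original build; match the alignment to whatever your runtime expects.

```python
def to_c_array(data: bytes, name: str = "model_pte", per_line: int = 12) -> str:
    """Render raw bytes as a C source file, similar to what `xxd -i` produces."""
    rows = []
    for i in range(0, len(data), per_line):
        chunk = data[i:i + per_line]
        rows.append("  " + ", ".join(f"0x{b:02x}" for b in chunk) + ",")
    body = "\n".join(rows)
    return (
        "#include <stddef.h>\n"
        "#include <stdint.h>\n\n"
        # The 16-byte alignment is an assumption; use whatever your runtime expects.
        f"const uint8_t {name}[] __attribute__((aligned(16))) = {{\n"
        f"{body}\n"
        "};\n"
        f"const size_t {name}_len = sizeof({name});\n"
    )


def convert_file(pte_path: str, c_path: str, name: str = "model_pte") -> None:
    """Read a .pte file and write it out as a compilable C array."""
    with open(pte_path, "rb") as src:
        data = src.read()
    with open(c_path, "w", encoding="ascii") as dst:
        dst.write(to_c_array(data, name))


# Hypothetical file names: convert_file("model.pte", "model_pte.c")
print(to_c_array(b"\x01\x02\xff", name="demo"))
```

Declaring the array `const` lets the linker keep it in read-only data rather than copying it into the 520KB of RAM.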
Final firmware: ~150KB.
Results
Predictions match actual sin(x) with error under 0.01.
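The post doesn't say how the error was measured or over what input range. One way to check a claim like this is the maximum absolute error over an evenly spaced grid. The sketch assumes an input range of [-π, π] and takes a `predict` callable as a stand-in for however the model's outputs are collected; both are assumptions, and the Taylor polynomial below is only a placeholder predictor, not the trained network.

```python
import math
from typing import Callable


def max_abs_error(predict: Callable[[float], float],
                  lo: float = -math.pi, hi: float = math.pi, n: int = 101) -> float:
    """Largest |predict(x) - sin(x)| over n evenly spaced points in [lo, hi]."""
    xs = [lo + (hi - lo) * i / (n - 1) for i in range(n)]
    return max(abs(predict(x) - math.sin(x)) for x in xs)


# Placeholder predictor for illustration only: the 5th-order Taylor polynomial of sin(x).
def taylor5(x: float) -> float:
    return x - x**3 / 6 + x**5 / 120


err = max_abs_error(taylor5)
print(f"max abs error: {err:.3f}, under 0.01: {err < 0.01}")  # → max abs error: 0.524, under 0.01: False
```

The Taylor polynomial drifts badly near ±π, which is exactly the kind of edge behavior a max-error check over the full range catches and an average would hide.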