PyTorch on a $6 computer

Running PyTorch on a $6 Microcontroller

The Raspberry Pi Pico 2 W has 520KB of RAM. I got a PyTorch neural network running on it.

The Problem

PyTorch assumes you have resources. It’s a 500MB+ install. It needs Python. It allocates memory freely. That’s a lot of overhead for a microcontroller.

The Solution: ExecuTorch

ExecuTorch is Meta’s runtime for deploying PyTorch models on edge devices. The key insight: do all the hard work ahead of time.

On your laptop, you:

1. Train your model normally in PyTorch
2. Export it to a static graph
3. Lower it for embedded (dtype specialization, out-variant operators)
4. Run the memory planner
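The export steps above can be sketched in a few lines. This is a minimal sketch, assuming a recent ExecuTorch release that provides `to_edge` and `to_executorch`; the helper name `export_to_pte` and the default output path are hypothetical, and the actual script in the repository may differ.

```python
def export_to_pte(model, example_inputs, out_path="model.pte"):
    """Export an eager PyTorch module to an ExecuTorch .pte file.

    `example_inputs` is a tuple of tensors, e.g. (torch.randn(1, 1),).
    """
    # Imported lazily so this file can be imported on a machine
    # that does not have torch/executorch installed.
    import torch
    from executorch.exir import to_edge

    model.eval()

    # Capture a static graph for the given example inputs.
    exported_program = torch.export.export(model, example_inputs)

    # Lower to the Edge dialect (dtype-specialized operators).
    edge_program = to_edge(exported_program)

    # Convert to out-variant operators and run memory planning.
    executorch_program = edge_program.to_executorch()

    # Serialize graph, weights, and memory plan into one file.
    with open(out_path, "wb") as f:
        f.write(executorch_program.buffer)
    return out_path
```

Training (step 1) happens before this call; the function only covers export, lowering, and memory planning.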

The memory planner analyzes tensor lifetimes (when each is created, when it’s last used) and assigns offsets into a fixed arena. Tensors that don’t coexist share memory. By the time you flash the firmware, every memory access is predetermined. No allocation at inference time.
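To make the idea concrete, here is a toy planner. It is a greedy first-fit sketch and not ExecuTorch's actual algorithm; the `Tensor` class, the tensor names, and the sizes (float32 activations of a hypothetical 1→16→16→1 network) are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class Tensor:
    name: str
    size: int        # bytes
    first_use: int   # index of the op that creates it
    last_use: int    # index of the last op that reads it


def lifetimes_overlap(a, b):
    return a.first_use <= b.last_use and b.first_use <= a.last_use


def plan_memory(tensors):
    """Return ({name: offset}, arena_size) using greedy first-fit."""
    placed = []   # (tensor, offset) pairs already assigned
    offsets = {}
    # Place big tensors first; small ones fill the gaps.
    for t in sorted(tensors, key=lambda t: t.size, reverse=True):
        # Byte ranges held by tensors that are alive at the same time as t.
        busy = sorted((off, off + p.size)
                      for p, off in placed if lifetimes_overlap(p, t))
        offset = 0
        for start, end in busy:
            if offset + t.size <= start:
                break                  # t fits in the gap before this range
            offset = max(offset, end)  # otherwise skip past it
        offsets[t.name] = offset
        placed.append((t, offset))
    arena_size = max((off + t.size for t, off in placed), default=0)
    return offsets, arena_size


tensors = [
    Tensor("x", 4, 0, 0),    # input, read by op 0
    Tensor("h1", 64, 0, 1),  # written by op 0, read by op 1
    Tensor("h2", 64, 1, 2),  # written by op 1, read by op 2
    Tensor("y", 4, 2, 2),    # output, written by op 2
]
offsets, arena_size = plan_memory(tensors)
print(offsets)
print(arena_size)  # 128 (vs. 136 if every tensor had its own slot)
```

Here `x` and `h2` share offset 64 because `x` is dead before `h2` is created, and `y` reuses the space `h1` occupied.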

The output is a .pte file containing weights, the computation graph, and the memory map. About 3KB for this model.

The Model

A small network that predicts sin(x). Three linear layers, 16 neurons in each hidden layer, 337 parameters.

Input(1) → Linear(16) → ReLU → Linear(16) → ReLU → Linear(1) → Output
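As a sanity check on the model size, the parameter count can be derived from the layer widths in the diagram. This sketch assumes every linear layer has a bias and nothing else is trainable. With exactly the widths shown it arrives at 321, slightly below the 337 quoted above, so the model in the repository may differ from the diagram in some detail.

```python
def linear_params(in_features, out_features):
    # One weight per input/output pair plus one bias per output neuron.
    return in_features * out_features + out_features


layer_sizes = [1, 16, 16, 1]  # widths taken from the diagram
per_layer = [linear_params(i, o)
             for i, o in zip(layer_sizes, layer_sizes[1:])]
total = sum(per_layer)

print(per_layer, total)               # [32, 272, 17] 321
print(total * 4, "bytes as float32")  # 1284 bytes as float32
```

Either way, the weights take well under 2KB as float32, which is consistent with a .pte file of about 3KB once the graph and memory map are included.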

The Build

The .pte file gets converted to a C byte array and compiled alongside the ExecuTorch runtime (~50KB) for ARM Cortex-M33. Only the three kernels we actually use (linear, relu, add) are included.
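The conversion to a C byte array is similar to what `xxd -i` produces. Below is a minimal sketch; the function name, the symbol name `model_pte`, and the 16-byte alignment attribute are assumptions, and the repository's build may do this differently.

```python
def bytes_to_c_array(data: bytes, symbol: str = "model_pte",
                     per_line: int = 12) -> str:
    """Render raw bytes as a C source snippet defining a uint8_t array."""
    lines = []
    for i in range(0, len(data), per_line):
        chunk = data[i:i + per_line]
        lines.append("  " + ", ".join(f"0x{b:02x}" for b in chunk) + ",")
    body = "\n".join(lines)
    return (
        "#include <stddef.h>\n"
        "#include <stdint.h>\n\n"
        # Alignment is an assumption: the runtime expects an aligned buffer.
        f"const uint8_t {symbol}[] __attribute__((aligned(16))) = {{\n"
        f"{body}\n}};\n"
        f"const size_t {symbol}_len = {len(data)};\n"
    )


# Tiny demonstration with three bytes instead of a real .pte file.
print(bytes_to_c_array(b"\x00\x01\xff", symbol="demo"))
```

For a real model, read the file with `open("model.pte", "rb").read()` and write the returned string to a header that the firmware includes.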

Final firmware: ~150KB.

Results

Predictions match actual sin(x) with error under 0.01.

Code

https://github.com/spartacoos/executorch-sine-pico2w
