Jan 19, 2026
This is the first post in a series about my ongoing journey to control a real RC car over the internet from a browser. With FPV video, a race countdown, an admin “referee” dashboard, and latency low enough that you don’t feel like you’re driving via postcards.
The project is called Tether Rally. Under the hood, it’s a mix of WebRTC, Raspberry Pi, ESP32, Cloudflare edge plumbing, and one slightly unhinged hardware constraint: ARRMA Big Rock’s 2-in-1 ESC/receiver.

Why this is harder than it sounds
If you pick a “normal” hobby RC car, you can usually inject control signals pretty cleanly:
- receiver outputs PWM
- servo + ESC take PWM
- Raspberry Pi can generate PWM (or you use a PCA9685)
- done
But ARRMA Big Rock (and friends) ship with an integrated receiver+ESC where the receiver is basically welded to the ESC from your perspective. No handy signal pins. No “just connect to channel 1/2”. So to control it digitally, I had two choices:
- replace the electronics stack (boring, expensive, loses the stock radio feel)
- control the transmitter instead
I went with option 2. Which leads to the core trick of this whole project:
The ESP32 sits on the transmitter and fakes the joystick voltages. The transmitter then sends a normal radio signal to the car, like nothing happened. That means the internet control link has to reach the transmitter, not just the car.
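To make the trick concrete: the ESP32's built-in DAC is 8-bit (codes 0–255 across roughly 0–3.3V), so "faking the joystick" means mapping a normalized command onto a DAC code around the stick's center voltage. Here's the idea as a Python sketch (the real firmware is C++ on the ESP32, and the usable code range below is an assumption, not a measurement):

```python
# Sketch of the joystick-faking idea. The ESP32's DACs are 8-bit, so a
# centered stick sits near code 128; the usable min/max are hypothetical.
DAC_MIN, DAC_CENTER, DAC_MAX = 54, 128, 202

def stick_to_dac(value: float) -> int:
    """Map a normalized input in [-1.0, 1.0] to an 8-bit DAC code."""
    value = max(-1.0, min(1.0, value))  # clamp out-of-range commands
    if value >= 0:
        return round(DAC_CENTER + value * (DAC_MAX - DAC_CENTER))
    return round(DAC_CENTER + value * (DAC_CENTER - DAC_MIN))
```

From the transmitter's point of view, the resulting voltage is indistinguishable from a thumb on the stick.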
Architecture: two variants, one cursed and one clean
Clean version (most cars):
Browser ──WebRTC──> Pi (on car) ──PWM──> Servo/ESC
                        │
                        └──WebRTC Video──> Browser
ARRMA Big Rock version (the one I actually built):
Browser ──WebRTC──> Pi (on car) ──WiFi/UDP──> ESP32 (on transmitter) ──DAC──> Transmitter ~~Radio~~> Car
                        │
                        └──WebRTC Video──> Browser
The Pi is on the car because it owns the camera. The ESP32 is on the transmitter because the transmitter is the only controllable “input device” the car accepts.
This split also creates a fun constraint: the Pi and ESP32 must share a LAN. In my setup they’re typically both on an iPhone hotspot or a portable 4G router.

The two hard problems: latency and safety
Latency
At first I tried the classic “easy” approach: browser → Cloudflare Worker → Pi/ESP via WebSockets.
It worked, but it had that unmistakable feel of driving a car through an interpreter. Steering corrections arrive late, you overcorrect, then you overcorrect the overcorrection, and suddenly you’ve invented drift racing against your will.
So I switched the control path to WebRTC DataChannel:
- unordered packets
- no retransmits
- basically “UDP with better NAT traversal and fewer tears”
That dropped the control RTT to something like 10–15ms on LAN and ~100–200ms over the internet (which is in the “playable” range, especially if your throttle limit isn’t set to “lawnmower”).
Safety
If you’re going to let random people drive a real car remotely, you need the system to fail boring.
So the safety rules live where they can’t be bypassed:
- throttle limits enforced on the ESP32 (not the browser)
- auto-neutral on connection loss
- slew-rate limiting so the car can’t instantly snap from 0 to full send
- race state machine blocks input until it’s actually time to race
Access is token-based; more on that later.
Video: “it has to feel live” (and also not melt the Pi)
For FPV I ended up with:
- Raspberry Pi Zero 2W
- Camera Module 3 (Wide)
- MediaMTX serving WebRTC via WHEP
- Cloudflare Tunnel to expose the endpoints
- Cloudflare TURN for NAT traversal
MediaMTX is doing a lot of heavy lifting here: it can grab frames from the Pi camera stack, hardware-encode H.264, and serve WebRTC without me writing a custom signaling/video pipeline.
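For context, the MediaMTX side can be surprisingly small. Something like this works as a starting point (a hedged sketch: the path name and resolution are assumptions, check the MediaMTX docs for your camera):

```yaml
# mediamtx.yml sketch (illustrative; "cam" path name and resolution are assumptions)
paths:
  cam:
    source: rpiCamera        # MediaMTX's built-in Pi camera source
    rpiCameraWidth: 1280
    rpiCameraHeight: 720
# WHEP playback is then served at http://<pi>:8889/cam/whep
```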
The result is usually ~100–300ms video latency depending on network and whether you get a direct P2P route or a relayed TURN path.
That’s still higher than control latency, which is exactly what you want. Control should be snappy; video can be “good enough” as long as it’s stable and you don’t get multi-second buffering.

Controls: DataChannel → Pi → UDP → ESP32 → DAC voltages
Controls are sent from the browser at 50Hz over a binary protocol. On the Pi, a Python relay receives DataChannel packets and forwards them via UDP to the ESP32 on the LAN.
Binary protocol (the boring part that makes everything less boring)
Packet format: seq(uint16 LE) + cmd(uint8) + payload
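In Python, that framing is one struct call away. The header matches the format above; the cmd value and the int8 throttle/steering payload are illustrative assumptions, not the actual wire layout:

```python
import struct

# Header per the post: seq (uint16, little-endian) + cmd (uint8).
# CMD_DRIVE and the int8 throttle/steering payload are assumptions.
CMD_DRIVE = 0x01

def pack_drive(seq: int, throttle: float, steering: float) -> bytes:
    """Encode a drive packet; throttle/steering in [-1.0, 1.0] scaled to int8."""
    t = max(-127, min(127, round(throttle * 127)))
    s = max(-127, min(127, round(steering * 127)))
    return struct.pack("<HBbb", seq & 0xFFFF, CMD_DRIVE, t, s)

def unpack_drive(pkt: bytes):
    seq, cmd, t, s = struct.unpack("<HBbb", pkt)
    return seq, cmd, t / 127, s / 127
```

Five bytes per packet at 50Hz is 250 bytes/sec of control traffic, which is nothing even on a bad link.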
Why UDP between Pi and ESP32? Because it’s LAN-local, tiny packets, and the ESP32 already has enough going on. UDP is perfect here, and “reliability” is handled by high frequency updates + timeouts instead of retransmits.
Control link evolution: from “it works” to “it feels right”
The first version of Tether Rally’s control path was the classic “ship it” architecture: Browser → Cloudflare Worker → WebSocket → (something on the other end) → car.
It was easy to deploy, easy to reason about, and it worked on basically any network without thinking too hard about NAT traversal. But it had one fatal flaw: It felt like driving while someone narrates your steering inputs to a second person, who then turns the wheel for you.
Not always. Not consistently. But often enough that you start driving the latency instead of driving the car.
Why WebSockets felt bad (even when the RTT looked “okay”)
WebSockets run over TCP. That sentence is the whole story, but it's worth unpacking because the failure mode isn't obvious until you try to do realtime control.
What I cared about for controls wasn’t throughput or reliability. It was:
- freshness (the latest input is the only one that matters)
- predictable timing (jitter feels worse than raw latency)
- graceful loss (missing one packet should be invisible, not catastrophic)
TCP gives you almost the opposite:
- reliability: if a packet is lost, it will be retransmitted
- ordering: packets must be delivered in order
- head-of-line blocking: one missing packet can stall everything behind it
This is great for loading web pages. It’s a small disaster for “steer left, no wait steer right, no wait...”.
The control loop is sending updates at 50Hz. That's one packet every 20ms. If you drop one packet on a shaky link, TCP doesn't just shrug and move on; it tries to heal the timeline. The retransmitted "old" steering update arrives late, newer updates queue up behind it (because ordering), and then the whole clump of stale inputs gets delivered together.
So the car does the exact thing you never want: it briefly becomes more correct about the past instead of more correct about the present.
And from the driver’s perspective, that shows up as weirdness: steering that “sticks” for a beat, sudden catch-up wiggles, throttle that feels mushy and then jumps.
Even if average RTT is fine, this kind of jitter is poison for control feel.
The aha moment: controls want UDP semantics
For control packets, if one is lost you don't want it back, you want the next one. So what I actually wanted was: NAT traversal, a datagram-ish channel (unreliable, unordered), and easy browser support. Which is basically: WebRTC DataChannel configured like UDP.
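One consequence of unordered, unreliable delivery: the receiver has to discard stale packets itself, which is what the uint16 sequence number in the protocol is for. A wraparound-aware freshness check is a one-liner:

```python
# Accept a packet only if its sequence number is "ahead" of the last one seen,
# treating the 16-bit counter as circular so the wrap from 65535 back to 0
# doesn't freeze the controls.
def is_newer(seq: int, last_seq: int) -> bool:
    """True if seq is ahead of last_seq on a circular uint16 counter."""
    return 0 < ((seq - last_seq) & 0xFFFF) < 0x8000
```

Anything that fails the check simply gets dropped, and the next 20ms packet replaces it anyway.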
Switching to WebRTC DataChannel
Once I moved the browser→Pi link to WebRTC, two things happened immediately: RTT dropped dramatically on good paths (especially LAN), but more importantly jitter stopped feeling like betrayal.
ESP32 firmware: the part that stopped the car from twitching like it’s haunted
Analog joystick emulation sounds simple until you discover all the ways noise and scheduling ruin your day. The big fix was making the ESP32 behave like a real control loop, not “a WiFi packet handler that sometimes writes a DAC”.
So the firmware runs two tasks:
- Core 0: UDP receive (updates target throttle/steering)
- Core 1: control loop at 200Hz
  - EMA smoothing (α ≈ 0.25)
  - slew rate limiting (~8/sec max change)
  - staged timeout: 80ms hold → 250ms neutral
That’s what killed the stutter. Packet timing can jitter; the output loop stays smooth.
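The structure is easy to show in miniature. Below is a pure-Python simulation of that Core 1 loop using the constants from the post; the exact behavior of the 80ms "hold" stage is my interpretation (freeze the output, don't yet snap to neutral), and the real firmware is C++:

```python
ALPHA = 0.25                       # EMA smoothing factor (from the post)
DT_MS = 5.0                        # 200 Hz loop period in milliseconds
MAX_SLEW = 8.0 * DT_MS / 1000.0    # ~8 full-scale units/sec, expressed per tick
HOLD_MS, NEUTRAL_MS = 80.0, 250.0  # staged timeout from the post

class ControlLoop:
    """Pure-Python sketch of the Core 1 loop; real firmware is C++ on the ESP32."""

    def __init__(self) -> None:
        self.target = 0.0            # latest value written by the UDP task (Core 0)
        self.output = 0.0            # smoothed value that goes to the DAC
        self.since_packet_ms = 0.0

    def on_packet(self, value: float) -> None:
        self.target = value
        self.since_packet_ms = 0.0

    def tick(self) -> float:
        self.since_packet_ms += DT_MS
        if self.since_packet_ms > NEUTRAL_MS:
            target = 0.0             # stale link: fail boring, go neutral
        elif self.since_packet_ms > HOLD_MS:
            return self.output       # brief gap: hold the current output
        else:
            target = self.target
        # EMA toward the target, then clamp the per-tick change (slew limit)
        smoothed = self.output + ALPHA * (target - self.output)
        step = max(-MAX_SLEW, min(MAX_SLEW, smoothed - self.output))
        self.output += step
        return self.output
```

Note how the loop never touches the network: packets only move the target, and the output chases it at a bounded rate regardless of when packets arrive.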
The admin dashboard
If you’ve ever tried to run anything resembling a tournament, you quickly learn that “just let them drive” turns into chaos. So the admin dashboard is basically a tiny race director:
- see whether the ESP32 is discovered on the network
- see whether a player is connected
- hit Start/Stop Race, kick the player, etc.
Having the player press a “Ready” button also solved a subtle UX problem: video connect time varies a lot. Blocking controls until video is confirmed avoids the “I’m driving blind while WebRTC negotiates” moment.
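The "blocks input until it's actually time to race" rule from the safety section is essentially one gate over a small state machine. A sketch (the state names are my invention, not the actual implementation):

```python
from enum import Enum, auto

# Hypothetical race states; the actual project may name or split these differently.
class RaceState(Enum):
    IDLE = auto()       # no player / admin hasn't opened a slot
    READY = auto()      # player pressed "Ready" (video confirmed)
    COUNTDOWN = auto()  # admin hit Start, countdown running
    RACING = auto()
    FINISHED = auto()

def controls_allowed(state: RaceState) -> bool:
    """Drive commands are only forwarded while the race is actually running."""
    return state is RaceState.RACING
```

Every control packet gets checked against this gate before it's forwarded, so a player mashing the throttle during the countdown does exactly nothing.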

YouTube restreaming: turning the whole thing into a spectator sport
If you want to, you can restream the camera feed to YouTube. The restreamer is a separate service (on Fly.io) that consumes the WHEP stream, re-encodes it, and pushes it to YouTube over RTMP. It’s not necessary for racing, but it’s perfect for “tournament vibes”.
The plan: tournaments and real timing
The next milestones are what turn this from a tech demo into a racing platform:
- Tournament queue system: A simple queue where people join a tournament slot, get their turn, and spectators can watch.
- Timing system: Manual timing is fine for testing, but real racing needs automated timing with reliable ordering. I’m currently leaning toward BLE beacons for start/finish (and eventually checkpoints), because they’re cheap and easy to set up.